[med-svn] [Git][med-team/python-xopen][upstream] New upstream version 0.9.0
Nilesh Patra
gitlab at salsa.debian.org
Wed Jul 22 21:09:53 BST 2020
Nilesh Patra pushed to branch upstream at Debian Med / python-xopen
Commits:
a2294648 by Nilesh Patra at 2020-07-23T01:30:36+05:30
New upstream version 0.9.0
- - - - -
16 changed files:
- + .codecov.yml
- .travis.yml
- PKG-INFO
- README.rst
- setup.cfg
- setup.py
- src/xopen.egg-info/PKG-INFO
- src/xopen.egg-info/SOURCES.txt
- src/xopen.egg-info/requires.txt
- src/xopen/__init__.py
- src/xopen/_version.py
- + tests/file.txt.bz2.test
- + tests/file.txt.gz.test
- + tests/file.txt.xz.test
- tests/test_xopen.py
- tox.ini
Changes:
=====================================
.codecov.yml
=====================================
@@ -0,0 +1,16 @@
+comment: off
+
+codecov:
+ require_ci_to_pass: no
+
+coverage:
+ precision: 1
+ round: down
+ range: "70...100"
+
+ status:
+ project: yes
+ patch: no
+ changes: no
+
+comment: off
=====================================
.travis.yml
=====================================
@@ -7,20 +7,24 @@ cache:
- $HOME/.cache/pip
python:
- - "2.7"
- - "3.4"
- "3.5"
- "3.6"
- "3.7"
- - "3.8-dev"
+ - "3.8"
+ - "pypy3"
install:
+ - sudo apt-get update && sudo apt-get install -y pigz
+ - pip install --upgrade coverage codecov
- pip install .
script:
- - sudo apt-get update && sudo apt-get install -y pigz
- python setup.py --version # Detect encoding problems
- - python -m pytest
+ - coverage run -m pytest
+
+after_success:
+ - coverage combine
+ - codecov
env:
global:
@@ -37,8 +41,11 @@ jobs:
script:
- |
python3 setup.py sdist
+ python3 -m pip wheel -w dist/ .
ls -l dist/
- python3 -m twine upload dist/*
+ python3 -m twine upload dist/xopen-*
- allowed_failures:
- - python: "3.8-dev"
\ No newline at end of file
+ - name: flake8
+ python: "3.6"
+ install: python3 -m pip install flake8
+ script: flake8 src/ tests/
=====================================
PKG-INFO
=====================================
@@ -1,16 +1,25 @@
Metadata-Version: 2.1
Name: xopen
-Version: 0.7.3
+Version: 0.9.0
Summary: Open compressed files transparently
Home-page: https://github.com/marcelm/xopen/
Author: Marcel Martin
Author-email: mail at marcelm.net
License: MIT
Description: .. image:: https://travis-ci.org/marcelm/xopen.svg?branch=master
- :target: https://travis-ci.org/marcelm/xopen
-
+ :target: https://travis-ci.org/marcelm/xopen
+ :alt:
+
.. image:: https://img.shields.io/pypi/v/xopen.svg?branch=master
- :target: https://pypi.python.org/pypi/xopen
+ :target: https://pypi.python.org/pypi/xopen
+
+ .. image:: https://img.shields.io/conda/v/conda-forge/xopen.svg
+ :target: https://anaconda.org/conda-forge/xopen
+ :alt:
+
+ .. image:: https://codecov.io/gh/marcelm/xopen/branch/master/graph/badge.svg
+ :target: https://codecov.io/gh/marcelm/xopen
+ :alt:
=====
xopen
@@ -28,12 +37,12 @@ Description: .. image:: https://travis-ci.org/marcelm/xopen.svg?branch=master
when reading ``.gz`` files, so it is used both for reading and writing if it is
available.
- This module has originally been developed as part of the `cutadapt
+ This module has originally been developed as part of the `Cutadapt
tool <https://cutadapt.readthedocs.io/>`_ that is used in bioinformatics to
manipulate sequencing data. It has been in successful use within that software
for a few years.
- ``xopen`` is compatible with Python versions 2.7 and 3.4 to 3.7.
+ ``xopen`` is compatible with Python versions 3.5 to 3.8.
Usage
@@ -75,6 +84,9 @@ Description: .. image:: https://travis-ci.org/marcelm/xopen.svg?branch=master
Ruben Vorderman <https://github.com/rhpvorderman/> contributed improvements to
make reading gzipped files faster.
+ Benjamin Vaisvil <https://github.com/bvaisvil> contributed support for
+ format detection from content.
+
Some ideas were taken from the `canopener project <https://github.com/selassid/canopener>`_.
If you also want to open S3 files, you may want to use that module instead.
@@ -82,11 +94,40 @@ Description: .. image:: https://travis-ci.org/marcelm/xopen.svg?branch=master
Changes
-------
+ v0.9.0
+ ~~~~~~
+
+ * When the file name extension of a file to be opened for reading is not
+ available, the content is inspected (if possible) and used to determine
+ which compression format applies.
+ * This release drops Python 2.7 and 3.4 support. Python 3.5 or later is
+ now required.
+
+ v0.8.4
+ ~~~~~~
+ * When reading gzipped files, force ``pigz`` to use only a single process.
+ ``pigz`` cannot use multiple cores anyway when decompressing. By default,
+ it would use extra I/O processes, which slightly reduces wall-clock time,
+ but increases CPU time. Single-core decompression with ``pigz`` is still
+ about twice as fast as regular ``gzip``.
+ * Allow ``threads=0`` for specifying that no external ``pigz``/``gzip``
+ process should be used (then regular ``gzip.open()`` is used instead).
+
+ v0.8.3
+ ~~~~~~
+ * When reading gzipped files, let ``pigz`` use at most four threads by default.
+ This limit previously only applied when writing to a file.
+ * Support Python 3.8
+
+ v0.8.0
+ ~~~~~~
+ * Speed improvements when iterating over gzipped files.
+
v0.6.0
~~~~~~
* For reading from gzipped files, xopen will now use a ``pigz`` subprocess.
This is faster than using ``gzip.open``.
- * Python 2 supported will be dropped in one of the next releases..
+ * Python 2 support will be dropped in one of the next releases.
v0.5.0
~~~~~~
@@ -108,13 +149,12 @@ Description: .. image:: https://travis-ci.org/marcelm/xopen.svg?branch=master
* `Project page on PyPI (Python package index) <https://pypi.python.org/pypi/xopen/>`_
Platform: UNKNOWN
-Classifier: Development Status :: 4 - Beta
+Classifier: Development Status :: 5 - Production/Stable
Classifier: License :: OSI Approved :: MIT License
-Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
-Requires-Python: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, <4
+Classifier: Programming Language :: Python :: 3.8
+Requires-Python: >=3.5
Provides-Extra: dev
=====================================
README.rst
=====================================
@@ -1,8 +1,17 @@
.. image:: https://travis-ci.org/marcelm/xopen.svg?branch=master
- :target: https://travis-ci.org/marcelm/xopen
-
+ :target: https://travis-ci.org/marcelm/xopen
+ :alt:
+
.. image:: https://img.shields.io/pypi/v/xopen.svg?branch=master
- :target: https://pypi.python.org/pypi/xopen
+ :target: https://pypi.python.org/pypi/xopen
+
+.. image:: https://img.shields.io/conda/v/conda-forge/xopen.svg
+ :target: https://anaconda.org/conda-forge/xopen
+ :alt:
+
+.. image:: https://codecov.io/gh/marcelm/xopen/branch/master/graph/badge.svg
+ :target: https://codecov.io/gh/marcelm/xopen
+ :alt:
=====
xopen
@@ -20,12 +29,12 @@ function. ``pigz`` can use multiple threads when compressing, but is also faster
when reading ``.gz`` files, so it is used both for reading and writing if it is
available.
-This module has originally been developed as part of the `cutadapt
+This module has originally been developed as part of the `Cutadapt
tool <https://cutadapt.readthedocs.io/>`_ that is used in bioinformatics to
manipulate sequencing data. It has been in successful use within that software
for a few years.
-``xopen`` is compatible with Python versions 2.7 and 3.4 to 3.7.
+``xopen`` is compatible with Python versions 3.5 to 3.8.
Usage
@@ -67,6 +76,9 @@ appending to files.
Ruben Vorderman <https://github.com/rhpvorderman/> contributed improvements to
make reading gzipped files faster.
+Benjamin Vaisvil <https://github.com/bvaisvil> contributed support for
+format detection from content.
+
Some ideas were taken from the `canopener project <https://github.com/selassid/canopener>`_.
If you also want to open S3 files, you may want to use that module instead.
@@ -74,11 +86,40 @@ If you also want to open S3 files, you may want to use that module instead.
Changes
-------
+v0.9.0
+~~~~~~
+
+* When the file name extension of a file to be opened for reading is not
+ available, the content is inspected (if possible) and used to determine
+ which compression format applies.
+* This release drops Python 2.7 and 3.4 support. Python 3.5 or later is
+ now required.
+
+v0.8.4
+~~~~~~
+* When reading gzipped files, force ``pigz`` to use only a single process.
+ ``pigz`` cannot use multiple cores anyway when decompressing. By default,
+ it would use extra I/O processes, which slightly reduces wall-clock time,
+ but increases CPU time. Single-core decompression with ``pigz`` is still
+ about twice as fast as regular ``gzip``.
+* Allow ``threads=0`` for specifying that no external ``pigz``/``gzip``
+ process should be used (then regular ``gzip.open()`` is used instead).
+
+v0.8.3
+~~~~~~
+* When reading gzipped files, let ``pigz`` use at most four threads by default.
+ This limit previously only applied when writing to a file.
+* Support Python 3.8
+
+v0.8.0
+~~~~~~
+* Speed improvements when iterating over gzipped files.
+
v0.6.0
~~~~~~
* For reading from gzipped files, xopen will now use a ``pigz`` subprocess.
This is faster than using ``gzip.open``.
-* Python 2 supported will be dropped in one of the next releases.
+* Python 2 support will be dropped in one of the next releases.
v0.5.0
~~~~~~
=====================================
setup.cfg
=====================================
@@ -1,6 +1,17 @@
[bdist_wheel]
universal = 1
+[coverage:run]
+parallel = True
+include =
+ */site-packages/xopen/*
+ tests/*
+
+[coverage:paths]
+source =
+ src/
+ **/site-packages/
+
[egg_info]
tag_build =
tag_date = 0
=====================================
setup.py
=====================================
@@ -1,8 +1,8 @@
import sys
from setuptools import setup, find_packages
-if sys.version_info < (2, 7):
- sys.stdout.write("At least Python 2.7 is required.\n")
+if sys.version_info < (3, 5):
+ sys.stdout.write("At least Python 3.5 is required.\n")
sys.exit(1)
with open('README.rst') as f:
@@ -20,21 +20,17 @@ setup(
license='MIT',
package_dir={'': 'src'},
packages=find_packages('src'),
- install_requires=[
- 'bz2file; python_version=="2.7"',
- ],
extras_require={
'dev': ['pytest'],
},
- python_requires='>=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, <4',
+ python_requires='>=3.5',
classifiers=[
- "Development Status :: 4 - Beta",
+ "Development Status :: 5 - Production/Stable",
"License :: OSI Approved :: MIT License",
- "Programming Language :: Python :: 2.7",
"Programming Language :: Python :: 3",
- "Programming Language :: Python :: 3.4",
"Programming Language :: Python :: 3.5",
"Programming Language :: Python :: 3.6",
"Programming Language :: Python :: 3.7",
+ "Programming Language :: Python :: 3.8",
]
)
=====================================
src/xopen.egg-info/PKG-INFO
=====================================
@@ -1,16 +1,25 @@
Metadata-Version: 2.1
Name: xopen
-Version: 0.7.3
+Version: 0.9.0
Summary: Open compressed files transparently
Home-page: https://github.com/marcelm/xopen/
Author: Marcel Martin
Author-email: mail at marcelm.net
License: MIT
Description: .. image:: https://travis-ci.org/marcelm/xopen.svg?branch=master
- :target: https://travis-ci.org/marcelm/xopen
-
+ :target: https://travis-ci.org/marcelm/xopen
+ :alt:
+
.. image:: https://img.shields.io/pypi/v/xopen.svg?branch=master
- :target: https://pypi.python.org/pypi/xopen
+ :target: https://pypi.python.org/pypi/xopen
+
+ .. image:: https://img.shields.io/conda/v/conda-forge/xopen.svg
+ :target: https://anaconda.org/conda-forge/xopen
+ :alt:
+
+ .. image:: https://codecov.io/gh/marcelm/xopen/branch/master/graph/badge.svg
+ :target: https://codecov.io/gh/marcelm/xopen
+ :alt:
=====
xopen
@@ -28,12 +37,12 @@ Description: .. image:: https://travis-ci.org/marcelm/xopen.svg?branch=master
when reading ``.gz`` files, so it is used both for reading and writing if it is
available.
- This module has originally been developed as part of the `cutadapt
+ This module has originally been developed as part of the `Cutadapt
tool <https://cutadapt.readthedocs.io/>`_ that is used in bioinformatics to
manipulate sequencing data. It has been in successful use within that software
for a few years.
- ``xopen`` is compatible with Python versions 2.7 and 3.4 to 3.7.
+ ``xopen`` is compatible with Python versions 3.5 to 3.8.
Usage
@@ -75,6 +84,9 @@ Description: .. image:: https://travis-ci.org/marcelm/xopen.svg?branch=master
Ruben Vorderman <https://github.com/rhpvorderman/> contributed improvements to
make reading gzipped files faster.
+ Benjamin Vaisvil <https://github.com/bvaisvil> contributed support for
+ format detection from content.
+
Some ideas were taken from the `canopener project <https://github.com/selassid/canopener>`_.
If you also want to open S3 files, you may want to use that module instead.
@@ -82,11 +94,40 @@ Description: .. image:: https://travis-ci.org/marcelm/xopen.svg?branch=master
Changes
-------
+ v0.9.0
+ ~~~~~~
+
+ * When the file name extension of a file to be opened for reading is not
+ available, the content is inspected (if possible) and used to determine
+ which compression format applies.
+ * This release drops Python 2.7 and 3.4 support. Python 3.5 or later is
+ now required.
+
+ v0.8.4
+ ~~~~~~
+ * When reading gzipped files, force ``pigz`` to use only a single process.
+ ``pigz`` cannot use multiple cores anyway when decompressing. By default,
+ it would use extra I/O processes, which slightly reduces wall-clock time,
+ but increases CPU time. Single-core decompression with ``pigz`` is still
+ about twice as fast as regular ``gzip``.
+ * Allow ``threads=0`` for specifying that no external ``pigz``/``gzip``
+ process should be used (then regular ``gzip.open()`` is used instead).
+
+ v0.8.3
+ ~~~~~~
+ * When reading gzipped files, let ``pigz`` use at most four threads by default.
+ This limit previously only applied when writing to a file.
+ * Support Python 3.8
+
+ v0.8.0
+ ~~~~~~
+ * Speed improvements when iterating over gzipped files.
+
v0.6.0
~~~~~~
* For reading from gzipped files, xopen will now use a ``pigz`` subprocess.
This is faster than using ``gzip.open``.
- * Python 2 supported will be dropped in one of the next releases..
+ * Python 2 support will be dropped in one of the next releases.
v0.5.0
~~~~~~
@@ -108,13 +149,12 @@ Description: .. image:: https://travis-ci.org/marcelm/xopen.svg?branch=master
* `Project page on PyPI (Python package index) <https://pypi.python.org/pypi/xopen/>`_
Platform: UNKNOWN
-Classifier: Development Status :: 4 - Beta
+Classifier: Development Status :: 5 - Production/Stable
Classifier: License :: OSI Approved :: MIT License
-Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
-Requires-Python: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, <4
+Classifier: Programming Language :: Python :: 3.8
+Requires-Python: >=3.5
Provides-Extra: dev
=====================================
src/xopen.egg-info/SOURCES.txt
=====================================
@@ -1,3 +1,4 @@
+.codecov.yml
.editorconfig
.gitignore
.travis.yml
@@ -16,7 +17,10 @@ src/xopen.egg-info/requires.txt
src/xopen.egg-info/top_level.txt
tests/file.txt
tests/file.txt.bz2
+tests/file.txt.bz2.test
tests/file.txt.gz
+tests/file.txt.gz.test
tests/file.txt.xz
+tests/file.txt.xz.test
tests/hello.gz
tests/test_xopen.py
\ No newline at end of file
=====================================
src/xopen.egg-info/requires.txt
=====================================
@@ -1,6 +1,3 @@
-[:python_version == "2.7"]
-bz2file
-
[dev]
pytest
=====================================
src/xopen/__init__.py
=====================================
@@ -1,51 +1,41 @@
"""
Open compressed files transparently.
"""
-from __future__ import print_function, division, absolute_import
+
+__all__ = ["xopen", "PipedGzipWriter", "PipedGzipReader", "__version__"]
import gzip
import sys
import io
import os
+import bz2
import time
+import stat
+import signal
+import pathlib
from subprocess import Popen, PIPE
from ._version import version as __version__
-_PY3 = sys.version > '3'
-
-if not _PY3:
- import bz2file as bz2
-else:
- try:
- import bz2
- except ImportError:
- bz2 = None
-
try:
import lzma
except ImportError:
lzma = None
-if _PY3:
- basestring = str
-
-try:
- import pathlib # Exists in Python 3.4+
-except ImportError:
- pathlib = None
-
try:
from os import fspath # Exists in Python 3.6+
except ImportError:
def fspath(path):
if hasattr(path, "__fspath__"):
return path.__fspath__()
- # Python 3.4 and 3.5 do not support the file system path protocol
+ # Python 3.4 and 3.5 have pathlib, but do not support the file system
+ # path protocol
if pathlib is not None and isinstance(path, pathlib.Path):
return str(path)
+ if not isinstance(path, str):
+ raise TypeError("path must be a string")
return path
@@ -67,7 +57,7 @@ def _available_cpu_count():
res = bin(int(m.group(1).replace(',', ''), 16)).count('1')
if res > 0:
return res
- except IOError:
+ except OSError:
pass
try:
import multiprocessing
@@ -76,11 +66,12 @@ def _available_cpu_count():
return 1
-class Closing(object):
+class Closing:
"""
Inherit from this class and implement a close() method to offer context
manager functionality.
"""
+
def __enter__(self):
return self
@@ -90,16 +81,20 @@ class Closing(object):
def __del__(self):
try:
self.close()
- except:
+ except Exception:
pass
class PipedGzipWriter(Closing):
"""
Write gzip-compressed files by running an external gzip or pigz process and
- piping into it. On Python 2, this is faster than using gzip.open(). On
- Python 3, it allows to run the compression in a separate process and can
- therefore also be faster.
+ piping into it. pigz is tried first. It is fast because it can compress using
+ multiple cores.
+
+ If pigz is not available, a gzip subprocess is used. On Python 2, this saves
+ CPU time because gzip.GzipFile is slower. On Python 3, gzip.GzipFile is on
+ par with gzip itself, but running an external gzip can still reduce wall-clock
+ time because the compression happens in a separate process.
"""
def __init__(self, path, mode='wt', compresslevel=6, threads=None):
@@ -111,7 +106,8 @@ class PipedGzipWriter(Closing):
at four to avoid creating too many threads. Use 0 to let pigz use all available cores.
"""
if mode not in ('w', 'wt', 'wb', 'a', 'at', 'ab'):
- raise ValueError("Mode is '{0}', but it must be 'w', 'wt', 'wb', 'a', 'at' or 'ab'".format(mode))
+ raise ValueError(
+ "Mode is '{}', but it must be 'w', 'wt', 'wb', 'a', 'at' or 'ab'".format(mode))
# TODO use a context manager
self.outfile = open(path, mode)
@@ -119,56 +115,62 @@ class PipedGzipWriter(Closing):
self.closed = False
self.name = path
- kwargs = dict(stdin=PIPE, stdout=self.outfile, stderr=self.devnull)
- # Setting close_fds to True in the Popen arguments is necessary due to
- # <http://bugs.python.org/issue12786>.
- # However, close_fds is not supported on Windows. See
- # <https://github.com/marcelm/cutadapt/issues/315>.
- if sys.platform != 'win32':
- kwargs['close_fds'] = True
-
- if 'w' in mode and compresslevel != 6:
- extra_args = ['-' + str(compresslevel)]
- else:
- extra_args = []
-
- pigz_args = ['pigz']
if threads is None:
threads = min(_available_cpu_count(), 4)
- if threads != 0:
- pigz_args += ['-p', str(threads)]
try:
- self.process = Popen(pigz_args + extra_args, **kwargs)
- self.program = 'pigz'
+ self.process, self.program = self._open_process(
+ mode, compresslevel, threads, self.outfile, self.devnull)
except OSError:
- # pigz not found, try regular gzip
- try:
- self.process = Popen(['gzip'] + extra_args, **kwargs)
- self.program = 'gzip'
- except (IOError, OSError):
- self.outfile.close()
- self.devnull.close()
- raise
- except IOError: # TODO IOError is the same as OSError on Python 3.3
self.outfile.close()
self.devnull.close()
raise
- if _PY3 and 'b' not in mode:
+
+ if 'b' not in mode:
self._file = io.TextIOWrapper(self.process.stdin)
else:
self._file = self.process.stdin
+ @staticmethod
+ def _open_process(mode, compresslevel, threads, outfile, devnull):
+ pigz_args = ['pigz']
+ if threads != 0:
+ pigz_args += ['-p', str(threads)]
+ extra_args = []
+ if 'w' in mode and compresslevel != 6:
+ extra_args += ['-' + str(compresslevel)]
+
+ kwargs = dict(stdin=PIPE, stdout=outfile, stderr=devnull)
+
+ # Setting close_fds to True in the Popen arguments is necessary due to
+ # <http://bugs.python.org/issue12786>.
+ # However, close_fds is not supported on Windows. See
+ # <https://github.com/marcelm/cutadapt/issues/315>.
+ if sys.platform != 'win32':
+ kwargs['close_fds'] = True
+
+ try:
+ process = Popen(pigz_args + extra_args, **kwargs)
+ program = 'pigz'
+ except OSError: # TODO Use FileNotFound instead (Python 3)
+ # pigz not found, try regular gzip
+ process = Popen(['gzip'] + extra_args, **kwargs)
+ program = 'gzip'
+ return process, program
+
def write(self, arg):
self._file.write(arg)
def close(self):
+ if self.closed:
+ return
self.closed = True
self._file.close()
retcode = self.process.wait()
self.outfile.close()
self.devnull.close()
if retcode != 0:
- raise IOError("Output {0} process terminated with exit code {1}".format(self.program, retcode))
+ raise OSError(
+ "Output {} process terminated with exit code {}".format(self.program, retcode))
def __iter__(self):
return self
@@ -180,26 +182,38 @@ class PipedGzipWriter(Closing):
class PipedGzipReader(Closing):
"""
Open a pipe to pigz for reading a gzipped file. Even though pigz is mostly
- used to speed up writing, when it can use many compression threads, it is
- also faster than gzip when reading (three times faster).
+ used to speed up writing by using many compression threads, it is
+ also faster when reading, even when forced to use a single thread
+ (ca. 2x speedup).
"""
- def __init__(self, path, mode='r'):
+ def __init__(self, path, mode='r', threads=None):
"""
Raise an OSError when pigz could not be found.
"""
if mode not in ('r', 'rt', 'rb'):
- raise ValueError("Mode is '{0}', but it must be 'r', 'rt' or 'rb'".format(mode))
- self.process = Popen(['pigz', '-cd', path], stdout=PIPE, stderr=PIPE)
+ raise ValueError("Mode is '{}', but it must be 'r', 'rt' or 'rb'".format(mode))
+
+ pigz_args = ['pigz', '-cd', path]
+
+ if threads is None:
+ # Single threaded behaviour by default because:
+ # - Using a single thread to read a file is the least unexpected
+ # behaviour. (For users of xopen, who do not know which backend is used.)
+ # - There is quite a substantial overhead (+25% CPU time) when
+ # using multiple threads while there is only a 10% gain in wall
+ # clock time.
+ threads = 1
+
+ pigz_args += ['-p', str(threads)]
+
+ self.process = Popen(pigz_args, stdout=PIPE, stderr=PIPE)
self.name = path
- if _PY3 and 'b' not in mode:
+ if 'b' not in mode:
self._file = io.TextIOWrapper(self.process.stdout)
else:
self._file = self.process.stdout
- if _PY3:
- self._stderr = io.TextIOWrapper(self.process.stderr)
- else:
- self._stderr = self.process.stderr
+ self._stderr = io.TextIOWrapper(self.process.stderr)
self.closed = False
# Give the subprocess a little bit of time to report any errors (such as
# a non-existing file)
@@ -207,48 +221,51 @@ class PipedGzipReader(Closing):
self._raise_if_error()
def close(self):
+ if self.closed:
+ return
self.closed = True
retcode = self.process.poll()
if retcode is None:
# still running
self.process.terminate()
- self._raise_if_error()
+ allow_sigterm = True
+ else:
+ allow_sigterm = False
+ self.process.wait()
+ self._file.close()
+ self._raise_if_error(allow_sigterm=allow_sigterm)
+ self._stderr.close()
def __iter__(self):
- for line in self._file:
- yield line
- self.process.wait()
- self._raise_if_error()
+ return self
+
+ def __next__(self):
+ return self._file.__next__()
- def _raise_if_error(self):
+ def _raise_if_error(self, allow_sigterm=False):
"""
- Raise IOError if process is not running anymore and the
- exit code is nonzero.
+ Raise IOError if process is not running anymore and the exit code is
+ nonzero. If allow_sigterm is set and a SIGTERM exit code is
+ encountered, no error is raised.
"""
retcode = self.process.poll()
- if retcode is not None and retcode != 0:
+ if (
+ retcode is not None and retcode != 0
+ and not (allow_sigterm and retcode == -signal.SIGTERM)
+ ):
message = self._stderr.read().strip()
- raise IOError(message)
+ self._file.close()
+ self._stderr.close()
+ raise OSError("{} (exit code {})".format(message, retcode))
def read(self, *args):
- data = self._file.read(*args)
- if len(args) == 0 or args[0] <= 0:
- # wait for process to terminate until we check the exit code
- self.process.wait()
- self._raise_if_error()
- return data
+ return self._file.read(*args)
def readinto(self, *args):
- data = self._file.readinto(*args)
- return data
+ return self._file.readinto(*args)
def readline(self, *args):
- data = self._file.readline(*args)
- if len(args) == 0 or args[0] <= 0:
- # wait for process to terminate until we check the exit code
- self.process.wait()
- self._raise_if_error()
- return data
+ return self._file.readline(*args)
def seekable(self):
return self._file.seekable()
@@ -256,9 +273,8 @@ class PipedGzipReader(Closing):
def peek(self, n=None):
return self._file.peek(n)
- if _PY3:
- def readable(self):
- return self._file.readable()
+ def readable(self):
+ return self._file.readable()
def writable(self):
return self._file.writable()
@@ -271,23 +287,11 @@ def _open_stdin_or_out(mode):
# Do not return sys.stdin or sys.stdout directly as we want the returned object
# to be closable without closing sys.stdout.
std = dict(r=sys.stdin, w=sys.stdout)[mode[0]]
- if not _PY3:
- # Enforce str type on Python 2
- # Note that io.open is slower than regular open() on Python 2.7, but
- # it appears to be the only API that has a closefd parameter.
- mode = mode[0] + 'b'
- return io.open(std.fileno(), mode=mode, closefd=False)
+ return open(std.fileno(), mode=mode, closefd=False)
def _open_bz2(filename, mode):
- if bz2 is None:
- raise ImportError("Cannot open bz2 files: The bz2 module is not available")
- if _PY3:
- return bz2.open(filename, mode)
- else:
- if mode[0] == 'a':
- raise ValueError("mode '{0}' not supported with BZ2 compression".format(mode))
- return bz2.BZ2File(filename, mode)
+ return bz2.open(filename, mode)
def _open_xz(filename, mode):
@@ -298,44 +302,67 @@ def _open_xz(filename, mode):
def _open_gz(filename, mode, compresslevel, threads):
- if sys.version_info[:2] == (2, 7):
- buffered_reader = io.BufferedReader
- buffered_writer = io.BufferedWriter
- else:
- buffered_reader = lambda x: x
- buffered_writer = lambda x: x
- if _PY3:
- exc = FileNotFoundError # was introduced in Python 3.3
- else:
- exc = OSError
- if 'r' in mode:
+ if threads != 0:
try:
- return PipedGzipReader(filename, mode)
- except exc:
- # pigz is not installed
- return buffered_reader(gzip.open(filename, mode))
+ if 'r' in mode:
+ return PipedGzipReader(filename, mode, threads=threads)
+ else:
+ return PipedGzipWriter(filename, mode, compresslevel, threads=threads)
+ except FileNotFoundError:
+ pass # We try without threads.
+
+ if 'r' in mode:
+ return gzip.open(filename, mode)
else:
- try:
- return PipedGzipWriter(filename, mode, compresslevel, threads=threads)
- except exc:
- return buffered_writer(gzip.open(filename, mode, compresslevel=compresslevel))
+ return gzip.open(filename, mode, compresslevel=compresslevel)
+
+
+def _detect_format_from_content(filename):
+ """
+ Attempts to detect file format from the content by reading the first
+ 6 bytes. Returns None if no format could be detected.
+ """
+ try:
+ if stat.S_ISREG(os.stat(filename).st_mode):
+ with open(filename, "rb") as fh:
+ bs = fh.read(6)
+ if bs[:2] == b'\x1f\x8b':
+ # https://tools.ietf.org/html/rfc1952#page-6
+ return "gz"
+ elif bs[:3] == b'\x42\x5a\x68':
+ # https://en.wikipedia.org/wiki/List_of_file_signatures
+ return "bz2"
+ elif bs[:6] == b'\xfd\x37\x7a\x58\x5a\x00':
+ # https://tukaani.org/xz/xz-file-format.txt
+ return "xz"
+ except OSError:
+ return None
+
+
+def _detect_format_from_extension(filename):
+ """
+ Attempts to detect file format from the filename extension.
+ Returns None if no format could be detected.
+ """
+ if filename.endswith('.bz2'):
+ return "bz2"
+ elif filename.endswith('.xz'):
+ return "xz"
+ elif filename.endswith('.gz'):
+ return "gz"
+ else:
+ return None
def xopen(filename, mode='r', compresslevel=6, threads=None):
"""
A replacement for the "open" function that can also read and write
compressed files transparently. The supported compression formats are gzip,
- bzip2 and xz. If the filename is '-', standard output (mode 'w') or input (mode 'r') is returned.
-
- The file type is determined based on the filename: .gz is gzip, .bz2 is bzip2 and .xz is
- xz/lzma.
+ bzip2 and xz. If the filename is '-', standard output (mode 'w') or
+ standard input (mode 'r') is returned.
- When writing a gzip-compressed file, the following methods are tried in order to get the
- best speed 1) using a pigz (parallel gzip) subprocess; 2) using a gzip subprocess;
- 3) gzip.open. A single gzip subprocess can be faster than gzip.open because it runs in a
- separate process.
-
- Uncompressed files are opened with the regular open().
+ The file type is determined based on the filename: .gz is gzip, .bz2 is bzip2, .xz is
+ xz/lzma and no compression assumed otherwise.
mode can be: 'rt', 'rb', 'at', 'ab', 'wt', or 'wb'. Also, the 't' can be omitted,
so instead of 'rt', 'wt' and 'at', the abbreviations 'r', 'w' and 'a' can be used.
@@ -345,35 +372,36 @@ def xopen(filename, mode='r', compresslevel=6, threads=None):
Append mode ('a', 'at', 'ab') is not available with BZ2 compression and
will raise an error.
- compresslevel is the gzip compression level. It is not used for bz2 and xz.
+ compresslevel is the compression level for writing to gzip files.
+ This parameter is ignored for the other compression formats.
+
+ threads only has a meaning when reading or writing gzip files.
+
+ When threads is None (the default), reading or writing a gzip file is done with a pigz
+ (parallel gzip) subprocess if possible. See PipedGzipWriter and PipedGzipReader.
- threads is the number of threads for pigz. If left at None, then the pigz
- default is used. With pigz 2.4, this is "the number of online processors,
- or 8 if unknown".
+ When threads = 0, no subprocess is used.
"""
if mode in ('r', 'w', 'a'):
mode += 't'
if mode not in ('rt', 'rb', 'wt', 'wb', 'at', 'ab'):
- raise ValueError("mode '{0}' not supported".format(mode))
- if not _PY3:
- mode = mode[0]
+ raise ValueError("Mode '{}' not supported".format(mode))
filename = fspath(filename)
- if not isinstance(filename, basestring):
- raise ValueError("the filename must be a string")
if compresslevel not in range(1, 10):
raise ValueError("compresslevel must be between 1 and 9")
if filename == '-':
return _open_stdin_or_out(mode)
- elif filename.endswith('.bz2'):
- return _open_bz2(filename, mode)
- elif filename.endswith('.xz'):
- return _open_xz(filename, mode)
- elif filename.endswith('.gz'):
+
+ detected_format = _detect_format_from_extension(filename)
+ if detected_format is None and "w" not in mode:
+ detected_format = _detect_format_from_content(filename)
+
+ if detected_format == "gz":
return _open_gz(filename, mode, compresslevel, threads)
+ elif detected_format == "xz":
+ return _open_xz(filename, mode)
+ elif detected_format == "bz2":
+ return _open_bz2(filename, mode)
else:
- # Python 2.6 and 2.7 have io.open, which we could use to make the returned
- # object consistent with the one returned in Python 3, but reading a file
- # with io.open() is 100 times slower (!) on Python 2.6, and still about
- # three times slower on Python 2.7 (tested with "for _ in io.open(path): pass")
return open(filename, mode)
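
Note on the hunk above: when the file extension gives no hint and the file is not opened for writing, the new code falls back to inspecting the file content. The body of `_detect_format_from_content` is not part of this excerpt, so the following is only a minimal sketch of how such a check could work, assuming it compares the first bytes of the file against the well-known magic numbers of gzip, bzip2 and xz. The names `MAGIC` and `sniff_format` are hypothetical and are not taken from xopen.

```python
import bz2
import gzip
import os
import tempfile

# Leading "magic" bytes of each format (gzip: 1f 8b, bzip2: "BZh",
# xz: fd 37 7a 58 5a 00). The dict name is hypothetical.
MAGIC = {
    "gz": b"\x1f\x8b",
    "bz2": b"BZh",
    "xz": b"\xfd7zXZ\x00",
}


def sniff_format(path):
    """Return "gz", "bz2", "xz", or None, judging only by the first bytes."""
    longest = max(len(magic) for magic in MAGIC.values())
    with open(path, "rb") as fh:
        head = fh.read(longest)
    for name, magic in MAGIC.items():
        if head.startswith(magic):
            return name
    return None


# Usage: the file name deliberately carries no recognizable extension,
# mirroring the new tests/file.txt.*.test fixtures.
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "gz": gzip.compress(b"hello\n"),
        "bz2": bz2.compress(b"hello\n"),
        None: b"hello\n",
    }
    for expected, payload in samples.items():
        path = os.path.join(tmp, "sample.test")
        with open(path, "wb") as fh:
            fh.write(payload)
        print(expected, sniff_format(path))
```

A check like this only makes sense for existing files, which is consistent with the `"w" not in mode` guard in the diff.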
=====================================
src/xopen/_version.py
=====================================
@@ -1,4 +1,4 @@
# coding: utf-8
# file generated by setuptools_scm
# don't change, don't track in version control
-version = '0.7.3'
+version = '0.9.0'
=====================================
tests/file.txt.bz2.test
=====================================
Binary files /dev/null and b/tests/file.txt.bz2.test differ
=====================================
tests/file.txt.gz.test
=====================================
Binary files /dev/null and b/tests/file.txt.gz.test differ
=====================================
tests/file.txt.xz.test
=====================================
Binary files /dev/null and b/tests/file.txt.xz.test differ
=====================================
tests/test_xopen.py
=====================================
@@ -1,13 +1,11 @@
-# coding: utf-8
-from __future__ import print_function, division, absolute_import
-
import io
import os
import random
-import sys
import signal
-from contextlib import contextmanager
+import time
import pytest
+from pathlib import Path
+
from xopen import xopen, PipedGzipReader, PipedGzipWriter
@@ -24,11 +22,6 @@ files = [base + ext for ext in extensions]
CONTENT_LINES = ['Testing, testing ...\n', 'The second line.\n']
CONTENT = ''.join(CONTENT_LINES)
-# File extensions for which appending is supported
-append_extensions = extensions[:]
-if sys.version_info[0] == 2:
- append_extensions.remove(".bz2")
-
@pytest.fixture(params=extensions)
def ext(request):
@@ -40,14 +33,23 @@ def fname(request):
return request.param
-@contextmanager
-def temporary_path(name):
- directory = os.path.join(os.path.dirname(__file__), 'testtmp')
- if not os.path.isdir(directory):
- os.mkdir(directory)
- path = os.path.join(directory, name)
- yield path
- os.remove(path)
+@pytest.fixture
+def large_gzip(tmpdir):
+ path = str(tmpdir.join("large.gz"))
+ random_text = ''.join(random.choice('ABCDEFGHIJKLMNOPQRSTUVWXYZ\n') for _ in range(1024))
+ # Make the text a lot bigger in order to ensure that it is larger than the
+ # pipe buffer size.
+ random_text *= 1024
+ with xopen(path, 'w') as f:
+ f.write(random_text)
+ return path
+
+
+@pytest.fixture
+def truncated_gzip(large_gzip):
+ with open(large_gzip, 'a') as f:
+ f.truncate(os.stat(large_gzip).st_size - 10)
+ return large_gzip
def test_xopen_text(fname):
@@ -102,11 +104,20 @@ def test_pipedgzipreader_readinto():
assert b[:length] == content
-if sys.version_info[0] != 2:
- def test_pipedgzipreader_textiowrapper():
- with PipedGzipReader("tests/file.txt.gz", "rb") as f:
- wrapped = io.TextIOWrapper(f)
- assert wrapped.read() == CONTENT
+def test_pipedgzipreader_textiowrapper():
+ with PipedGzipReader("tests/file.txt.gz", "rb") as f:
+ wrapped = io.TextIOWrapper(f)
+ assert wrapped.read() == CONTENT
+
+
+def test_detect_gzip_file_format_from_content():
+ with xopen("tests/file.txt.gz.test", "rb") as fh:
+ assert fh.readline() == CONTENT_LINES[0].encode("utf-8")
+
+
+def test_detect_bz2_file_format_from_content():
+ with xopen("tests/file.txt.bz2.test", "rb") as fh:
+ assert fh.readline() == CONTENT_LINES[0].encode("utf-8")
def test_readline(fname):
@@ -131,6 +142,20 @@ def test_readline_text_pipedgzipreader():
assert f.readline() == CONTENT_LINES[0]
+@pytest.mark.parametrize("threads", [None, 1, 2])
+def test_pipedgzipreader_iter(threads):
+ with PipedGzipReader("tests/file.txt.gz", mode="r", threads=threads) as f:
+ lines = list(f)
+ assert lines[0] == CONTENT_LINES[0]
+
+
+def test_next(fname):
+ with xopen(fname, "rt") as f:
+ _ = next(f)
+ line2 = next(f)
+ assert line2 == 'The second line.\n', fname
+
+
def test_xopen_has_iter_method(ext, tmpdir):
path = str(tmpdir.join("out" + ext))
with xopen(path, mode='w') as f:
@@ -142,70 +167,99 @@ def test_pipedgzipwriter_has_iter_method(tmpdir):
assert hasattr(f, '__iter__')
+def test_iter_without_with(fname):
+ it = iter(xopen(fname, "rt"))
+ assert CONTENT_LINES[0] == next(it)
+
+
+def test_pipedgzipreader_iter_without_with():
+ it = iter(PipedGzipReader("tests/file.txt.gz"))
+ assert CONTENT_LINES[0] == next(it)
+
+
+@pytest.mark.parametrize("mode", ["rb", "rt"])
+def test_pipedgzipreader_close(large_gzip, mode):
+ with PipedGzipReader(large_gzip, mode=mode) as f:
+ f.readline()
+ time.sleep(0.2)
+ # The subprocess should be properly terminated now
+
+
+def test_partial_gzip_iteration_closes_correctly(large_gzip):
+ class LineReader:
+ def __init__(self, file):
+ self.file = xopen(file, "rb")
+
+ def __iter__(self):
+ wrapper = io.TextIOWrapper(self.file)
+ yield from wrapper
+
+ f = LineReader(large_gzip)
+ next(iter(f))
+ f.file.close()
+
+
def test_nonexisting_file(ext):
with pytest.raises(IOError):
- with xopen('this-file-does-not-exist' + ext) as f:
- pass
+ with xopen('this-file-does-not-exist' + ext):
+ pass # pragma: no cover
def test_write_to_nonexisting_dir(ext):
with pytest.raises(IOError):
- with xopen('this/path/does/not/exist/file.txt' + ext, 'w') as f:
- pass
+ with xopen('this/path/does/not/exist/file.txt' + ext, 'w'):
+ pass # pragma: no cover
+
+
+def test_invalid_mode():
+ with pytest.raises(ValueError):
+ with xopen("tests/file.txt.gz", mode="hallo"):
+ pass # pragma: no cover
-@pytest.mark.parametrize("aext", append_extensions)
-def test_append(aext):
- text = "AB".encode("utf-8")
+def test_filename_not_a_string():
+ with pytest.raises(TypeError):
+ with xopen(123, mode="r"):
+ pass # pragma: no cover
+
+
+def test_invalid_compression_level(tmpdir):
+ path = str(tmpdir.join("out.gz"))
+ with pytest.raises(ValueError) as e:
+ with xopen(path, mode="w", compresslevel=17) as f:
+ f.write("hello") # pragma: no cover
+ assert "between 1 and 9" in e.value.args[0]
+
+
+@pytest.mark.parametrize("ext", extensions)
+def test_append(ext, tmpdir):
+ text = b"AB"
reference = text + text
- with temporary_path('truncated.fastq' + aext) as path:
- try:
- os.unlink(path)
- except OSError:
+ path = str(tmpdir.join("the-file" + ext))
+ with xopen(path, "ab") as f:
+ f.write(text)
+ with xopen(path, "ab") as f:
+ f.write(text)
+ with xopen(path, "r") as f:
+ for appended in f:
pass
- with xopen(path, 'ab') as f:
- f.write(text)
- with xopen(path, 'ab') as f:
- f.write(text)
- with xopen(path, 'r') as f:
- for appended in f:
- pass
- try:
- reference = reference.decode("utf-8")
- except AttributeError:
- pass
- assert appended == reference
+ reference = reference.decode("utf-8")
+ assert appended == reference
-@pytest.mark.parametrize("aext", append_extensions)
-def test_append_text(aext):
+@pytest.mark.parametrize("ext", extensions)
+def test_append_text(ext, tmpdir):
text = "AB"
reference = text + text
- with temporary_path('truncated.fastq' + aext) as path:
- try:
- os.unlink(path)
- except OSError:
+ path = str(tmpdir.join("the-file" + ext))
+ with xopen(path, "at") as f:
+ f.write(text)
+ with xopen(path, "at") as f:
+ f.write(text)
+ with xopen(path, "rt") as f:
+ for appended in f:
pass
- with xopen(path, 'at') as f:
- f.write(text)
- with xopen(path, 'at') as f:
- f.write(text)
- with xopen(path, 'rt') as f:
- for appended in f:
- pass
- assert appended == reference
-
-
-def create_truncated_file(path):
- # Random text
- random_text = ''.join(random.choice('ABCDEFGHIJKLMNOPQRSTUVWXYZ') for _ in range(1024))
- # Make the text a lot bigger in order to ensure that it is larger than the
- # pipe buffer size.
- random_text *= 1024 # 1MB
- with xopen(path, 'w') as f:
- f.write(random_text)
- with open(path, 'a') as f:
- f.truncate(os.stat(path).st_size - 10)
+ assert appended == reference
class TookTooLongError(Exception):
@@ -218,7 +272,7 @@ class timeout:
self.seconds = seconds
def handle_timeout(self, signum, frame):
- raise TookTooLongError()
+ raise TookTooLongError() # pragma: no cover
def __enter__(self):
signal.signal(signal.SIGALRM, self.handle_timeout)
@@ -228,26 +282,36 @@ class timeout:
signal.alarm(0)
-if sys.version_info[:2] != (3, 3):
- def test_truncated_gz():
- with temporary_path('truncated.gz') as path:
- create_truncated_file(path)
- with timeout(seconds=2):
- with pytest.raises((EOFError, IOError)):
- f = xopen(path, 'r')
- f.read()
- f.close()
+def test_truncated_gz(truncated_gzip):
+ with timeout(seconds=2):
+ with pytest.raises((EOFError, IOError)):
+ f = xopen(truncated_gzip, "r")
+ f.read()
+ f.close() # pragma: no cover
+
+
+def test_truncated_gz_iter(truncated_gzip):
+ with timeout(seconds=2):
+ with pytest.raises((EOFError, IOError)):
+ f = xopen(truncated_gzip, 'r')
+ for line in f:
+ pass
+ f.close() # pragma: no cover
+
+
+def test_truncated_gz_with(truncated_gzip):
+ with timeout(seconds=2):
+ with pytest.raises((EOFError, IOError)):
+ with xopen(truncated_gzip, 'r') as f:
+ f.read()
- def test_truncated_gz_iter():
- with temporary_path('truncated.gz') as path:
- create_truncated_file(path)
- with timeout(seconds=2):
- with pytest.raises((EOFError, IOError)):
- f = xopen(path, 'r')
- for line in f:
- pass
- f.close()
+def test_truncated_gz_iter_with(truncated_gzip):
+ with timeout(seconds=2):
+ with pytest.raises((EOFError, IOError)):
+ with xopen(truncated_gzip, 'r') as f:
+ for line in f:
+ pass
def test_bare_read_from_gz():
@@ -268,6 +332,19 @@ def test_write_pigz_threads(tmpdir):
assert f.read() == 'hello'
+def test_read_gzip_no_threads():
+ import gzip
+ with xopen("tests/hello.gz", "rb", threads=0) as f:
+ assert isinstance(f, gzip.GzipFile), f
+
+
+def test_write_gzip_no_threads(tmpdir):
+ import gzip
+ path = str(tmpdir.join("out.gz"))
+ with xopen(path, "wb", threads=0) as f:
+ assert isinstance(f, gzip.GzipFile), f
+
+
def test_write_stdout():
f = xopen('-', mode='w')
print("Hello", file=f)
@@ -284,30 +361,36 @@ def test_write_stdout_contextmanager():
print("Still there?")
-if sys.version_info[:2] >= (3, 4):
- # pathlib was added in Python 3.4
- from pathlib import Path
-
- def test_read_pathlib(fname):
- path = Path(fname)
- with xopen(path, mode='rt') as f:
- assert f.read() == CONTENT
-
- def test_read_pathlib_binary(fname):
- path = Path(fname)
- with xopen(path, mode='rb') as f:
- assert f.read() == bytes(CONTENT, 'ascii')
-
- def test_write_pathlib(ext, tmpdir):
- path = Path(str(tmpdir)) / ('hello.txt' + ext)
- with xopen(path, mode='wt') as f:
- f.write('hello')
- with xopen(path, mode='rt') as f:
- assert f.read() == 'hello'
-
- def test_write_pathlib_binary(ext, tmpdir):
- path = Path(str(tmpdir)) / ('hello.txt' + ext)
- with xopen(path, mode='wb') as f:
- f.write(b'hello')
- with xopen(path, mode='rb') as f:
- assert f.read() == b'hello'
+def test_read_pathlib(fname):
+ path = Path(fname)
+ with xopen(path, mode='rt') as f:
+ assert f.read() == CONTENT
+
+
+def test_read_pathlib_binary(fname):
+ path = Path(fname)
+ with xopen(path, mode='rb') as f:
+ assert f.read() == bytes(CONTENT, 'ascii')
+
+
+def test_write_pathlib(ext, tmpdir):
+ path = Path(str(tmpdir)) / ('hello.txt' + ext)
+ with xopen(path, mode='wt') as f:
+ f.write('hello')
+ with xopen(path, mode='rt') as f:
+ assert f.read() == 'hello'
+
+
+def test_write_pathlib_binary(ext, tmpdir):
+ path = Path(str(tmpdir)) / ('hello.txt' + ext)
+ with xopen(path, mode='wb') as f:
+ f.write(b'hello')
+ with xopen(path, mode='rb') as f:
+ assert f.read() == b'hello'
+
+
+# lzma doesn’t work on PyPy3 at the moment
+if lzma is not None:
+ def test_detect_xz_file_format_from_content():
+ with xopen("tests/file.txt.xz.test", "rb") as fh:
+ assert fh.readline() == CONTENT_LINES[0].encode("utf-8")
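
Note on the truncation tests above: the `truncated_gzip` fixture cuts the last 10 bytes off a gzip file, and the tests expect reading it to fail with `EOFError` or `IOError` rather than hang. The sketch below reproduces the same situation with only the standard-library `gzip` module, so it does not exercise xopen or pigz. The function name `read_truncated_gzip` and its `drop_bytes` parameter are hypothetical.

```python
import gzip
import os
import tempfile


def read_truncated_gzip(drop_bytes=10):
    """Write a gzip file, cut `drop_bytes` off its end, then read it back.

    Returns the type of the exception raised while reading, or None if
    reading succeeded.
    """
    payload = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ\n" * 4096
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "large.gz")
        with gzip.open(path, "wb") as fh:
            fh.write(payload)
        # Same trick as the fixture: shrink the file in place.
        with open(path, "ab") as fh:
            fh.truncate(os.stat(path).st_size - drop_bytes)
        try:
            with gzip.open(path, "rb") as fh:
                fh.read()
        except (EOFError, OSError) as exc:
            return type(exc)
    return None


print(read_truncated_gzip())   # exception type for the damaged file
print(read_truncated_gzip(0))  # None: an intact file reads cleanly
```

Catching both exception types, as the tests do, avoids depending on which of the two a particular reader implementation raises.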
=====================================
tox.ini
=====================================
@@ -1,6 +1,17 @@
[tox]
-envlist = py27,py34,py35,py36,py37
+envlist = flake8,py35,py36,py37,py38,pypy3
[testenv]
deps = pytest
+setenv = PYTHONDEVMODE = 1
commands = pytest --doctest-modules --pyargs src/xopen tests
+
+[testenv:flake8]
+basepython = python3.6
+deps = flake8
+commands = flake8 src/ tests/
+
+[flake8]
+max-line-length = 99
+max-complexity = 10
+extend_ignore = E731
View it on GitLab: https://salsa.debian.org/med-team/python-xopen/-/commit/a2294648df253a7fcad8fbe4ab298f92ebe9b320