[med-svn] [Git][med-team/python-xopen][master] 5 commits: New upstream version 1.0.0

Nilesh Patra gitlab at salsa.debian.org
Sun Nov 8 18:33:56 GMT 2020



Nilesh Patra pushed to branch master at Debian Med / python-xopen


Commits:
52425f70 by Nilesh Patra at 2020-11-09T00:01:07+05:30
New upstream version 1.0.0
- - - - -
494f8d2c by Nilesh Patra at 2020-11-09T00:01:07+05:30
routine-update: New upstream version

- - - - -
7c57e3ef by Nilesh Patra at 2020-11-09T00:01:08+05:30
Update upstream source from tag 'upstream/1.0.0'

Update to upstream version '1.0.0'
with Debian dir b79c6a140d88c3103cf017d2d76c4d611a7edaa9
- - - - -
c3eeae3f by Nilesh Patra at 2020-11-09T00:01:14+05:30
routine-update: Add salsa-ci file

- - - - -
6db0f815 by Nilesh Patra at 2020-11-09T00:01:18+05:30
routine-update: Ready to upload to unstable

- - - - -


11 changed files:

- .travis.yml
- PKG-INFO
- README.rst
- debian/changelog
- + debian/salsa-ci.yml
- setup.py
- src/xopen.egg-info/PKG-INFO
- src/xopen/__init__.py
- src/xopen/_version.py
- tests/test_xopen.py
- tox.ini


Changes:

=====================================
.travis.yml
=====================================
@@ -1,6 +1,6 @@
 language: python
 
-dist: xenial
+dist: focal
 
 cache:
   directories:
@@ -11,6 +11,7 @@ python:
   - "3.6"
   - "3.7"
   - "3.8"
+  - "3.9"
   - "pypy3"
 
 install:
@@ -45,7 +46,16 @@ jobs:
           ls -l dist/
           python3 -m twine upload dist/xopen-*
 
-    - name: flake8
+    - stage: test
+      name: flake8
       python: "3.6"
       install: python3 -m pip install flake8
       script: flake8 src/ tests/
+
+    - stage: test
+      name: igzip
+      python: "3.6"
+      install:
+        - sudo apt-get update && sudo apt-get install -y pigz isal
+        - pip install --upgrade coverage codecov
+        - pip install .


=====================================
PKG-INFO
=====================================
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: xopen
-Version: 0.9.0
+Version: 1.0.0
 Summary: Open compressed files transparently
 Home-page: https://github.com/marcelm/xopen/
 Author: Marcel Martin
@@ -35,14 +35,15 @@ Description: .. image:: https://travis-ci.org/marcelm/xopen.svg?branch=master
         to open ``.gz`` files, which is faster than using the built-in ``gzip.open``
         function. ``pigz`` can use multiple threads when compressing, but is also faster
         when reading ``.gz`` files, so it is used both for reading and writing if it is
-        available.
+        available. For gzip compression levels 1 to 3,
+        `igzip <https://github.com/intel/isa-l/>`_ is used for an even greater speedup.
         
         This module has originally been developed as part of the `Cutadapt
         tool <https://cutadapt.readthedocs.io/>`_ that is used in bioinformatics to
         manipulate sequencing data. It has been in successful use within that software
         for a few years.
         
-        ``xopen`` is compatible with Python versions 3.5 to 3.8.
+        ``xopen`` is compatible with Python versions 3.5 and later.
         
         
         Usage
@@ -82,7 +83,7 @@ Description: .. image:: https://travis-ci.org/marcelm/xopen.svg?branch=master
         appending to files.
         
         Ruben Vorderman <https://github.com/rhpvorderman/> contributed improvements to
-        make reading gzipped files faster.
+        make reading and writing gzipped files faster.
         
         Benjamin Vaisvil <https://github.com/bvaisvil> contributed support for
         format detection from content.
@@ -94,9 +95,15 @@ Description: .. image:: https://travis-ci.org/marcelm/xopen.svg?branch=master
         Changes
         -------
         
-        v0.9.0
+        v1.0.0
         ~~~~~~
+        * If installed, the ``igzip`` program (part of
+          `Intel ISA-L <https://github.com/intel/isa-l/>`_) is now used for reading
+          and writing gzip-compressed files at compression levels 1-3, which results
+          in a significant speedup.
         
+        v0.9.0
+        ~~~~~~
         * When the file name extension of a file to be opened for reading is not
           available, the content is inspected (if possible) and used to determine
           which compression format applies.
@@ -136,10 +143,13 @@ Description: .. image:: https://travis-ci.org/marcelm/xopen.svg?branch=master
         * xopen now accepts pathlib.Path objects.
         
         
-        Author
-        ------
+        Contributors
+        ------------
+        
+        * Marcel Martin
+        * Ruben Vorderman
+        * For more contributors, see <https://github.com/marcelm/xopen/graphs/contributors>
         
-        Marcel Martin <mail at marcelm.net> (`@marcelm_ on Twitter <https://twitter.com/marcelm_>`_)
         
         Links
         -----
@@ -152,9 +162,5 @@ Platform: UNKNOWN
 Classifier: Development Status :: 5 - Production/Stable
 Classifier: License :: OSI Approved :: MIT License
 Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.5
-Classifier: Programming Language :: Python :: 3.6
-Classifier: Programming Language :: Python :: 3.7
-Classifier: Programming Language :: Python :: 3.8
 Requires-Python: >=3.5
 Provides-Extra: dev


=====================================
README.rst
=====================================
@@ -27,14 +27,15 @@ For example, ``xopen`` uses ``pigz``, which is a parallel version of ``gzip``,
 to open ``.gz`` files, which is faster than using the built-in ``gzip.open``
 function. ``pigz`` can use multiple threads when compressing, but is also faster
 when reading ``.gz`` files, so it is used both for reading and writing if it is
-available.
+available. For gzip compression levels 1 to 3,
+`igzip <https://github.com/intel/isa-l/>`_ is used for an even greater speedup.
 
 This module has originally been developed as part of the `Cutadapt
 tool <https://cutadapt.readthedocs.io/>`_ that is used in bioinformatics to
 manipulate sequencing data. It has been in successful use within that software
 for a few years.
 
-``xopen`` is compatible with Python versions 3.5 to 3.8.
+``xopen`` is compatible with Python versions 3.5 and later.
 
 
 Usage
@@ -74,7 +75,7 @@ Kyle Beauchamp <https://github.com/kyleabeauchamp/> has contributed support for
 appending to files.
 
 Ruben Vorderman <https://github.com/rhpvorderman/> contributed improvements to
-make reading gzipped files faster.
+make reading and writing gzipped files faster.
 
 Benjamin Vaisvil <https://github.com/bvaisvil> contributed support for
 format detection from content.
@@ -86,9 +87,15 @@ If you also want to open S3 files, you may want to use that module instead.
 Changes
 -------
 
-v0.9.0
+v1.0.0
 ~~~~~~
+* If installed, the ``igzip`` program (part of
+  `Intel ISA-L <https://github.com/intel/isa-l/>`_) is now used for reading
+  and writing gzip-compressed files at compression levels 1-3, which results
+  in a significant speedup.
 
+v0.9.0
+~~~~~~
 * When the file name extension of a file to be opened for reading is not
   available, the content is inspected (if possible) and used to determine
   which compression format applies.
@@ -128,10 +135,13 @@ v0.5.0
 * xopen now accepts pathlib.Path objects.
 
 
-Author
-------
+Contributors
+------------
+
+* Marcel Martin
+* Ruben Vorderman
+* For more contributors, see <https://github.com/marcelm/xopen/graphs/contributors>
 
-Marcel Martin <mail at marcelm.net> (`@marcelm_ on Twitter <https://twitter.com/marcelm_>`_)
 
 Links
 -----
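Reviewer note: the backend preference described in the v1.0.0 changelog entry above can be sketched as follows. This is a minimal illustration, not the upstream implementation: `pick_gzip_backend` and its `which` parameter are hypothetical names, and the sketch looks programs up on PATH, whereas xopen itself tries to launch them and also tests igzip for concatenated-gzip support before reading.

```python
import shutil
from typing import Callable, Optional


def pick_gzip_backend(
    mode: str,
    compresslevel: Optional[int] = None,
    threads: Optional[int] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return the name of the backend a gzip open would prefer (hypothetical helper)."""
    # threads == 0 means "no external process": use Python's gzip module.
    if threads == 0:
        return "gzip.open"

    writing = "r" not in mode
    # igzip only offers compression levels 0-3; reading has no level restriction.
    igzip_ok = (not writing) or compresslevel is None or 0 <= compresslevel <= 3
    if igzip_ok and which("igzip"):
        return "igzip"

    # Otherwise prefer pigz, then plain gzip, then the in-process fallback.
    for program in ("pigz", "gzip"):
        if which(program):
            return program
    return "gzip.open"
```

Passing a fake `which` callable makes the preference order easy to check without installing any of the programs.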


=====================================
debian/changelog
=====================================
@@ -1,3 +1,10 @@
+python-xopen (1.0.0-1) unstable; urgency=medium
+
+  * New upstream version
+  * Add salsa-ci file (routine-update)
+
+ -- Nilesh Patra <npatra974 at gmail.com>  Mon, 09 Nov 2020 00:01:18 +0530
+
 python-xopen (0.9.0-1) unstable; urgency=medium
 
   * New upstream version 0.9.0


=====================================
debian/salsa-ci.yml
=====================================
@@ -0,0 +1,4 @@
+---
+include:
+  - https://salsa.debian.org/salsa-ci-team/pipeline/raw/master/salsa-ci.yml
+  - https://salsa.debian.org/salsa-ci-team/pipeline/raw/master/pipeline-jobs.yml


=====================================
setup.py
=====================================
@@ -1,10 +1,6 @@
 import sys
 from setuptools import setup, find_packages
 
-if sys.version_info < (3, 5):
-    sys.stdout.write("At least Python 3.5 is required.\n")
-    sys.exit(1)
-
 with open('README.rst') as f:
     long_description = f.read()
 
@@ -28,9 +24,5 @@ setup(
         "Development Status :: 5 - Production/Stable",
         "License :: OSI Approved :: MIT License",
         "Programming Language :: Python :: 3",
-        "Programming Language :: Python :: 3.5",
-        "Programming Language :: Python :: 3.6",
-        "Programming Language :: Python :: 3.7",
-        "Programming Language :: Python :: 3.8",
     ]
 )


=====================================
src/xopen.egg-info/PKG-INFO
=====================================
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: xopen
-Version: 0.9.0
+Version: 1.0.0
 Summary: Open compressed files transparently
 Home-page: https://github.com/marcelm/xopen/
 Author: Marcel Martin
@@ -35,14 +35,15 @@ Description: .. image:: https://travis-ci.org/marcelm/xopen.svg?branch=master
         to open ``.gz`` files, which is faster than using the built-in ``gzip.open``
         function. ``pigz`` can use multiple threads when compressing, but is also faster
         when reading ``.gz`` files, so it is used both for reading and writing if it is
-        available.
+        available. For gzip compression levels 1 to 3,
+        `igzip <https://github.com/intel/isa-l/>`_ is used for an even greater speedup.
         
         This module has originally been developed as part of the `Cutadapt
         tool <https://cutadapt.readthedocs.io/>`_ that is used in bioinformatics to
         manipulate sequencing data. It has been in successful use within that software
         for a few years.
         
-        ``xopen`` is compatible with Python versions 3.5 to 3.8.
+        ``xopen`` is compatible with Python versions 3.5 and later.
         
         
         Usage
@@ -82,7 +83,7 @@ Description: .. image:: https://travis-ci.org/marcelm/xopen.svg?branch=master
         appending to files.
         
         Ruben Vorderman <https://github.com/rhpvorderman/> contributed improvements to
-        make reading gzipped files faster.
+        make reading and writing gzipped files faster.
         
         Benjamin Vaisvil <https://github.com/bvaisvil> contributed support for
         format detection from content.
@@ -94,9 +95,15 @@ Description: .. image:: https://travis-ci.org/marcelm/xopen.svg?branch=master
         Changes
         -------
         
-        v0.9.0
+        v1.0.0
         ~~~~~~
+        * If installed, the ``igzip`` program (part of
+          `Intel ISA-L <https://github.com/intel/isa-l/>`_) is now used for reading
+          and writing gzip-compressed files at compression levels 1-3, which results
+          in a significant speedup.
         
+        v0.9.0
+        ~~~~~~
         * When the file name extension of a file to be opened for reading is not
           available, the content is inspected (if possible) and used to determine
           which compression format applies.
@@ -136,10 +143,13 @@ Description: .. image:: https://travis-ci.org/marcelm/xopen.svg?branch=master
         * xopen now accepts pathlib.Path objects.
         
         
-        Author
-        ------
+        Contributors
+        ------------
+        
+        * Marcel Martin
+        * Ruben Vorderman
+        * For more contributors, see <https://github.com/marcelm/xopen/graphs/contributors>
         
-        Marcel Martin <mail at marcelm.net> (`@marcelm_ on Twitter <https://twitter.com/marcelm_>`_)
         
         Links
         -----
@@ -152,9 +162,5 @@ Platform: UNKNOWN
 Classifier: Development Status :: 5 - Production/Stable
 Classifier: License :: OSI Approved :: MIT License
 Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.5
-Classifier: Programming Language :: Python :: 3.6
-Classifier: Programming Language :: Python :: 3.7
-Classifier: Programming Language :: Python :: 3.8
 Requires-Python: >=3.5
 Provides-Extra: dev


=====================================
src/xopen/__init__.py
=====================================
@@ -13,7 +13,10 @@ import time
 import stat
 import signal
 import pathlib
+import subprocess
+import tempfile
 from subprocess import Popen, PIPE
+from typing import Optional
 
 from ._version import version as __version__
 
@@ -23,6 +26,22 @@ try:
 except ImportError:
     lzma = None
 
+try:
+    import fcntl
+    # fcntl.F_SETPIPE_SZ will be available in python 3.10.
+    # https://github.com/python/cpython/pull/21921
+    # If not available: set it to the correct value for known platforms.
+    if not hasattr(fcntl, "F_SETPIPE_SZ") and sys.platform == "linux":
+        setattr(fcntl, "F_SETPIPE_SZ", 1031)
+except ImportError:
+    fcntl = None
+
+_MAX_PIPE_SIZE_PATH = pathlib.Path("/proc/sys/fs/pipe-max-size")
+if _MAX_PIPE_SIZE_PATH.exists():
+    _MAX_PIPE_SIZE = int(_MAX_PIPE_SIZE_PATH.read_text())
+else:
+    _MAX_PIPE_SIZE = None
+
 
 try:
     from os import fspath  # Exists in Python 3.6+
@@ -66,6 +85,39 @@ def _available_cpu_count():
         return 1
 
 
+def _set_pipe_size_to_max(fd: int):
+    """
+    Set pipe size to maximum on platforms that support it.
+    :param fd: The file descriptor to increase the pipe size for.
+    """
+    if not hasattr(fcntl, "F_SETPIPE_SZ") or not _MAX_PIPE_SIZE:
+        return
+    fcntl.fcntl(fd, fcntl.F_SETPIPE_SZ, _MAX_PIPE_SIZE)
+
+
+def _can_read_concatenated_gz(program: str) -> bool:
+    """
+    Check if a concatenated gzip file can be read properly. Not all deflate
+    programs handle this properly.
+    """
+    fd, temp_path = tempfile.mkstemp(suffix=".gz", prefix="xopen.")
+    try:
+        # Create a concatenated gzip file. gzip.compress recreates the contents
+        # of a gzip file including header and trailer.
+        with open(temp_path, "wb") as temp_file:
+            temp_file.write(gzip.compress(b"AB") + gzip.compress(b"CD"))
+        try:
+            result = subprocess.run([program, "-c", "-d", temp_path],
+                                    check=True, stderr=PIPE, stdout=PIPE)
+            return result.stdout == b"ABCD"
+        except subprocess.CalledProcessError:
+            # Program can't read zip
+            return False
+    finally:
+        os.close(fd)
+        os.remove(temp_path)
+
+
 class Closing:
     """
     Inherit from this class and implement a close() method to offer context
@@ -85,25 +137,22 @@ class Closing:
             pass
 
 
-class PipedGzipWriter(Closing):
+class PipedCompressionWriter(Closing):
     """
-    Write gzip-compressed files by running an external gzip or pigz process and
-    piping into it. pigz is tried first. It is fast because it can compress using
-    multiple cores.
-
-    If pigz is not available, a gzip subprocess is used. On Python 2, this saves
-    CPU time because gzip.GzipFile is slower. On Python 3, gzip.GzipFile is on
-    par with gzip itself, but running an external gzip can still reduce wall-clock
-    time because the compression happens in a separate process.
+    Write Compressed files by running an external process and piping into it.
     """
-
-    def __init__(self, path, mode='wt', compresslevel=6, threads=None):
+    def __init__(self, path, program, mode='wt',
+                 compresslevel: Optional[int] = None,
+                 threads_flag: str = None,
+                 threads: Optional[int] = None):
         """
         mode -- one of 'w', 'wt', 'wb', 'a', 'at', 'ab'
-        compresslevel -- gzip compression level
-        threads (int) -- number of pigz threads. If this is set to None, a reasonable default is
+        compresslevel -- compression level
+        threads_flag -- which flag is used to denote the number of threads in the program.
+            If set to none, program will be called without threads flag.
+        threads (int) -- number of threads. If this is set to None, a reasonable default is
             used. At the moment, this means that the number of available CPU cores is used, capped
-            at four to avoid creating too many threads. Use 0 to let pigz use all available cores.
+            at four to avoid creating too many threads. Use 0 to use all available cores.
         """
         if mode not in ('w', 'wt', 'wb', 'a', 'at', 'ab'):
             raise ValueError(
@@ -114,29 +163,33 @@ class PipedGzipWriter(Closing):
         self.devnull = open(os.devnull, mode)
         self.closed = False
         self.name = path
+        self._mode = mode
+        self._program = program
+        self._threads_flag = threads_flag
 
         if threads is None:
             threads = min(_available_cpu_count(), 4)
         try:
-            self.process, self.program = self._open_process(
+            self.process = self._open_process(
                 mode, compresslevel, threads, self.outfile, self.devnull)
         except OSError:
             self.outfile.close()
             self.devnull.close()
             raise
 
+        _set_pipe_size_to_max(self.process.stdin.fileno())
+
         if 'b' not in mode:
             self._file = io.TextIOWrapper(self.process.stdin)
         else:
             self._file = self.process.stdin
 
-    @staticmethod
-    def _open_process(mode, compresslevel, threads, outfile, devnull):
-        pigz_args = ['pigz']
-        if threads != 0:
-            pigz_args += ['-p', str(threads)]
+    def _open_process(self, mode, compresslevel, threads, outfile, devnull):
+        program_args = [self._program]
+        if threads != 0 and self._threads_flag is not None:
+            program_args += [self._threads_flag, str(threads)]
         extra_args = []
-        if 'w' in mode and compresslevel != 6:
+        if 'w' in mode and compresslevel is not None:
             extra_args += ['-' + str(compresslevel)]
 
         kwargs = dict(stdin=PIPE, stdout=outfile, stderr=devnull)
@@ -148,14 +201,9 @@ class PipedGzipWriter(Closing):
         if sys.platform != 'win32':
             kwargs['close_fds'] = True
 
-        try:
-            process = Popen(pigz_args + extra_args, **kwargs)
-            program = 'pigz'
-        except OSError:  # TODO Use FileNotFound instead (Python 3)
-            # pigz not found, try regular gzip
-            process = Popen(['gzip'] + extra_args, **kwargs)
-            program = 'gzip'
-        return process, program
+        process = Popen(program_args + extra_args, **kwargs)
+
+        return process
 
     def write(self, arg):
         self._file.write(arg)
@@ -170,7 +218,7 @@ class PipedGzipWriter(Closing):
         self.devnull.close()
         if retcode != 0:
             raise OSError(
-                "Output {} process terminated with exit code {}".format(self.program, retcode))
+                "Output {} process terminated with exit code {}".format(self._program, retcode))
 
     def __iter__(self):
         return self
@@ -179,36 +227,36 @@ class PipedGzipWriter(Closing):
         raise io.UnsupportedOperation('not readable')
 
 
-class PipedGzipReader(Closing):
+class PipedCompressionReader(Closing):
     """
-    Open a pipe to pigz for reading a gzipped file. Even though pigz is mostly
-    used to speed up writing by using many compression threads, it is
-    also faster when reading, even when forced to use a single thread
-    (ca. 2x speedup).
+    Open a pipe to a process for reading a compressed file.
     """
 
-    def __init__(self, path, mode='r', threads=None):
+    def __init__(self, path, program, mode='r', threads_flag=None, threads=None):
         """
         Raise an OSError when pigz could not be found.
         """
         if mode not in ('r', 'rt', 'rb'):
             raise ValueError("Mode is '{}', but it must be 'r', 'rt' or 'rb'".format(mode))
 
-        pigz_args = ['pigz', '-cd', path]
-
-        if threads is None:
-            # Single threaded behaviour by default because:
-            # - Using a single thread to read a file is the least unexpected
-            #   behaviour. (For users of xopen, who do not know which backend is used.)
-            # - There is quite a substantial overhead (+25% CPU time) when
-            #   using multiple threads while there is only a 10% gain in wall
-            #   clock time.
-            threads = 1
+        program_args = [program, '-cd', path]
 
-        pigz_args += ['-p', str(threads)]
+        if threads_flag is not None:
+            if threads is None:
+                # Single threaded behaviour by default because:
+                # - Using a single thread to read a file is the least unexpected
+                #   behaviour. (For users of xopen, who do not know which backend is used.)
+                # - There is quite a substantial overhead (+25% CPU time) when
+                #   using multiple threads while there is only a 10% gain in wall
+                #   clock time.
+                threads = 1
+            program_args += [threads_flag, str(threads)]
 
-        self.process = Popen(pigz_args, stdout=PIPE, stderr=PIPE)
+        self.process = Popen(program_args, stdout=PIPE, stderr=PIPE)
         self.name = path
+
+        _set_pipe_size_to_max(self.process.stdout.fileno())
+
         if 'b' not in mode:
             self._file = io.TextIOWrapper(self.process.stdout)
         else:
@@ -283,6 +331,84 @@ class PipedGzipReader(Closing):
         return None
 
 
+class PipedGzipReader(PipedCompressionReader):
+    """
+    Open a pipe to pigz for reading a gzipped file. Even though pigz is mostly
+    used to speed up writing by using many compression threads, it is
+    also faster when reading, even when forced to use a single thread
+    (ca. 2x speedup).
+    """
+    def __init__(self, path, mode='r', threads=None):
+        try:
+            super().__init__(path, "pigz", mode, "-p", threads)
+        except OSError:
+            super().__init__(path, "gzip", mode, None, threads)
+
+
+class PipedGzipWriter(PipedCompressionWriter):
+    """
+    Write gzip-compressed files by running an external gzip or pigz process and
+    piping into it. pigz is tried first. It is fast because it can compress using
+    multiple cores. Also it is more efficient on one core.
+    If pigz is not available, a gzip subprocess is used. On Python 3, gzip.GzipFile is on
+    par with gzip itself, but running an external gzip can still reduce wall-clock
+    time because the compression happens in a separate process.
+    """
+    def __init__(self, path, mode='wt', compresslevel=None, threads=None):
+        """
+        mode -- one of 'w', 'wt', 'wb', 'a', 'at', 'ab'
+        compresslevel -- compression level
+        threads (int) -- number of pigz threads. If this is set to None, a reasonable default is
+            used. At the moment, this means that the number of available CPU cores is used, capped
+            at four to avoid creating too many threads. Use 0 to let pigz use all available cores.
+        """
+        if compresslevel is not None and compresslevel not in range(1, 10):
+            raise ValueError("compresslevel must be between 1 and 9")
+        try:
+            super().__init__(path, "pigz", mode, compresslevel, "-p", threads)
+        except OSError:
+            super().__init__(path, "gzip", mode, compresslevel, None, threads)
+
+
+class PipedIGzipReader(PipedCompressionReader):
+    """
+    Uses igzip for reading of a gzipped file. This is much faster than either
+    gzip or pigz which were written to run on a wide array of systems. igzip
+    can only run on x86 and ARM architectures, but is able to use more
+    architecture-specific optimizations as a result.
+    """
+    def __init__(self, path, mode="r"):
+        if not _can_read_concatenated_gz("igzip"):
+            # Instead of elaborate version string checking once the problem is
+            # fixed, it is much easier to use this, "proof in the pudding" type
+            # of evaluation.
+            raise ValueError(
+                "This version of igzip does not support reading "
+                "concatenated gzip files and is therefore not "
+                "safe to use. See: https://github.com/intel/isa-l/issues/143")
+        super().__init__(path, "igzip", mode)
+
+
+class PipedIGzipWriter(PipedCompressionWriter):
+    """
+    Uses igzip for writing a gzipped file. This is much faster than either
+    gzip or pigz which were written to run on a wide array of systems. igzip
+    can only run on x86 and ARM architectures, but is able to use more
+    architecture-specific optimizations as a result.
+
+    Threads are supported by a flag, but do not add any speed. Also on some
+    distro version (isal package in debian buster) the thread flag is not
+    present. For these reason threads are omitted from the interface.
+    Only compresslevel 0-3 are supported and these output slightly different
+    filesizes from their pigz/gzip counterparts.
+    See: https://gist.github.com/rhpvorderman/4f1201c3f39518ff28dde45409eb696b
+    """
+    def __init__(self, path, mode="wt", compresslevel=None):
+        if compresslevel is not None and compresslevel not in range(0, 4):
+            raise ValueError("compresslevel must be between 0 and 3")
+        super().__init__(path, "igzip", mode, compresslevel)
+
+
 def _open_stdin_or_out(mode):
     # Do not return sys.stdin or sys.stdout directly as we want the returned object
     # to be closable without closing sys.stdout.
@@ -305,16 +431,27 @@ def _open_gz(filename, mode, compresslevel, threads):
     if threads != 0:
         try:
             if 'r' in mode:
-                return PipedGzipReader(filename, mode, threads=threads)
+                try:
+                    return PipedIGzipReader(filename, mode)
+                except (OSError, ValueError):
+                    # No igzip installed or version does not support reading
+                    # concatenated files.
+                    return PipedGzipReader(filename, mode, threads=threads)
             else:
-                return PipedGzipWriter(filename, mode, compresslevel, threads=threads)
-        except FileNotFoundError:
+                try:
+                    return PipedIGzipWriter(filename, mode, compresslevel)
+                except (OSError, ValueError):
+                    # No igzip installed or compression level higher than 3
+                    return PipedGzipWriter(filename, mode, compresslevel, threads=threads)
+        except OSError:
             pass  # We try without threads.
 
     if 'r' in mode:
         return gzip.open(filename, mode)
     else:
-        return gzip.open(filename, mode, compresslevel=compresslevel)
+        # Override gzip.open's default of 9 for consistency with command-line gzip.
+        return gzip.open(filename, mode,
+                         compresslevel=6 if compresslevel is None else compresslevel)
 
 
 def _detect_format_from_content(filename):
@@ -354,7 +491,7 @@ def _detect_format_from_extension(filename):
         return None
 
 
-def xopen(filename, mode='r', compresslevel=6, threads=None):
+def xopen(filename, mode='r', compresslevel=None, threads=None):
     """
     A replacement for the "open" function that can also read and write
     compressed files transparently. The supported compression formats are gzip,
@@ -373,7 +510,8 @@ def xopen(filename, mode='r', compresslevel=6, threads=None):
     will raise an error.
 
     compresslevel is the compression level for writing to gzip files.
-    This parameter is ignored for the other compression formats.
+    This parameter is ignored for the other compression formats. If set to
+    None (default), level 6 is used.
 
     threads only has a meaning when reading or writing gzip files.
 
@@ -387,8 +525,6 @@ def xopen(filename, mode='r', compresslevel=6, threads=None):
     if mode not in ('rt', 'rb', 'wt', 'wb', 'at', 'ab'):
         raise ValueError("Mode '{}' not supported".format(mode))
     filename = fspath(filename)
-    if compresslevel not in range(1, 10):
-        raise ValueError("compresslevel must be between 1 and 9")
 
     if filename == '-':
         return _open_stdin_or_out(mode)
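Reviewer note: `_can_read_concatenated_gz` above guards against decompressors that stop after the first gzip member. The failure mode can be reproduced with the standard library alone. This is a sketch under that assumption; the two helper names are hypothetical and nothing here calls igzip.

```python
import gzip
import zlib


def read_all_members(blob: bytes) -> bytes:
    """Decompress every gzip member in the blob (gzip.decompress handles multi-member data)."""
    return gzip.decompress(blob)


def read_first_member_only(blob: bytes) -> bytes:
    """Mimic a decompressor that stops at the end of the first gzip member."""
    # wbits=31 tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(31)
    # Bytes after the first member end up in decompressor.unused_data.
    return decompressor.decompress(blob)


# The same two-member file that the commit writes to a temporary path.
blob = gzip.compress(b"AB") + gzip.compress(b"CD")
full = read_all_members(blob)
truncated = read_first_member_only(blob)
```

A program that behaves like `read_first_member_only` would silently drop data, which is why the commit refuses to use such an igzip build.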


=====================================
src/xopen/_version.py
=====================================
@@ -1,4 +1,4 @@
 # coding: utf-8
 # file generated by setuptools_scm
 # don't change, don't track in version control
-version = '0.9.0'
+version = '1.0.0'


=====================================
tests/test_xopen.py
=====================================
@@ -1,13 +1,15 @@
 import io
 import os
 import random
+import shutil
 import signal
+import sys
 import time
 import pytest
 from pathlib import Path
 
-from xopen import xopen, PipedGzipReader, PipedGzipWriter
-
+from xopen import xopen, PipedCompressionWriter, PipedGzipReader, \
+    PipedGzipWriter, _MAX_PIPE_SIZE, _can_read_concatenated_gz
 
 extensions = ["", ".gz", ".bz2"]
 
@@ -17,6 +19,13 @@ try:
 except ImportError:
     lzma = None
 
+try:
+    import fcntl
+    if not hasattr(fcntl, "F_GETPIPE_SZ") and sys.platform == "linux":
+        setattr(fcntl, "F_GETPIPE_SZ", 1032)
+except ImportError:
+    fcntl = None
+
 base = "tests/file.txt"
 files = [base + ext for ext in extensions]
 CONTENT_LINES = ['Testing, testing ...\n', 'The second line.\n']
@@ -33,6 +42,23 @@ def fname(request):
     return request.param
 
 
+@pytest.fixture
+def lacking_pigz_permissions(tmp_path):
+    """
+    Set PATH to a directory that contains a pigz binary with permissions set to 000.
+    If no suitable pigz binary could be found, PATH is set to an empty directory
+    """
+    pigz_path = shutil.which("pigz")
+    if pigz_path:
+        shutil.copy(pigz_path, str(tmp_path))
+        os.chmod(str(tmp_path / "pigz"), 0)
+
+    path = os.environ["PATH"]
+    os.environ["PATH"] = str(tmp_path)
+    yield
+    os.environ["PATH"] = path
+
+
 @pytest.fixture
 def large_gzip(tmpdir):
     path = str(tmpdir.join("large.gz"))
@@ -394,3 +420,25 @@ if lzma is not None:
     def test_detect_xz_file_format_from_content():
         with xopen("tests/file.txt.xz.test", "rb") as fh:
             assert fh.readline() == CONTENT_LINES[0].encode("utf-8")
+
+
+def test_concatenated_gzip_function():
+    assert _can_read_concatenated_gz("gzip") is True
+    assert _can_read_concatenated_gz("pigz") is True
+    assert _can_read_concatenated_gz("xz") is False
+
+
+@pytest.mark.skipif(
+    not hasattr(fcntl, "F_GETPIPE_SZ") and _MAX_PIPE_SIZE is not None,
+    reason="Pipe size modifications not available on this platform.")
+def test_pipesize_changed(tmpdir):
+    path = Path(str(tmpdir), "hello.gz")
+    with xopen(path, "wb") as f:
+        assert isinstance(f, PipedCompressionWriter)
+        assert fcntl.fcntl(f._file.fileno(),
+                           fcntl.F_GETPIPE_SZ) == _MAX_PIPE_SIZE
+
+
+def test_xopen_falls_back_to_gzip_open(lacking_pigz_permissions):
+    with xopen("tests/file.txt.gz", "rb") as f:
+        assert f.readline() == CONTENT_LINES[0].encode("utf-8")
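Reviewer note: the pipe-size handling exercised by `test_pipesize_changed` can be tried on a bare `os.pipe()`. This is a minimal sketch, assuming Linux; `try_grow_pipe` is a hypothetical helper, the numeric fallbacks 1031 and 1032 are the values the commit itself uses for `F_SETPIPE_SZ` and `F_GETPIPE_SZ`, and the 128 KiB request is an arbitrary choice for illustration.

```python
import os
import sys

try:
    import fcntl
except ImportError:  # fcntl is not available on Windows
    fcntl = None

# Older Python versions do not expose these names, so fall back to the
# Linux command numbers used in the commit.
F_SETPIPE_SZ = getattr(fcntl, "F_SETPIPE_SZ", 1031)
F_GETPIPE_SZ = getattr(fcntl, "F_GETPIPE_SZ", 1032)


def try_grow_pipe(fd, size):
    """Ask the kernel for a larger pipe buffer; return the resulting capacity or None."""
    if fcntl is None or not sys.platform.startswith("linux"):
        return None
    try:
        fcntl.fcntl(fd, F_SETPIPE_SZ, size)
    except OSError:
        pass  # The request may be refused; report the current capacity instead.
    try:
        return fcntl.fcntl(fd, F_GETPIPE_SZ)
    except OSError:
        return None


read_fd, write_fd = os.pipe()
try:
    capacity = try_grow_pipe(write_fd, 128 * 1024)
finally:
    os.close(read_fd)
    os.close(write_fd)
```

The commit requests the system maximum from /proc/sys/fs/pipe-max-size rather than a fixed size; the kernel may round a request up, so the returned capacity can exceed the requested value.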


=====================================
tox.ini
=====================================
@@ -1,5 +1,5 @@
 [tox]
-envlist = flake8,py35,py36,py37,py38,pypy3
+envlist = flake8,py35,py36,py37,py38,py39,pypy3
 
 [testenv]
 deps = pytest



View it on GitLab: https://salsa.debian.org/med-team/python-xopen/-/compare/d17838a43679a605d22f5e50fbf0e50bbf990481...6db0f81505f5a02253d9a43952d33df6e94f0ff7


