[Debian-med-packaging] Bug#1055687: khmer ftbfs with Python 3.12

Wed Nov 29 13:26:01 GMT 2023

Hi Olivier,

thanks a lot for your patches.  Looking at the first patch, I see
that it applies to a series file that ends with the patches

 refresh_cython
 find_object_files_at_right_loc.patch

I'd love to benefit from those patches as well.  Where
can I find them?

Kind regards
   Andreas.

On Sun, Nov 26, 2023 at 09:06:39PM +0100, Olivier Gayot wrote:
> Package: khmer
> Followup-For: Bug #1055687
> User: ubuntu-devel at lists.ubuntu.com
> Usertags: origin-ubuntu noble ubuntu-patch
> Control: tags -1 patch
> 
> Dear Maintainer,
> 
> My previous patch was unfortunately very incomplete. I am submitting
> another patch that fixes the remaining issues when building with Python
> 3.12. Both debdiffs should be applied for the build to succeed.
> 
> Additional fixes that were needed for the build to succeed:
> 
>  * Python 3.12 dropped the "imp" module. Updated to use importlib
>    instead.
>  * Python 3.12 is much less forgiving when a script opens a file for
>    writing and forgets to close it. Most Python scripts in scripts/
>    failed to do so or did so inconsistently. This resulted in the test
>    suite either hanging or failing. I went through all invocations of
>    open() / get_file_writer() and ensured that the resources are
>    cleaned up. The resulting patch is sadly difficult to read though.
> 
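For reference, a minimal sketch of the imp-to-importlib change mentioned in the first point of the list above. It assumes the removed call was imp.load_source(); the message does not say which imp function khmer relied on, so treat the helper name and its use as illustrative only. The body follows the replacement recipe given in the Python 3.12 release notes.

```python
import importlib.machinery
import importlib.util


def load_source(modname, filename):
    """Load a Python source file as a module (stand-in for imp.load_source)."""
    loader = importlib.machinery.SourceFileLoader(modname, filename)
    spec = importlib.util.spec_from_file_location(modname, filename,
                                                  loader=loader)
    module = importlib.util.module_from_spec(spec)
    # Run the module body. Unlike a regular import, the result is not
    # cached in sys.modules.
    loader.exec_module(module)
    return module
```

A caller would then write something like load_source("some_script", "/path/to/some_script.py"), where both arguments are placeholders.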
> I submitted all changes upstream, although I doubt anybody will pick
> them up:
> 
> https://github.com/dib-lab/khmer/pull/1922
> 
> In Ubuntu, the attached patch was applied to achieve the following:
> 
>   * Fix build against Python 3.12 (LP: #2044383).
> 
> Thanks for considering the patch.
> 
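On the second point in the list of fixes (files opened for writing but never closed), the following standard-library sketch shows one way an unclosed writer can leave truncated output. It is not khmer code, all variable names are made up for the illustration, and it does not claim to be the exact failure seen in the khmer test suite under Python 3.12.

```python
import gzip
import io

payload = b"ACGT" * 1000

# Leaked writer: nothing calls close(), so the gzip trailer (and any data
# still buffered inside the writer) has not reached the underlying stream.
leaked_sink = io.BytesIO()
leaked_writer = gzip.GzipFile(fileobj=leaked_sink, mode="w")
leaked_writer.write(payload)
try:
    gzip.decompress(leaked_sink.getvalue())
    leaked_ok = True
except EOFError:
    # The stream ends before the end-of-stream marker.
    leaked_ok = False

# Managed writer: leaving the with block closes the writer, which flushes
# the remaining data and writes the trailer.
closed_sink = io.BytesIO()
with gzip.GzipFile(fileobj=closed_sink, mode="w") as writer:
    writer.write(payload)
closed_ok = gzip.decompress(closed_sink.getvalue()) == payload
```

The same reasoning applies to plain buffered files: relying on the interpreter to close them at some unspecified later time leaves the moment of the final flush undefined, which is why the patch moves the writes into with blocks.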
> 
> -- System Information:
> Debian Release: trixie/sid
>   APT prefers mantic-updates
>   APT policy: (500, 'mantic-updates'), (500, 'mantic-security'), (500, 'mantic')
> Architecture: amd64 (x86_64)
> Foreign Architectures: i386
> 
> Kernel: Linux 6.1.0-16-generic (SMP w/8 CPU threads; PREEMPT)
> Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE
> Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
> Shell: /bin/sh linked to /usr/bin/dash
> Init: systemd (via /run/systemd/system)
> LSM: AppArmor: enabled
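The ownership problem described in the patch header below (a writer that sometimes is the caller's stream and sometimes wraps it) can be sketched with the standard library alone. The helper open_writer and its close_stream parameter are hypothetical names chosen for this illustration; this is an analogue of the FileWriter helper added by the patch, not a copy of it. One deliberate difference: the sketch uses try/finally so that the stream is closed even when the with body raises.

```python
import bz2
import contextlib
import gzip
import io


@contextlib.contextmanager
def open_writer(stream, do_gzip=False, do_bzip=False, *, close_stream=False):
    """Yield a writer on top of `stream` (hypothetical FileWriter analogue)."""
    if do_gzip and do_bzip:
        raise ValueError("Cannot specify both bzip and gzip compression!")
    if do_gzip:
        writer_ctx = gzip.GzipFile(fileobj=stream, mode="w")
    elif do_bzip:
        writer_ctx = bz2.open(stream, mode="w")
    else:
        # nullcontext hands the stream back but does not close it on exit,
        # so a shared stream such as sys.stdout stays usable.
        writer_ctx = contextlib.nullcontext(enter_result=stream)
    try:
        with writer_ctx as writer:
            yield writer
    finally:
        # Close the caller's stream only when explicitly asked to.
        if close_stream:
            stream.close()


# A shared stream (standing in for sys.stdout) survives the with block.
shared = io.BytesIO()
with open_writer(shared) as fp:
    fp.write(b"plain\n")
still_open = not shared.closed

# A compressed writer is finalised on exit; the wrapped stream stays open.
wrapped = io.BytesIO()
with open_writer(wrapped, do_gzip=True) as fp:
    fp.write(b"compressed\n")
round_trip = gzip.decompress(wrapped.getvalue())

# A stream created only for this writer is closed on request.
owned = io.BytesIO()
with open_writer(owned, close_stream=True) as fp:
    fp.write(b"owned\n")
owned_closed = owned.closed
```

The design choice is that the caller states who owns the stream, instead of the helper guessing from the compression flags.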

> diff -Nru khmer-3.0.0~a3+dfsg/debian/control khmer-3.0.0~a3+dfsg/debian/control
> --- khmer-3.0.0~a3+dfsg/debian/control	2023-11-25 17:44:28.000000000 +0100
> +++ khmer-3.0.0~a3+dfsg/debian/control	2023-11-26 02:28:32.000000000 +0100
> @@ -1,6 +1,5 @@
>  Source: khmer
> -Maintainer: Ubuntu Developers <ubuntu-devel-discuss at lists.ubuntu.com>
> -XSBC-Original-Maintainer: Debian Med Packaging Team <debian-med-packaging at lists.alioth.debian.org>
> +Maintainer: Debian Med Packaging Team <debian-med-packaging at lists.alioth.debian.org>
>  Uploaders: Michael R. Crusoe <crusoe at debian.org>,
>             Kevin Murray <spam at kdmurray.id.au>
>  Section: science
> diff -Nru khmer-3.0.0~a3+dfsg/debian/patches/close-opened-files.patch khmer-3.0.0~a3+dfsg/debian/patches/close-opened-files.patch
> --- khmer-3.0.0~a3+dfsg/debian/patches/close-opened-files.patch	1970-01-01 01:00:00.000000000 +0100
> +++ khmer-3.0.0~a3+dfsg/debian/patches/close-opened-files.patch	2023-11-26 02:28:32.000000000 +0100
> @@ -0,0 +1,1124 @@
> +Description: ensure that Python scripts close files that they open for writing 
> + Python scripts under scripts/ in the source tree do not consistently close
> + files that they open for writing. While some of the scripts use context
> + managers, most of them do not (or do so inconsistently).
> + In previous releases of Ubuntu, this apparently was not much of a concern.
> + However, Python 3.12 seems to be much less forgiving when files are not
> + properly closed. When running the test suite, many of the files that are not
> + explicitly closed appear truncated. This leads to various tests failing or
> + hanging and causing FTBFS when the test suite runs at build time.
> + .
> + Furthermore, khmer defines the get_file_writer() function, but it cannot be
> + consistently used as a context manager because it sometimes closes the
> + underlying file descriptor ; and sometimes does not depending on the
> + arguments.
> + .
> + Fixed by defining a new FileWriter context manager and ensuring that
> + each call to open() / get_file_writer() frees up resources properly.
> +Author: Olivier Gayot <olivier.gayot at canonical.com>
> +Bug-Ubuntu: https://launchpad.net/bugs/2044383
> +Bug-Debian: https://bugs.debian.org/1055687
> +Forwarded: https://github.com/dib-lab/khmer/pull/1922
> +Last-Update: 2023-11-26
> +---
> +This patch header follows DEP-3: http://dep.debian.net/deps/dep3/
> +Index: khmer-3.0.0~a3+dfsg/scripts/abundance-dist.py
> +===================================================================
> +--- khmer-3.0.0~a3+dfsg.orig/scripts/abundance-dist.py	2023-11-26 20:23:39.915485717 +0100
> ++++ khmer-3.0.0~a3+dfsg/scripts/abundance-dist.py	2023-11-26 20:23:39.911485747 +0100
> +@@ -42,6 +42,7 @@
> + Use '-h' for parameter help.
> + """
> + 
> ++import contextlib
> + import sys
> + import csv
> + import khmer
> +@@ -143,26 +144,28 @@
> +         sys.exit(1)
> + 
> +     if args.output_histogram_filename in ('-', '/dev/stdout'):
> +-        countgraph_fp = sys.stdout
> ++        countgraph_ctx = contextlib.nullcontext(enter_result=sys.stdout)
> +     else:
> +-        countgraph_fp = open(args.output_histogram_filename, 'w')
> +-    countgraph_fp_csv = csv.writer(countgraph_fp)
> +-    # write headers:
> +-    countgraph_fp_csv.writerow(['abundance', 'count', 'cumulative',
> +-                                'cumulative_fraction'])
> +-
> +-    sofar = 0
> +-    for _, i in enumerate(abundances):
> +-        if i == 0 and not args.output_zero:
> +-            continue
> ++        countgraph_ctx = open(args.output_histogram_filename, 'w')
> + 
> +-        sofar += i
> +-        frac = sofar / float(total)
> ++    with countgraph_ctx as countgraph_fp:
> ++        countgraph_fp_csv = csv.writer(countgraph_fp)
> ++        # write headers:
> ++        countgraph_fp_csv.writerow(['abundance', 'count', 'cumulative',
> ++                                    'cumulative_fraction'])
> ++
> ++        sofar = 0
> ++        for _, i in enumerate(abundances):
> ++            if i == 0 and not args.output_zero:
> ++                continue
> + 
> +-        countgraph_fp_csv.writerow([_, i, sofar, round(frac, 3)])
> ++            sofar += i
> ++            frac = sofar / float(total)
> + 
> +-        if sofar == total:
> +-            break
> ++            countgraph_fp_csv.writerow([_, i, sofar, round(frac, 3)])
> ++
> ++            if sofar == total:
> ++                break
> + 
> + 
> + if __name__ == '__main__':
> +Index: khmer-3.0.0~a3+dfsg/scripts/abundance-dist-single.py
> +===================================================================
> +--- khmer-3.0.0~a3+dfsg.orig/scripts/abundance-dist-single.py	2023-11-26 20:23:39.915485717 +0100
> ++++ khmer-3.0.0~a3+dfsg/scripts/abundance-dist-single.py	2023-11-26 20:23:39.911485747 +0100
> +@@ -218,6 +218,10 @@
> + 
> +     log_info('wrote to: {output}', output=args.output_histogram_filename)
> + 
> ++    # Ensure that the output files are properly written. Python 3.12 seems to
> ++    # be less forgiving here ..
> ++    hist_fp.close()
> ++
> + 
> + if __name__ == '__main__':
> +     main()
> +Index: khmer-3.0.0~a3+dfsg/scripts/do-partition.py
> +===================================================================
> +--- khmer-3.0.0~a3+dfsg.orig/scripts/do-partition.py	2023-11-26 20:23:39.915485717 +0100
> ++++ khmer-3.0.0~a3+dfsg/scripts/do-partition.py	2023-11-26 20:23:39.911485747 +0100
> +@@ -168,8 +168,8 @@
> +         worker_q.put((nodegraph, _, start, end))
> + 
> +     print('enqueued %d subset tasks' % n_subsets, file=sys.stderr)
> +-    open('%s.info' % args.graphbase, 'w').write('%d subsets total\n'
> +-                                                % (n_subsets))
> ++    with open('%s.info' % args.graphbase, 'w') as info_fp:
> ++        info_fp.write('%d subsets total\n' % (n_subsets))
> + 
> +     if n_subsets < args.threads:
> +         args.threads = n_subsets
> +Index: khmer-3.0.0~a3+dfsg/scripts/extract-long-sequences.py
> +===================================================================
> +--- khmer-3.0.0~a3+dfsg.orig/scripts/extract-long-sequences.py	2023-11-26 20:23:39.915485717 +0100
> ++++ khmer-3.0.0~a3+dfsg/scripts/extract-long-sequences.py	2023-11-26 20:23:39.911485747 +0100
> +@@ -52,7 +52,7 @@
> + import sys
> + from khmer import __version__
> + from khmer.utils import write_record
> +-from khmer.kfile import add_output_compression_type, get_file_writer
> ++from khmer.kfile import add_output_compression_type, FileWriter
> + from khmer.khmer_args import sanitize_help, KhmerArgumentParser
> + 
> + 
> +@@ -81,12 +81,12 @@
> + 
> + def main():
> +     args = sanitize_help(get_parser()).parse_args()
> +-    outfp = get_file_writer(args.output, args.gzip, args.bzip)
> +-    for filename in args.input_filenames:
> +-        for record in screed.open(filename):
> +-            if len(record['sequence']) >= args.length:
> +-                write_record(record, outfp)
> +-    print('wrote to: ' + args.output.name, file=sys.stderr)
> ++    with FileWriter(args.output, args.gzip, args.bzip) as outfp:
> ++        for filename in args.input_filenames:
> ++            for record in screed.open(filename):
> ++                if len(record['sequence']) >= args.length:
> ++                    write_record(record, outfp)
> ++        print('wrote to: ' + args.output.name, file=sys.stderr)
> + 
> + 
> + if __name__ == '__main__':
> +Index: khmer-3.0.0~a3+dfsg/khmer/kfile.py
> +===================================================================
> +--- khmer-3.0.0~a3+dfsg.orig/khmer/kfile.py	2023-11-26 20:23:39.915485717 +0100
> ++++ khmer-3.0.0~a3+dfsg/khmer/kfile.py	2023-11-26 20:23:39.911485747 +0100
> +@@ -35,6 +35,7 @@
> + """File handling/checking utilities for command-line scripts."""
> + 
> + 
> ++import contextlib
> + import os
> + import sys
> + import errno
> +@@ -252,3 +253,37 @@
> +         ofile = file_handle
> + 
> +     return ofile
> ++
> ++
> ++@contextlib.contextmanager
> ++def FileWriter(file_handle, do_gzip, do_bzip, *, steal_ownership=False):
> ++    """Alternative to get_file_writer that requires the use of a with block.
> ++    The intent is to address an inherent problem with get_file_writer() that
> ++    makes it difficult to use as a context manager. When get_file_writer() is
> ++    called with both gzip=False and bzip=False, the underlying file handle is
> ++    returned. As a consequence, doing:
> ++    >  with get_file_writer(sys.stdout, bzip=False, gzip=False) as fh:
> ++    >      pass
> ++    ends up closing sys.stdout when the with block is exited. Using the
> ++    function without a context manager avoids the issue, but then it results in
> ++    leaked open files when either bzip=True or gzip=True.
> ++    FileWriter must be used as a context manager, but it ensures that resources
> ++    are closed upon exiting the with block. Furthermore, it can be explicitly
> ++    requested to close the underlying file_handle."""
> ++    ofile = None
> ++
> ++    if do_gzip and do_bzip:
> ++        raise ValueError("Cannot specify both bzip and gzip compression!")
> ++
> ++    if do_gzip:
> ++        ofile = gzip.GzipFile(fileobj=file_handle, mode='w')
> ++    elif do_bzip:
> ++        ofile = bz2.open(file_handle, mode='w')
> ++    else:
> ++        ofile = contextlib.nullcontext(enter_result=file_handle)
> ++
> ++    with ofile as x:
> ++        yield x
> ++
> ++    if steal_ownership:
> ++        file_handle.close()
> +Index: khmer-3.0.0~a3+dfsg/scripts/extract-paired-reads.py
> +===================================================================
> +--- khmer-3.0.0~a3+dfsg.orig/scripts/extract-paired-reads.py	2023-11-26 20:23:39.915485717 +0100
> ++++ khmer-3.0.0~a3+dfsg/scripts/extract-paired-reads.py	2023-11-26 20:23:39.911485747 +0100
> +@@ -44,6 +44,7 @@
> + 
> + Reads FASTQ and FASTA input, retains format for output.
> + """
> ++from contextlib import nullcontext
> + import sys
> + import os.path
> + import textwrap
> +@@ -53,7 +54,7 @@
> + from khmer.khmer_args import sanitize_help, KhmerArgumentParser
> + from khmer.khmer_args import FileType as khFileType
> + from khmer.kfile import add_output_compression_type
> +-from khmer.kfile import get_file_writer
> ++from khmer.kfile import FileWriter
> + 
> + from khmer.utils import broken_paired_reader, write_record, write_record_pair
> + 
> +@@ -132,39 +133,40 @@
> + 
> +     # OVERRIDE default output file locations with -p, -s
> +     if args.output_paired:
> +-        paired_fp = get_file_writer(args.output_paired, args.gzip, args.bzip)
> +-        out2 = paired_fp.name
> ++        paired_ctx = FileWriter(args.output_paired, args.gzip, args.bzip)
> ++        out2 = args.output_paired.name
> +     else:
> +         # Don't override, just open the default filename from above
> +-        paired_fp = get_file_writer(open(out2, 'wb'), args.gzip, args.bzip)
> ++        paired_ctx = FileWriter(open(out2, 'wb'), args.gzip, args.bzip,
> ++                                steal_ownership=True)
> +     if args.output_single:
> +-        single_fp = get_file_writer(args.output_single, args.gzip, args.bzip)
> ++        single_ctx = FileWriter(args.output_single, args.gzip, args.bzip)
> +         out1 = args.output_single.name
> +     else:
> +         # Don't override, just open the default filename from above
> +-        single_fp = get_file_writer(open(out1, 'wb'), args.gzip, args.bzip)
> ++        single_ctx = FileWriter(open(out1, 'wb'), args.gzip, args.bzip,
> ++                                steal_ownership=True)
> + 
> +-    print('reading file "%s"' % infile, file=sys.stderr)
> +-    print('outputting interleaved pairs to "%s"' % out2, file=sys.stderr)
> +-    print('outputting orphans to "%s"' % out1, file=sys.stderr)
> +-
> +-    n_pe = 0
> +-    n_se = 0
> +-
> +-    reads = ReadParser(infile)
> +-    for index, is_pair, read1, read2 in broken_paired_reader(reads):
> +-        if index % 100000 == 0 and index > 0:
> +-            print('...', index, file=sys.stderr)
> +-
> +-        if is_pair:
> +-            write_record_pair(read1, read2, paired_fp)
> +-            n_pe += 1
> +-        else:
> +-            write_record(read1, single_fp)
> +-            n_se += 1
> +-
> +-    single_fp.close()
> +-    paired_fp.close()
> ++    with paired_ctx as paired_fp, single_ctx as single_fp:
> ++        print('reading file "%s"' % infile, file=sys.stderr)
> ++        print('outputting interleaved pairs to "%s"' % out2,
> ++              file=sys.stderr)
> ++        print('outputting orphans to "%s"' % out1, file=sys.stderr)
> ++
> ++        n_pe = 0
> ++        n_se = 0
> ++
> ++        reads = ReadParser(infile)
> ++        for index, is_pair, read1, read2 in broken_paired_reader(reads):
> ++            if index % 100000 == 0 and index > 0:
> ++                print('...', index, file=sys.stderr)
> ++
> ++            if is_pair:
> ++                write_record_pair(read1, read2, paired_fp)
> ++                n_pe += 1
> ++            else:
> ++                write_record(read1, single_fp)
> ++                n_se += 1
> + 
> +     if n_pe == 0:
> +         raise TypeError("no paired reads!? check file formats...")
> +Index: khmer-3.0.0~a3+dfsg/scripts/extract-partitions.py
> +===================================================================
> +--- khmer-3.0.0~a3+dfsg.orig/scripts/extract-partitions.py	2023-11-26 20:23:39.915485717 +0100
> ++++ khmer-3.0.0~a3+dfsg/scripts/extract-partitions.py	2023-11-26 20:23:39.911485747 +0100
> +@@ -301,10 +301,10 @@
> +                                    args.max_size)
> + 
> +     if args.output_unassigned:
> +-        ofile = open('%s.unassigned.%s' % (args.prefix, suffix), 'wb')
> +-        unassigned_fp = get_file_writer(ofile, args.gzip, args.bzip)
> +-        extractor.process_unassigned(unassigned_fp)
> +-        unassigned_fp.close()
> ++        with open('%s.unassigned.%s' % (args.prefix, suffix), 'wb') as ofile:
> ++            unassigned_fp = get_file_writer(ofile, args.gzip, args.bzip)
> ++            extractor.process_unassigned(unassigned_fp)
> ++            unassigned_fp.close()
> +     else:
> +         extractor.process_unassigned()
> + 
> +@@ -320,13 +320,21 @@
> +         print('nothing to output; exiting!', file=sys.stderr)
> +         return
> + 
> ++    to_close = []
> +     # open a bunch of output files for the different groups
> +     group_fps = {}
> +     for index in range(extractor.group_n):
> +         fname = '%s.group%04d.%s' % (args.prefix, index, suffix)
> +-        group_fp = get_file_writer(open(fname, 'wb'), args.gzip,
> ++        back_fp = open(fname, 'wb')
> ++        group_fp = get_file_writer(back_fp, args.gzip,
> +                                    args.bzip)
> +         group_fps[index] = group_fp
> ++        # It feels more natural to close the writer before closing the
> ++        # underlying file. fp.close() is theoretically idempotent, so it should
> ++        # be fine even though sometimes get_file_writer "steals" ownership of
> ++        # the underlying stream.
> ++        to_close.append(group_fp)
> ++        to_close.append(back_fp)
> + 
> +     # write 'em all out!
> +     # refresh the generator
> +@@ -351,6 +359,9 @@
> +            args.prefix,
> +            suffix), file=sys.stderr)
> + 
> ++    for fp in to_close:
> ++        fp.close()
> ++
> + 
> + if __name__ == '__main__':
> +     main()
> +Index: khmer-3.0.0~a3+dfsg/scripts/fastq-to-fasta.py
> +===================================================================
> +--- khmer-3.0.0~a3+dfsg.orig/scripts/fastq-to-fasta.py	2023-11-26 20:23:39.915485717 +0100
> ++++ khmer-3.0.0~a3+dfsg/scripts/fastq-to-fasta.py	2023-11-26 20:23:39.911485747 +0100
> +@@ -45,7 +45,7 @@
> + import sys
> + import screed
> + from khmer import __version__
> +-from khmer.kfile import (add_output_compression_type, get_file_writer,
> ++from khmer.kfile import (add_output_compression_type, FileWriter,
> +                          describe_file_handle)
> + from khmer.utils import write_record
> + from khmer.khmer_args import sanitize_help, KhmerArgumentParser
> +@@ -74,21 +74,21 @@
> +     args = sanitize_help(get_parser()).parse_args()
> + 
> +     print('fastq from ', args.input_sequence, file=sys.stderr)
> +-    outfp = get_file_writer(args.output, args.gzip, args.bzip)
> +-    n_count = 0
> +-    for n, record in enumerate(screed.open(args.input_sequence)):
> +-        if n % 10000 == 0:
> +-            print('...', n, file=sys.stderr)
> +-
> +-        sequence = record['sequence']
> +-
> +-        if 'N' in sequence:
> +-            if not args.n_keep:
> +-                n_count += 1
> +-                continue
> ++    with FileWriter(args.output, args.gzip, args.bzip) as outfp:
> ++        n_count = 0
> ++        for n, record in enumerate(screed.open(args.input_sequence)):
> ++            if n % 10000 == 0:
> ++                print('...', n, file=sys.stderr)
> ++
> ++            sequence = record['sequence']
> ++
> ++            if 'N' in sequence:
> ++                if not args.n_keep:
> ++                    n_count += 1
> ++                    continue
> + 
> +-        del record['quality']
> +-        write_record(record, outfp)
> ++            del record['quality']
> ++            write_record(record, outfp)
> + 
> +     print('\n' + 'lines from ' + args.input_sequence, file=sys.stderr)
> + 
> +Index: khmer-3.0.0~a3+dfsg/scripts/filter-abund.py
> +===================================================================
> +--- khmer-3.0.0~a3+dfsg.orig/scripts/filter-abund.py	2023-11-26 20:23:39.915485717 +0100
> ++++ khmer-3.0.0~a3+dfsg/scripts/filter-abund.py	2023-11-26 20:23:39.911485747 +0100
> +@@ -44,6 +44,7 @@
> + 
> + Use '-h' for parameter help.
> + """
> ++from contextlib import nullcontext
> + import sys
> + import os
> + import textwrap
> +@@ -56,7 +57,7 @@
> +                               sanitize_help, check_argument_range)
> + from khmer.khmer_args import FileType as khFileType
> + from khmer.kfile import (check_input_files, check_space,
> +-                         add_output_compression_type, get_file_writer)
> ++                         add_output_compression_type, FileWriter)
> + from khmer.khmer_logger import (configure_logging, log_info, log_error,
> +                                 log_warn)
> + from khmer.trimming import (trim_record)
> +@@ -137,31 +138,38 @@
> + 
> +     if args.single_output_file:
> +         outfile = args.single_output_file.name
> +-        outfp = get_file_writer(args.single_output_file, args.gzip, args.bzip)
> ++        out_single_ctx = FileWriter(args.single_output_file, args.gzip,
> ++                                    args.bzip)
> ++    else:
> ++        out_single_ctx = nullcontext()
> ++
> ++    with out_single_ctx as out_single_fp:
> ++        # the filtering loop
> ++        for infile in infiles:
> ++            log_info('filtering {infile}', infile=infile)
> ++            if not args.single_output_file:
> ++                outfile = os.path.basename(infile) + '.abundfilt'
> ++                out_ctx = FileWriter(open(outfile, 'wb'), args.gzip,
> ++                                     args.bzip, steal_ownership=True)
> ++            else:
> ++                out_ctx = nullcontext(enter_result=out_single_fp)
> ++
> ++            paired_iter = broken_paired_reader(ReadParser(infile),
> ++                                               min_length=ksize,
> ++                                               force_single=True)
> ++
> ++            with out_ctx as outfp:
> ++                for n, is_pair, read1, read2 in paired_iter:
> ++                    assert not is_pair
> ++                    assert read2 is None
> ++
> ++                    trimmed_record, _ = trim_record(countgraph, read1, args.cutoff,
> ++                                                    args.variable_coverage,
> ++                                                    args.normalize_to)
> ++                    if trimmed_record:
> ++                        write_record(trimmed_record, outfp)
> + 
> +-    # the filtering loop
> +-    for infile in infiles:
> +-        log_info('filtering {infile}', infile=infile)
> +-        if not args.single_output_file:
> +-            outfile = os.path.basename(infile) + '.abundfilt'
> +-            outfp = open(outfile, 'wb')
> +-            outfp = get_file_writer(outfp, args.gzip, args.bzip)
> +-
> +-        paired_iter = broken_paired_reader(ReadParser(infile),
> +-                                           min_length=ksize,
> +-                                           force_single=True)
> +-
> +-        for n, is_pair, read1, read2 in paired_iter:
> +-            assert not is_pair
> +-            assert read2 is None
> +-
> +-            trimmed_record, _ = trim_record(countgraph, read1, args.cutoff,
> +-                                            args.variable_coverage,
> +-                                            args.normalize_to)
> +-            if trimmed_record:
> +-                write_record(trimmed_record, outfp)
> +-
> +-        log_info('output in {outfile}', outfile=outfile)
> ++                log_info('output in {outfile}', outfile=outfile)
> + 
> + 
> + if __name__ == '__main__':
> +Index: khmer-3.0.0~a3+dfsg/scripts/filter-abund-single.py
> +===================================================================
> +--- khmer-3.0.0~a3+dfsg.orig/scripts/filter-abund-single.py	2023-11-26 20:23:39.915485717 +0100
> ++++ khmer-3.0.0~a3+dfsg/scripts/filter-abund-single.py	2023-11-26 20:23:39.911485747 +0100
> +@@ -60,7 +60,7 @@
> + from khmer.kfile import (check_input_files, check_space,
> +                          check_space_for_graph,
> +                          add_output_compression_type,
> +-                         get_file_writer)
> ++                         FileWriter)
> + from khmer.khmer_logger import (configure_logging, log_info, log_error,
> +                                 log_warn)
> + from khmer.trimming import (trim_record)
> +@@ -160,22 +160,23 @@
> +         outfile = os.path.basename(args.datafile) + '.abundfilt'
> +     else:
> +         outfile = args.outfile
> +-    outfp = open(outfile, 'wb')
> +-    outfp = get_file_writer(outfp, args.gzip, args.bzip)
> + 
> +-    paired_iter = broken_paired_reader(ReadParser(args.datafile),
> +-                                       min_length=graph.ksize(),
> +-                                       force_single=True)
> +-
> +-    for n, is_pair, read1, read2 in paired_iter:
> +-        assert not is_pair
> +-        assert read2 is None
> +-
> +-        trimmed_record, _ = trim_record(graph, read1, args.cutoff,
> +-                                        args.variable_coverage,
> +-                                        args.normalize_to)
> +-        if trimmed_record:
> +-            write_record(trimmed_record, outfp)
> ++    with FileWriter(open(outfile, 'wb'), args.gzip, args.bzip,
> ++                    steal_ownership=True) as outfp:
> ++
> ++        paired_iter = broken_paired_reader(ReadParser(args.datafile),
> ++                                           min_length=graph.ksize(),
> ++                                           force_single=True)
> ++
> ++        for n, is_pair, read1, read2 in paired_iter:
> ++            assert not is_pair
> ++            assert read2 is None
> ++
> ++            trimmed_record, _ = trim_record(graph, read1, args.cutoff,
> ++                                            args.variable_coverage,
> ++                                            args.normalize_to)
> ++            if trimmed_record:
> ++                write_record(trimmed_record, outfp)
> + 
> +     log_info('output in {outfile}', outfile=outfile)
> + 
> +Index: khmer-3.0.0~a3+dfsg/scripts/filter-stoptags.py
> +===================================================================
> +--- khmer-3.0.0~a3+dfsg.orig/scripts/filter-stoptags.py	2023-11-26 20:23:39.915485717 +0100
> ++++ khmer-3.0.0~a3+dfsg/scripts/filter-stoptags.py	2023-11-26 20:23:39.911485747 +0100
> +@@ -108,10 +108,9 @@
> +         print('filtering', infile, file=sys.stderr)
> +         outfile = os.path.basename(infile) + '.stopfilt'
> + 
> +-        outfp = open(outfile, 'w')
> +-
> +-        tsp = ThreadedSequenceProcessor(process_fn)
> +-        tsp.start(verbose_loader(infile), outfp)
> ++        with open(outfile, 'w') as outfp:
> ++            tsp = ThreadedSequenceProcessor(process_fn)
> ++            tsp.start(verbose_loader(infile), outfp)
> + 
> +         print('output in', outfile, file=sys.stderr)
> + 
> +Index: khmer-3.0.0~a3+dfsg/scripts/interleave-reads.py
> +===================================================================
> +--- khmer-3.0.0~a3+dfsg.orig/scripts/interleave-reads.py	2023-11-26 20:23:39.915485717 +0100
> ++++ khmer-3.0.0~a3+dfsg/scripts/interleave-reads.py	2023-11-26 20:23:39.911485747 +0100
> +@@ -52,7 +52,7 @@
> + from khmer.kfile import check_input_files, check_space
> + from khmer.khmer_args import sanitize_help, KhmerArgumentParser
> + from khmer.khmer_args import FileType as khFileType
> +-from khmer.kfile import (add_output_compression_type, get_file_writer,
> ++from khmer.kfile import (add_output_compression_type, FileWriter,
> +                          describe_file_handle)
> + from khmer.utils import (write_record_pair, check_is_left, check_is_right,
> +                          check_is_pair)
> +@@ -109,42 +109,41 @@
> + 
> +     print("Interleaving:\n\t%s\n\t%s" % (s1_file, s2_file), file=sys.stderr)
> + 
> +-    outfp = get_file_writer(args.output, args.gzip, args.bzip)
> +-
> +-    counter = 0
> +-    screed_iter_1 = screed.open(s1_file)
> +-    screed_iter_2 = screed.open(s2_file)
> +-    for read1, read2 in zip_longest(screed_iter_1, screed_iter_2):
> +-        if read1 is None or read2 is None:
> +-            print(("ERROR: Input files contain different number"
> +-                   " of records."), file=sys.stderr)
> +-            sys.exit(1)
> +-
> +-        if counter % 100000 == 0:
> +-            print('...', counter, 'pairs', file=sys.stderr)
> +-        counter += 1
> +-
> +-        name1 = read1.name
> +-        name2 = read2.name
> +-
> +-        if not args.no_reformat:
> +-            if not check_is_left(name1):
> +-                name1 += '/1'
> +-            if not check_is_right(name2):
> +-                name2 += '/2'
> +-
> +-            read1.name = name1
> +-            read2.name = name2
> +-
> +-            if not check_is_pair(read1, read2):
> +-                print("ERROR: This doesn't look like paired data! "
> +-                      "%s %s" % (read1.name, read2.name), file=sys.stderr)
> ++    with FileWriter(args.output, args.gzip, args.bzip) as outfp:
> ++        counter = 0
> ++        screed_iter_1 = screed.open(s1_file)
> ++        screed_iter_2 = screed.open(s2_file)
> ++        for read1, read2 in zip_longest(screed_iter_1, screed_iter_2):
> ++            if read1 is None or read2 is None:
> ++                print(("ERROR: Input files contain different number"
> ++                       " of records."), file=sys.stderr)
> +                 sys.exit(1)
> + 
> +-        write_record_pair(read1, read2, outfp)
> ++            if counter % 100000 == 0:
> ++                print('...', counter, 'pairs', file=sys.stderr)
> ++            counter += 1
> ++
> ++            name1 = read1.name
> ++            name2 = read2.name
> ++
> ++            if not args.no_reformat:
> ++                if not check_is_left(name1):
> ++                    name1 += '/1'
> ++                if not check_is_right(name2):
> ++                    name2 += '/2'
> ++
> ++                read1.name = name1
> ++                read2.name = name2
> ++
> ++                if not check_is_pair(read1, read2):
> ++                    print("ERROR: This doesn't look like paired data! "
> ++                          "%s %s" % (read1.name, read2.name), file=sys.stderr)
> ++                    sys.exit(1)
> ++
> ++            write_record_pair(read1, read2, outfp)
> + 
> +-    print('final: interleaved %d pairs' % counter, file=sys.stderr)
> +-    print('output written to', describe_file_handle(outfp), file=sys.stderr)
> ++        print('final: interleaved %d pairs' % counter, file=sys.stderr)
> ++        print('output written to', describe_file_handle(outfp), file=sys.stderr)
> + 
> + 
> + if __name__ == '__main__':
> +Index: khmer-3.0.0~a3+dfsg/scripts/normalize-by-median.py
> +===================================================================
> +--- khmer-3.0.0~a3+dfsg.orig/scripts/normalize-by-median.py	2023-11-26 20:23:39.915485717 +0100
> ++++ khmer-3.0.0~a3+dfsg/scripts/normalize-by-median.py	2023-11-26 20:23:39.911485747 +0100
> +@@ -46,6 +46,7 @@
> + Use '-h' for parameter help.
> + """
> + 
> ++from contextlib import nullcontext
> + import sys
> + import screed
> + import os
> +@@ -60,7 +61,7 @@
> + import argparse
> + from khmer.kfile import (check_space, check_space_for_graph,
> +                          check_valid_file_exists, add_output_compression_type,
> +-                         get_file_writer, describe_file_handle)
> ++                         FileWriter, describe_file_handle)
> + from khmer.utils import (write_record, broken_paired_reader, ReadBundle,
> +                          clean_input_reads)
> + from khmer.khmer_logger import (configure_logging, log_info, log_error)
> +@@ -360,39 +361,43 @@
> +     output_name = None
> + 
> +     if args.single_output_file:
> +-        outfp = get_file_writer(args.single_output_file, args.gzip, args.bzip)
> ++        out_single_ctx = FileWriter(args.single_output_file, args.gzip, args.bzip)
> +     else:
> ++        out_single_ctx = nullcontext()
> +         if '-' in filenames or '/dev/stdin' in filenames:
> +             print("Accepting input from stdin; output filename must "
> +                   "be provided with '-o'.", file=sys.stderr)
> +             sys.exit(1)
> + 
> +-    #
> +-    # main loop: iterate over all files given, do diginorm.
> +-    #
> +-
> +-    for filename, require_paired in files:
> +-        if not args.single_output_file:
> +-            output_name = os.path.basename(filename) + '.keep'
> +-            outfp = open(output_name, 'wb')
> +-            outfp = get_file_writer(outfp, args.gzip, args.bzip)
> +-
> +-        # failsafe context manager in case an input file breaks
> +-        with catch_io_errors(filename, outfp, args.single_output_file,
> +-                             args.force, corrupt_files):
> +-            screed_iter = clean_input_reads(screed.open(filename))
> +-            reader = broken_paired_reader(screed_iter, min_length=args.ksize,
> +-                                          force_single=force_single,
> +-                                          require_paired=require_paired)
> +-
> +-            # actually do diginorm
> +-            for record in with_diagnostics(reader, filename):
> +-                if record is not None:
> +-                    write_record(record, outfp)
> +-
> +-            log_info('output in {name}', name=describe_file_handle(outfp))
> ++    with out_single_ctx as out_single_fp:
> ++        #
> ++        # main loop: iterate over all files given, do diginorm.
> ++        #
> ++        for filename, require_paired in files:
> +             if not args.single_output_file:
> +-                outfp.close()
> ++                output_name = os.path.basename(filename) + '.keep'
> ++                out_ctx = FileWriter(open(output_name, 'wb'), args.gzip,
> ++                                     args.bzip, steal_ownership=True)
> ++            else:
> ++                out_ctx = nullcontext(enter_result=out_single_fp)
> ++
> ++            with out_ctx as outfp:
> ++                # failsafe context manager in case an input file breaks
> ++                with catch_io_errors(filename, outfp, args.single_output_file,
> ++                                     args.force, corrupt_files):
> ++                    screed_iter = clean_input_reads(screed.open(filename))
> ++                    reader = broken_paired_reader(screed_iter,
> ++                                                  min_length=args.ksize,
> ++                                                  force_single=force_single,
> ++                                                  require_paired=require_paired)
> ++
> ++                    # actually do diginorm
> ++                    for record in with_diagnostics(reader, filename):
> ++                        if record is not None:
> ++                            write_record(record, outfp)
> ++
> ++                    log_info('output in {name}',
> ++                             name=describe_file_handle(outfp))
> + 
> +     # finished - print out some diagnostics.
> + 
> +Index: khmer-3.0.0~a3+dfsg/scripts/partition-graph.py
> +===================================================================
> +--- khmer-3.0.0~a3+dfsg.orig/scripts/partition-graph.py	2023-11-26 20:23:39.915485717 +0100
> ++++ khmer-3.0.0~a3+dfsg/scripts/partition-graph.py	2023-11-26 20:23:39.911485747 +0100
> +@@ -143,7 +143,8 @@
> +         worker_q.put((nodegraph, _, start, end))
> + 
> +     print('enqueued %d subset tasks' % n_subsets, file=sys.stderr)
> +-    open('%s.info' % basename, 'w').write('%d subsets total\n' % (n_subsets))
> ++    with open('%s.info' % basename, 'w') as info_fp:
> ++        info_fp.write('%d subsets total\n' % (n_subsets))
> + 
> +     n_threads = args.threads
> +     if n_subsets < n_threads:
> +Index: khmer-3.0.0~a3+dfsg/scripts/sample-reads-randomly.py
> +===================================================================
> +--- khmer-3.0.0~a3+dfsg.orig/scripts/sample-reads-randomly.py	2023-11-26 20:23:39.915485717 +0100
> ++++ khmer-3.0.0~a3+dfsg/scripts/sample-reads-randomly.py	2023-11-26 20:23:39.911485747 +0100
> +@@ -47,6 +47,7 @@
> + """
> + 
> + import argparse
> ++from contextlib import nullcontext
> + import os.path
> + import random
> + import textwrap
> +@@ -55,7 +56,7 @@
> + from khmer import __version__
> + from khmer import ReadParser
> + from khmer.kfile import (check_input_files, add_output_compression_type,
> +-                         get_file_writer)
> ++                         FileWriter)
> + from khmer.khmer_args import sanitize_help, KhmerArgumentParser
> + from khmer.utils import write_record, broken_paired_reader
> + 
> +@@ -201,27 +202,27 @@
> +         print('Writing %d sequences to %s' %
> +               (len(reads[0]), output_filename), file=sys.stderr)
> + 
> +-        output_file = args.output_file
> +-        if not output_file:
> +-            output_file = open(output_filename, 'wb')
> +-
> +-        output_file = get_file_writer(output_file, args.gzip, args.bzip)
> +-
> +-        for records in reads[0]:
> +-            write_record(records[0], output_file)
> +-            if records[1] is not None:
> +-                write_record(records[1], output_file)
> ++        output_back_ctx = nullcontext(args.output_file)
> ++        if not args.output_file:
> ++            output_back_ctx = open(output_filename, 'wb')
> ++
> ++        with output_back_ctx as output_back_fp:
> ++            with FileWriter(output_back_fp, args.gzip, args.bzip) as output_fp:
> ++                for records in reads[0]:
> ++                    write_record(records[0], output_fp)
> ++                    if records[1] is not None:
> ++                        write_record(records[1], output_fp)
> +     else:
> +         for n in range(num_samples):
> +             n_filename = output_filename + '.%d' % n
> +             print('Writing %d sequences to %s' %
> +                   (len(reads[n]), n_filename), file=sys.stderr)
> +-            output_file = get_file_writer(open(n_filename, 'wb'), args.gzip,
> +-                                          args.bzip)
> +-            for records in reads[n]:
> +-                write_record(records[0], output_file)
> +-                if records[1] is not None:
> +-                    write_record(records[1], output_file)
> ++            with FileWriter(open(n_filename, 'wb'), args.gzip, args.bzip,
> ++                            steal_ownership=True) as output_fp:
> ++                for records in reads[n]:
> ++                    write_record(records[0], output_fp)
> ++                    if records[1] is not None:
> ++                        write_record(records[1], output_fp)
> + 
> + 
> + if __name__ == '__main__':
> +Index: khmer-3.0.0~a3+dfsg/scripts/split-paired-reads.py
> +===================================================================
> +--- khmer-3.0.0~a3+dfsg.orig/scripts/split-paired-reads.py	2023-11-26 20:23:39.915485717 +0100
> ++++ khmer-3.0.0~a3+dfsg/scripts/split-paired-reads.py	2023-11-26 20:23:39.911485747 +0100
> +@@ -44,6 +44,7 @@
> + 
> + Reads FASTQ and FASTA input, retains format for output.
> + """
> ++from contextlib import nullcontext
> + import sys
> + import os
> + import textwrap
> +@@ -56,7 +57,7 @@
> +                          UnpairedReadsError)
> + from khmer.kfile import (check_input_files, check_space,
> +                          add_output_compression_type,
> +-                         get_file_writer, describe_file_handle)
> ++                         FileWriter, describe_file_handle)
> + 
> + 
> + def get_parser():
> +@@ -145,22 +146,26 @@
> + 
> +     # OVERRIDE output file locations with -1, -2
> +     if args.output_first:
> +-        fp_out1 = get_file_writer(args.output_first, args.gzip, args.bzip)
> +-        out1 = fp_out1.name
> ++        out1_ctx = FileWriter(args.output_first, args.gzip, args.bzip)
> ++        out1 = args.output_first.name
> +     else:
> +         # Use default filename created above
> +-        fp_out1 = get_file_writer(open(out1, 'wb'), args.gzip, args.bzip)
> ++        out1_ctx = FileWriter(open(out1, 'wb'), args.gzip, args.bzip,
> ++                              steal_ownership=True)
> +     if args.output_second:
> +-        fp_out2 = get_file_writer(args.output_second, args.gzip, args.bzip)
> +-        out2 = fp_out2.name
> ++        out2_ctx = FileWriter(args.output_second, args.gzip, args.bzip)
> ++        out2 = args.output_second.name
> +     else:
> +         # Use default filename created above
> +-        fp_out2 = get_file_writer(open(out2, 'wb'), args.gzip, args.bzip)
> ++        out2_ctx = FileWriter(open(out2, 'wb'), args.gzip, args.bzip,
> ++                              steal_ownership=True)
> + 
> +     # put orphaned reads here, if -0!
> +     if args.output_orphaned:
> +-        fp_out0 = get_file_writer(args.output_orphaned, args.gzip, args.bzip)
> ++        out0_ctx = FileWriter(args.output_orphaned, args.gzip, args.bzip)
> +         out0 = describe_file_handle(args.output_orphaned)
> ++    else:
> ++        out0_ctx = nullcontext()
> + 
> +     counter1 = 0
> +     counter2 = 0
> +@@ -171,23 +176,24 @@
> +     paired_iter = broken_paired_reader(ReadParser(infile),
> +                                        require_paired=not args.output_orphaned)
> + 
> +-    try:
> +-        for index, is_pair, record1, record2 in paired_iter:
> +-            if index % 10000 == 0:
> +-                print('...', index, file=sys.stderr)
> +-
> +-            if is_pair:
> +-                write_record(record1, fp_out1)
> +-                counter1 += 1
> +-                write_record(record2, fp_out2)
> +-                counter2 += 1
> +-            elif args.output_orphaned:
> +-                write_record(record1, fp_out0)
> +-                counter3 += 1
> +-    except UnpairedReadsError as e:
> +-        print("Unpaired reads found starting at {name}; exiting".format(
> +-            name=e.read1.name), file=sys.stderr)
> +-        sys.exit(1)
> ++    with out0_ctx as fp_out0, out1_ctx as fp_out1, out2_ctx as fp_out2:
> ++        try:
> ++            for index, is_pair, record1, record2 in paired_iter:
> ++                if index % 10000 == 0:
> ++                    print('...', index, file=sys.stderr)
> ++
> ++                if is_pair:
> ++                    write_record(record1, fp_out1)
> ++                    counter1 += 1
> ++                    write_record(record2, fp_out2)
> ++                    counter2 += 1
> ++                elif args.output_orphaned:
> ++                    write_record(record1, fp_out0)
> ++                    counter3 += 1
> ++        except UnpairedReadsError as e:
> ++            print("Unpaired reads found starting at {name}; exiting".format(
> ++                name=e.read1.name), file=sys.stderr)
> ++            sys.exit(1)
> + 
> +     print("DONE; split %d sequences (%d left, %d right, %d orphans)" %
> +           (counter1 + counter2, counter1, counter2, counter3), file=sys.stderr)
> +Index: khmer-3.0.0~a3+dfsg/scripts/trim-low-abund.py
> +===================================================================
> +--- khmer-3.0.0~a3+dfsg.orig/scripts/trim-low-abund.py	2023-11-26 20:23:39.915485717 +0100
> ++++ khmer-3.0.0~a3+dfsg/scripts/trim-low-abund.py	2023-11-26 20:28:16.389478687 +0100
> +@@ -43,6 +43,7 @@
> + 
> + Use -h for parameter help.
> + """
> ++from contextlib import nullcontext
> + import csv
> + import sys
> + import os
> +@@ -63,7 +64,7 @@
> + from khmer.utils import write_record, broken_paired_reader, ReadBundle
> + from khmer.kfile import (check_space, check_space_for_graph,
> +                          check_valid_file_exists, add_output_compression_type,
> +-                         get_file_writer)
> ++                         get_file_writer, FileWriter)
> + from khmer.khmer_logger import configure_logging, log_info, log_error
> + from khmer.trimming import trim_record
> + 
> +@@ -374,108 +375,111 @@
> +     # only create the file writer once if outfp is specified; otherwise,
> +     # create it for each file.
> +     if args.output:
> +-        trimfp = get_file_writer(args.output, args.gzip, args.bzip)
> ++        trim_ctx = FileWriter(args.output, args.gzip, args.bzip)
> ++    else:
> ++        trim_ctx = nullcontext()
> + 
> +     pass2list = []
> +-    for filename in args.input_filenames:
> +-        # figure out temporary filename for 2nd pass
> +-        pass2filename = filename.replace(os.path.sep, '-') + '.pass2'
> +-        pass2filename = os.path.join(tempdir, pass2filename)
> +-        pass2fp = open(pass2filename, 'w')
> +-
> +-        # construct output filenames
> +-        if args.output is None:
> +-            # note: this will be saved in trimfp.
> +-            outfp = open(os.path.basename(filename) + '.abundtrim', 'wb')
> +-
> +-            # get file handle w/gzip, bzip
> +-            trimfp = get_file_writer(outfp, args.gzip, args.bzip)
> +-
> +-        # record all this info
> +-        pass2list.append((filename, pass2filename, trimfp))
> +-
> +-        # input file stuff: get a broken_paired reader.
> +-        paired_iter = broken_paired_reader(ReadParser(filename), min_length=K,
> +-                                           force_single=args.ignore_pairs)
> +-
> +-        # main loop through the file.
> +-        n_start = trimmer.n_reads
> +-        save_start = trimmer.n_saved
> +-
> +-        watermark = REPORT_EVERY_N_READS
> +-        for read in trimmer.pass1(paired_iter, pass2fp):
> +-            if (trimmer.n_reads - n_start) > watermark:
> +-                log_info("... {filename} {n_saved} {n_reads} {n_bp} "
> +-                         "{w_reads} {w_bp}", filename=filename,
> +-                         n_saved=trimmer.n_saved, n_reads=trimmer.n_reads,
> +-                         n_bp=trimmer.n_bp, w_reads=written_reads,
> +-                         w_bp=written_bp)
> +-                watermark += REPORT_EVERY_N_READS
> +-
> +-            # write out the trimmed/etc sequences that AREN'T going to be
> +-            # revisited in a 2nd pass.
> +-            write_record(read, trimfp)
> +-            written_bp += len(read)
> +-            written_reads += 1
> +-        pass2fp.close()
> +-
> +-        log_info("{filename}: kept aside {kept} of {total} from first pass",
> +-                 filename=filename, kept=trimmer.n_saved - save_start,
> +-                 total=trimmer.n_reads - n_start)
> +-
> +-    # first pass goes across all the data, so record relevant stats...
> +-    n_reads = trimmer.n_reads
> +-    n_bp = trimmer.n_bp
> +-    n_skipped = trimmer.n_skipped
> +-    bp_skipped = trimmer.bp_skipped
> +-    save_pass2_total = trimmer.n_saved
> +-
> +-    # ### SECOND PASS. ###
> +-
> +-    # nothing should have been skipped yet!
> +-    assert trimmer.n_skipped == 0
> +-    assert trimmer.bp_skipped == 0
> +-
> +-    if args.single_pass:
> +-        pass2list = []
> +-
> +-    # go back through all the files again.
> +-    for _, pass2filename, trimfp in pass2list:
> +-        log_info('second pass: looking at sequences kept aside in {pass2}',
> +-                 pass2=pass2filename)
> +-
> +-        # note that for this second pass, we don't care about paired
> +-        # reads - they will be output in the same order they're read in,
> +-        # so pairs will stay together if not orphaned.  This is in contrast
> +-        # to the first loop.  Hence, force_single=True below.
> +-
> +-        read_parser = ReadParser(pass2filename)
> +-        paired_iter = broken_paired_reader(read_parser,
> +-                                           min_length=K,
> +-                                           force_single=True)
> +-
> +-        watermark = REPORT_EVERY_N_READS
> +-        for read in trimmer.pass2(paired_iter):
> +-            if (trimmer.n_reads - n_start) > watermark:
> +-                log_info('... x 2 {a} {b} {c} {d} {e} {f} {g}',
> +-                         a=trimmer.n_reads - n_start,
> +-                         b=pass2filename, c=trimmer.n_saved,
> +-                         d=trimmer.n_reads, e=trimmer.n_bp,
> +-                         f=written_reads, g=written_bp)
> +-                watermark += REPORT_EVERY_N_READS
> +-
> +-            write_record(read, trimfp)
> +-            written_reads += 1
> +-            written_bp += len(read)
> +-
> +-        read_parser.close()
> +-
> +-        log_info('removing {pass2}', pass2=pass2filename)
> +-        os.unlink(pass2filename)
> +-
> +-        # if we created our own trimfps, close 'em.
> +-        if not args.output:
> +-            trimfp.close()
> ++    with trim_ctx as trimfp:
> ++        for filename in args.input_filenames:
> ++            # figure out temporary filename for 2nd pass
> ++            pass2filename = filename.replace(os.path.sep, '-') + '.pass2'
> ++            pass2filename = os.path.join(tempdir, pass2filename)
> ++            pass2fp = open(pass2filename, 'w')
> ++
> ++            # construct output filenames
> ++            if args.output is None:
> ++                # note: this will be saved in trimfp.
> ++                outfp = open(os.path.basename(filename) + '.abundtrim', 'wb')
> ++
> ++                # get file handle w/gzip, bzip
> ++                trimfp = get_file_writer(outfp, args.gzip, args.bzip)
> ++
> ++            # record all this info
> ++            pass2list.append((filename, pass2filename, trimfp))
> ++
> ++            # input file stuff: get a broken_paired reader.
> ++            paired_iter = broken_paired_reader(ReadParser(filename), min_length=K,
> ++                                               force_single=args.ignore_pairs)
> ++
> ++            # main loop through the file.
> ++            n_start = trimmer.n_reads
> ++            save_start = trimmer.n_saved
> ++
> ++            watermark = REPORT_EVERY_N_READS
> ++            for read in trimmer.pass1(paired_iter, pass2fp):
> ++                if (trimmer.n_reads - n_start) > watermark:
> ++                    log_info("... {filename} {n_saved} {n_reads} {n_bp} "
> ++                             "{w_reads} {w_bp}", filename=filename,
> ++                             n_saved=trimmer.n_saved, n_reads=trimmer.n_reads,
> ++                             n_bp=trimmer.n_bp, w_reads=written_reads,
> ++                             w_bp=written_bp)
> ++                    watermark += REPORT_EVERY_N_READS
> ++
> ++                # write out the trimmed/etc sequences that AREN'T going to be
> ++                # revisited in a 2nd pass.
> ++                write_record(read, trimfp)
> ++                written_bp += len(read)
> ++                written_reads += 1
> ++            pass2fp.close()
> ++
> ++            log_info("{filename}: kept aside {kept} of {total} from first pass",
> ++                     filename=filename, kept=trimmer.n_saved - save_start,
> ++                     total=trimmer.n_reads - n_start)
> ++
> ++        # first pass goes across all the data, so record relevant stats...
> ++        n_reads = trimmer.n_reads
> ++        n_bp = trimmer.n_bp
> ++        n_skipped = trimmer.n_skipped
> ++        bp_skipped = trimmer.bp_skipped
> ++        save_pass2_total = trimmer.n_saved
> ++
> ++        # ### SECOND PASS. ###
> ++
> ++        # nothing should have been skipped yet!
> ++        assert trimmer.n_skipped == 0
> ++        assert trimmer.bp_skipped == 0
> ++
> ++        if args.single_pass:
> ++            pass2list = []
> ++
> ++        # go back through all the files again.
> ++        for _, pass2filename, trimfp in pass2list:
> ++            log_info('second pass: looking at sequences kept aside in {pass2}',
> ++                     pass2=pass2filename)
> ++
> ++            # note that for this second pass, we don't care about paired
> ++            # reads - they will be output in the same order they're read in,
> ++            # so pairs will stay together if not orphaned.  This is in contrast
> ++            # to the first loop.  Hence, force_single=True below.
> ++
> ++            read_parser = ReadParser(pass2filename)
> ++            paired_iter = broken_paired_reader(read_parser,
> ++                                               min_length=K,
> ++                                               force_single=True)
> ++
> ++            watermark = REPORT_EVERY_N_READS
> ++            for read in trimmer.pass2(paired_iter):
> ++                if (trimmer.n_reads - n_start) > watermark:
> ++                    log_info('... x 2 {a} {b} {c} {d} {e} {f} {g}',
> ++                             a=trimmer.n_reads - n_start,
> ++                             b=pass2filename, c=trimmer.n_saved,
> ++                             d=trimmer.n_reads, e=trimmer.n_bp,
> ++                             f=written_reads, g=written_bp)
> ++                    watermark += REPORT_EVERY_N_READS
> ++
> ++                write_record(read, trimfp)
> ++                written_reads += 1
> ++                written_bp += len(read)
> ++
> ++            read_parser.close()
> ++
> ++            log_info('removing {pass2}', pass2=pass2filename)
> ++            os.unlink(pass2filename)
> ++
> ++            # if we created our own trimfps, close 'em.
> ++            if not args.output:
> ++                trimfp.close()
> + 
> +     try:
> +         log_info('removing temp directory & contents ({temp})', temp=tempdir)
> diff -Nru khmer-3.0.0~a3+dfsg/debian/patches/python3.12-support.patch khmer-3.0.0~a3+dfsg/debian/patches/python3.12-support.patch
> --- khmer-3.0.0~a3+dfsg/debian/patches/python3.12-support.patch	2023-11-25 18:11:03.000000000 +0100
> +++ khmer-3.0.0~a3+dfsg/debian/patches/python3.12-support.patch	2023-11-26 02:28:32.000000000 +0100
> @@ -1,18 +1,25 @@
>  Description: Add support for Python 3.12
>   Ever since Python 3.2, configparser.SafeConfigParser has been deprecated in
> - favor of configparser.ConfigParser. An alias existed for backward compability
> - but the alias was dropped from Python 3.12.
> + favor of configparser.ConfigParser. An alias existed for backward
> + compatibility but the alias was dropped from Python 3.12. Let us now use
> + ConfigParser explicitly, and use .read_file() instead of .readfp() which was
> + dropped too.
> + .
> + The imp module has also been dropped, but a similar behavior can be achieved
> + using importlib.
>  Author: Olivier Gayot <olivier.gayot at canonical.com>
>  Author: Simon Quigley <tsimonq2 at ubuntu.com>
>  Bug-Ubuntu: https://launchpad.net/bugs/2044383
>  Bug-Debian: https://bugs.debian.org/1055687
>  Forwarded: https://github.com/dib-lab/khmer/pull/1922
> -Last-Update: 2023-11-23
> +Last-Update: 2023-11-26
>  ---
>  This patch header follows DEP-3: http://dep.debian.net/deps/dep3/
> ---- a/versioneer.py
> -+++ b/versioneer.py
> -@@ -339,9 +339,9 @@ def get_config_from_root(root):
> +Index: khmer-3.0.0~a3+dfsg/versioneer.py
> +===================================================================
> +--- khmer-3.0.0~a3+dfsg.orig/versioneer.py	2023-11-26 15:33:13.654983243 +0100
> ++++ khmer-3.0.0~a3+dfsg/versioneer.py	2023-11-26 15:33:13.650983273 +0100
> +@@ -339,9 +339,9 @@
>       # configparser.NoOptionError (if it lacks "VCS="). See the docstring at
>       # the top of versioneer.py for instructions on writing your setup.cfg .
>       setup_cfg = os.path.join(root, "setup.cfg")
> @@ -24,3 +31,26 @@
>       VCS = parser.get("versioneer", "VCS")  # mandatory
>   
>       def get(parser, name):
> +Index: khmer-3.0.0~a3+dfsg/tests/test_sandbox_scripts.py
> +===================================================================
> +--- khmer-3.0.0~a3+dfsg.orig/tests/test_sandbox_scripts.py	2023-11-26 15:33:13.654983243 +0100
> ++++ khmer-3.0.0~a3+dfsg/tests/test_sandbox_scripts.py	2023-11-26 15:33:13.650983273 +0100
> +@@ -42,7 +42,7 @@
> + from io import StringIO
> + import traceback
> + import glob
> +-import imp
> ++import importlib
> + 
> + import pytest
> + 
> +@@ -77,7 +77,8 @@
> + @pytest.mark.parametrize("filename", _sandbox_scripts())
> + def test_import_succeeds(filename, tmpdir, capsys):
> +     try:
> +-        mod = imp.load_source('__zzz', filename)
> ++        loader = importlib.machinery.SourceFileLoader('__zzz', filename)
> ++        mod = loader.load_module()
> +     except:
> +         print(traceback.format_exc())
> +         raise AssertionError("%s cannot be imported" % (filename,))
> diff -Nru khmer-3.0.0~a3+dfsg/debian/patches/series khmer-3.0.0~a3+dfsg/debian/patches/series
> --- khmer-3.0.0~a3+dfsg/debian/patches/series	2023-11-25 17:44:28.000000000 +0100
> +++ khmer-3.0.0~a3+dfsg/debian/patches/series	2023-11-26 02:28:32.000000000 +0100
> @@ -18,3 +18,4 @@
>  refresh_cython
>  find_object_files_at_right_loc.patch
>  python3.12-support.patch
> +close-opened-files.patch
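
For anyone reading along, the pattern that close-opened-files.patch applies
in most scripts can be sketched roughly as below. This is only a minimal
illustration, not khmer code: output_context() and write_records() are
hypothetical names, and plain open() stands in for khmer's FileWriter. The
idea is that a file the script opened itself is closed when the with block
exits, while a stream handed in by the caller is wrapped in
contextlib.nullcontext and left open.

```python
from contextlib import nullcontext


def output_context(path=None, borrowed=None):
    """Return a context manager that yields a binary output stream.

    Hypothetical helper: if `path` is given, the file is opened here and
    closed on exit; otherwise the caller's `borrowed` stream is passed
    through unchanged and is NOT closed on exit.
    """
    if path is not None:
        return open(path, 'wb')
    return nullcontext(borrowed)


def write_records(records, path=None, borrowed=None):
    """Write each record (bytes) on its own line and return the stream."""
    with output_context(path, borrowed) as fp:
        for record in records:
            fp.write(record + b'\n')
    return fp
```

Both branches go through the same with statement, so the loop body does not
need to know who owns the stream.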

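Regarding the imp removal: the patch above calls
SourceFileLoader.load_module(), which the Python documentation lists as
deprecated. A possible alternative, sketched here under the assumption that
the test only needs the behaviour of imp.load_source(), goes through
importlib.util instead. The function name load_source() is my own choice for
illustration and is not part of the patch.

```python
import importlib.machinery
import importlib.util
import sys


def load_source(name, path):
    """Load the Python source file at `path` as a module called `name`.

    Illustrative stand-in for the removed imp.load_source(); passing an
    explicit SourceFileLoader lets it accept files without a .py suffix.
    """
    loader = importlib.machinery.SourceFileLoader(name, path)
    spec = importlib.util.spec_from_file_location(name, path, loader=loader)
    module = importlib.util.module_from_spec(spec)
    # Register the module before executing it, as imp.load_source() did.
    sys.modules[name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        # Do not leave a half-initialised module behind.
        sys.modules.pop(name, None)
        raise
    return module
```

In the test this would be used as mod = load_source('__zzz', filename).
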
-- 
http://fam-tille.de