[med-svn] [Git][med-team/biobambam2][master] 8 commits: New upstream version 2.0.177+ds

Étienne Mollier gitlab at salsa.debian.org
Wed Nov 18 21:02:06 GMT 2020



Étienne Mollier pushed to branch master at Debian Med / biobambam2


Commits:
b36bccd1 by Étienne Mollier at 2020-11-12T16:17:53+01:00
New upstream version 2.0.177+ds
- - - - -
1cba8474 by Étienne Mollier at 2020-11-12T16:17:53+01:00
routine-update: New upstream version

- - - - -
f1564365 by Étienne Mollier at 2020-11-12T16:17:54+01:00
Update upstream source from tag 'upstream/2.0.177+ds'

Update to upstream version '2.0.177+ds'
with Debian dir 7084c20732c1d252c294ebab01aa0cbf12ecbb5a
- - - - -
8ca7a625 by Étienne Mollier at 2020-11-18T11:19:27+01:00
add myself to Uploaders

- - - - -
493b0ee7 by Étienne Mollier at 2020-11-18T11:24:04+01:00
build depends on libmaus2-dev >= 2.0.749

- - - - -
b22260ca by Étienne Mollier at 2020-11-18T14:06:05+01:00
add patches to fix various issues

manuals.patch targets manual pages issues reported by lintian.
spelling-error-in-binary.patch fixes one of the spelling issues
reported by lintian.

- - - - -
009dc429 by Étienne Mollier at 2020-11-18T15:39:35+01:00
one last typo fix

- - - - -
bfc5cd7d by Étienne Mollier at 2020-11-18T21:59:20+01:00
ready to upload to unstable

- - - - -


20 changed files:

- .gitignore
- ChangeLog
- configure.ac
- debian/changelog
- debian/control
- + debian/patches/manuals.patch
- + debian/patches/series
- + debian/patches/spelling-error-in-binary.patch
- release.sh
- + removespace.sh
- src/Makefile.am
- src/biobambam2/BamBamConfig.hpp.in
- src/biobambam2/UpdateNumericalIndex.cpp
- src/programs/bamcollate2.cpp
- src/programs/bamconsensus.cpp
- + src/programs/bamdifference.cpp
- src/programs/bammerge.cpp
- src/programs/bamtofastq.cpp
- src/programs/blastnxmltobam.cpp
- + src/programs/fastaselectreg.cpp


Changes:

=====================================
.gitignore
=====================================
@@ -70,3 +70,5 @@ src/vcffiltersamples
 src/vcfdiff
 src/vcffilterfilterflags
 src/vcfreplacecontigsmap
+src/fastaselectreg
+src/bamdifference


=====================================
ChangeLog
=====================================
@@ -1,3 +1,30 @@
+biobambam2 (2.0.177-1) unstable; urgency=medium
+
+  * Versioning cleanup
+
+ -- German Tischler-Höhle <germant at miltenyibiotec.de>  Thu, 12 Nov 2020 10:39:59 +0100
+
+biobambam2 (2.0.176-1) unstable; urgency=medium
+
+  * Adapt UpdateNumericalIndex to new libmaus2 api
+  * Fix output of orphaned read 2 instances in bamtofastq when splitting by readgroup
+  * Remove references to non functional IRODS interface in libmaus2
+  * Add bamdifference program
+
+ -- German Tischler-Höhle <germant at miltenyibiotec.de>  Thu, 12 Nov 2020 09:43:13 +0100
+
+biobambam2 (2.0.175-1) unstable; urgency=medium
+
+  * Add fastaselectreg
+
+ -- German Tischler-Höhle <germant at miltenyibiotec.de>  Thu, 27 Aug 2020 10:08:12 +0200
+
+biobambam2 (2.0.174-1) unstable; urgency=medium
+
+  * Fix wrong operator for deallocating memory in bamconsensus
+
+ -- German Tischler-Höhle <germant at miltenyibiotec.de>  Mon, 10 Aug 2020 11:26:08 +0200
+
 biobambam2 (2.0.173-1) unstable; urgency=medium
 
   * Fix for libmaus2 update


=====================================
configure.ac
=====================================
@@ -1,4 +1,4 @@
-AC_INIT(biobambam2,2.0.173,[germant at miltenyibiotec.de],[biobambam2],[https://gitlab.com/german.tischler/biobambam2])
+AC_INIT(biobambam2,2.0.177,[germant at miltenyibiotec.de],[biobambam2],[https://gitlab.com/german.tischler/biobambam2])
 AC_CANONICAL_SYSTEM
 AC_PROG_LIBTOOL
 
@@ -165,7 +165,7 @@ if test ! -z "${with_libmaus2}" ; then
 	fi
 fi
 
-PKG_CHECK_MODULES([libmaus2],[libmaus2 >= 2.0.740])
+PKG_CHECK_MODULES([libmaus2],[libmaus2 >= 2.0.749])
 
 if test ! -z "${with_libmaus2}" ; then
 	if test ! -z "${PKGCONFIGPATHSAVE}" ; then
@@ -300,7 +300,7 @@ if test "${have_libmaus2_irods}" = "yes" ; then
 		fi
 	fi
 
-	PKG_CHECK_MODULES([libmaus2irods],[libmaus2irods >= 2.0.740])
+	PKG_CHECK_MODULES([libmaus2irods],[libmaus2irods >= 2.0.749])
 
 	LIBMAUS2IRODSCPPFLAGS="${libmaus2irods_CFLAGS}"
 	LIBMAUS2IRODSLIBS="${libmaus2irods_LIBS}"
@@ -463,7 +463,7 @@ AC_ARG_ENABLE(install_uncommon,
         AS_HELP_STRING([--enable-install-uncommon],[enable installation of some uncommon programs (default no)]),
         [install_uncommon=${enableval}],[install_uncommon=no])
 
-UNCOMMON="bamfilter bamfilterbyname bamfixmatecoordinates bamfixmatecoordinatesnamesorted bamtoname bamdisthist fastabgzfextract bamheap bamfrontback bamrandomtag bamheap2 bamheap3 bamtagconversion fastqtobampar bambisect vcffilterinfo vcfpatchcontigprepend vcfconcat vcfsort filtergtf bamconsensus vcfreplacecontigs vcffiltersamples bamexploderg bamexondepth bamheadercat bammarkduplicatesoptdist vcfdiff bamsimpledepth bamdepthmerge bamcountflags vcffilterfilterflags vcfreplacecontigsmap"
+UNCOMMON="bamfilter bamfilterbyname bamfixmatecoordinates bamfixmatecoordinatesnamesorted bamtoname bamdisthist fastabgzfextract bamheap bamfrontback bamrandomtag bamheap2 bamheap3 bamtagconversion fastqtobampar bambisect vcffilterinfo vcfpatchcontigprepend vcfconcat vcfsort filtergtf bamconsensus vcfreplacecontigs vcffiltersamples bamexploderg bamexondepth bamheadercat bammarkduplicatesoptdist vcfdiff bamsimpledepth bamdepthmerge bamcountflags vcffilterfilterflags vcfreplacecontigsmap fastaselectreg"
 UNCOMMONINSTALLED=
 UNCOMMONUNINSTALLED=
 if test "${install_uncommon}" = "yes" ; then


=====================================
debian/changelog
=====================================
@@ -1,10 +1,13 @@
-biobambam2 (2.0.173+ds-2) UNRELEASED; urgency=medium
+biobambam2 (2.0.177+ds-1) unstable; urgency=medium
 
-  * Team upload
+  * Add myself to Uploaders.
   * Remove the zlib1g-dev dependency, as zlib1g should now be properly pulled
     and used by libmaus2 (>= 2.0.740+dfsg-2).
+  * Force builds against libmaus2 >= 2.0.749.
+  * Add manuals.patch to fix various issues reported by lintian.
+  * Add spelling-error-in-binary.patch.
 
- -- Étienne Mollier <etienne.mollier at mailoo.org>  Mon, 10 Aug 2020 21:22:20 +0200
+ -- Étienne Mollier <etienne.mollier at mailoo.org>  Wed, 18 Nov 2020 21:58:28 +0100
 
 biobambam2 (2.0.173+ds-1) unstable; urgency=medium
 


=====================================
debian/control
=====================================
@@ -1,11 +1,12 @@
 Source: biobambam2
 Maintainer: Debian Med Packaging Team <debian-med-packaging at lists.alioth.debian.org>
-Uploaders: Andreas Tille <tille at debian.org>
+Uploaders: Andreas Tille <tille at debian.org>,
+           Étienne Mollier <etienne.mollier at mailoo.org>
 Section: science
 Priority: optional
 Build-Depends: debhelper-compat (= 13),
                pkg-config,
-               libmaus2-dev (>= 2.0.740+dfsg-2)
+               libmaus2-dev (>= 2.0.749)
 Standards-Version: 4.5.0
 Vcs-Browser: https://salsa.debian.org/med-team/biobambam2
 Vcs-Git: https://salsa.debian.org/med-team/biobambam2.git


=====================================
debian/patches/manuals.patch
=====================================
@@ -0,0 +1,442 @@
+Description: miscellaneous manual pages fixes
+Author: Étienne Mollier <etienne.mollier at mailoo.org>
+Forwarded: no
+Last-Update: 2020-11-18
+---
+This patch header follows DEP-3: http://dep.debian.net/deps/dep3/
+--- biobambam2.orig/src/programs/bamalignfrac.1
++++ biobambam2/src/programs/bamalignfrac.1
+@@ -1,6 +1,6 @@
+ .TH BAMALIGNFRAC 1 "July 2019" BIOBAMBAM
+ .SH NAME
+-bamalignfrac - compute fraction of aligned bases in alignment file
++bamalignfrac \- compute fraction of aligned bases in alignment file
+ .SH SYNOPSIS
+ .PP
+ .B bamalignfrac
+@@ -10,30 +10,31 @@
+ bamalignfrac reads a SAM/BAM/CRAM file, computes a set of statistical values
+ and outputs these on the standard output channel. The values produced are
+ 
+-.It Cm
+-* the number of names passing a given regex filter
++.IP *
++the number of names passing a given regex filter
+ 
+-.It Cm
+-* the number of primary alignments
++.IP *
++the number of primary alignments
+ 
+-.It Cm
+-* the number of total alignments
++.IP *
++the number of total alignments
+ 
+-.It Cm
+-* the total number of bases
++.IP *
++the total number of bases
+ 
+-.It Cm
+-* the total number of aligned bases
++.IP *
++the total number of aligned bases
+ 
+-.It Cm
+-* the fraction of aligned bases
++.IP *
++the fraction of aligned bases
+ 
+-.It Cm
+-* the total number of clipped/unaligned bases
++.IP *
++the total number of clipped/unaligned bases
+ 
+-.It Cm
+-* the number of unmapped reads
++.IP *
++the number of unmapped reads
+ 
++.PP
+ These values are printed on the standard output channel at the end of the
+ program run in the last line printed by the program. The program prints a
+ set of line description line prior to that.
+@@ -42,8 +43,12 @@
+ The input file needs to be provied in query name sorted order, which can be
+ obtained via e.g.
+ 
+-bamsort SO=queryname
++.PP
++.EX
++        $ bamsort SO=queryname
++.EE
+ 
++.PP
+ right before being passed to bamalignfrac.
+ 
+ The following key=value pairs can be given:
+--- biobambam2.orig/src/programs/bamauxmerge.1
++++ biobambam2/src/programs/bamauxmerge.1
+@@ -8,34 +8,33 @@
+ in_unmapped in_mapped
+ .SH DESCRIPTION
+ bamauxmerge reads and merges two SAM/BAM/CRAM files which are expected to have the
+-following properties
++following properties:
+ 
+-.It Cm
+-* the first file contains only unmapped reads and it's header contains no SQ lines
++.IP *
++the first file contains only unmapped reads and it's header contains no SQ lines
+ 
+-.It Cm
+-* the second file was produced by an aligner based on the content of the first file.
+-
+-.It Cm
+-* the order of the reads is the same in the first an second file
++.IP *
++the second file was produced by an aligner based on the content of the first file.
+ 
++.IP *
++the order of the reads is the same in the first an second file
+ into a single alignment file.
+-
++.PP
+ The headers of the two files are merged in the following file:
+ 
+-.It Cm
+-* the SQ lines contained in the header of the second file are appended to the header of the first file to obtain the header of the output file
+-
+-.It
+-* all other header information from the second file is discarded
++.IP *
++the SQ lines contained in the header of the second file are appended to the header of the first file to obtain the header of the output file
+ 
++.IP *
++all other header information from the second file is discarded
++.PP
+ The output records are constructed in the following way:
+ 
+-.It Cm
+-1. Take a record from the second file
++.IP 1.
++Take a record from the second file
+ 
+-.It Cm
+-2. Copy all aux fields from the corresponding record in the first file which are not already present.
++.IP 2.
++Copy all aux fields from the corresponding record in the first file which are not already present.
+ 
+ .PP
+ The following key=value pairs can be given:
+--- biobambam2.orig/src/programs/bamauxmerge2.1
++++ biobambam2/src/programs/bamauxmerge2.1
+@@ -10,46 +10,44 @@
+ bamauxmerge2 reads and merges two BAM files which are expected to have the
+ following properties
+ 
+-.It Cm
+-* the first file contains only unmapped reads and it's header contains no SQ lines
++.IP *
++the first file contains only unmapped reads and it's header contains no SQ lines
+ 
+-.It Cm
+-* the second file was produced by an aligner based on the content of the first file.
++.IP *
++the second file was produced by an aligner based on the content of the first file.
+ 
+-.It Cm
+-* both files are sorted in query name order
+-
+-into a single alignment file.
++.IP *
++both files are sorted in query name order into a single alignment file.
+ 
++.PP
+ The headers of the two files are merged in the following file:
++.IP *
++the SQ lines contained in the header of the second file are appended to the header of the first file to obtain the header of the output file
+ 
+-.It Cm
+-* the SQ lines contained in the header of the second file are appended to the header of the first file to obtain the header of the output file
+-
+-.It
+-* all other header information from the second file is discarded
+-
++.IP *
++all other header information from the second file is discarded
++.PP
+ The output records are constructed in the following way:
+ 
+-.It Cm
+-1. Take a record from the second file
++.IP 1.
++Take a record from the second file
+ 
+-.It Cm
+-2. Copy all aux fields from the corresponding record in the first file which are not already present.
++.IP 2.
++Copy all aux fields from the corresponding record in the first file which are not already present.
+ 
+-.It Cm
+-3. Reinsert clipped adapter bases/quality values stored in the qs/qq by
++.IP 3.
++Reinsert clipped adapter bases/quality values stored in the qs/qq by
+ aux fields by fastqtobam2 and remove the qs/qq aux fields while inserting
+ appropriate soft clipping CIGAR operations.
+ 
+-.It Cm
+-4. Fix mate information like bamfixmateinformation.
++.IP 4.
++Fix mate information like bamfixmateinformation.
+ 
+-.It Cm
+-5. Insert the mate CIGAR information fields MC and MS if the mate is aligned.
++.IP 5.
++Insert the mate CIGAR information fields MC and MS if the mate is aligned.
+ 
+-.It Cm
+-6. Insert the MQ (mate quality) aux field.
++.IP 6.
++Insert the MQ (mate quality) aux field.
+ 
+ .PP
+ The following key=value pairs can be given:
+--- biobambam2.orig/src/programs/bamcollate2.1
++++ biobambam2/src/programs/bamcollate2.1
+@@ -196,7 +196,7 @@
+ the input SAM/BAM/CRAM file is used (and filtered in case of reset=1)..
+ .PP
+ .B resetaux=<0|1>:
+-remove auxilliary fields if resetaux=1. This key is only available for
++remove auxiliary fields if resetaux=1. This key is only available for
+ reset=1. If reset=1 then the default is to remove all aux fields.
+ .PP
+ .B auxfilter=<>:
+--- biobambam2.orig/src/programs/bamconsensus.1
++++ biobambam2/src/programs/bamconsensus.1
+@@ -25,7 +25,7 @@
+ .PP
+ .B E is the end position on the reference sequence (exclusive)
+ .PP
+-The reference key specifiying the name of a FastA reference sequence file
++The reference key specifying the name of a FastA reference sequence file
+ is required. The consensus is constructed by computing heavy paths in local
+ DeBruijn graphs. Consequently it is usually a patchwork of the haplotypes
+ present for diploid/polyploid genomes.
+--- biobambam2.orig/src/programs/bammarkduplicates.1
++++ biobambam2/src/programs/bammarkduplicates.1
+@@ -95,7 +95,7 @@
+ .IP 0:
+ duplicates will be retained in the output file and have the duplication flag set
+ .IP 1:
+-duplicates will be remove when writing the output file
++duplicates will be removed when writing the output file
+ .PP
+ .B md5=<0|1>:
+ md5 checksum creation for output file. Valid values are
+@@ -133,11 +133,11 @@
+ this option works like the tag option but is restricted to sequences of
+ nucleotides (A,C,G or T) as tags. The length of each tag sequence is not
+ allowed to exceed 15 bases. All tags are required to have the same length.
+-Each non nucleotide symbol is mapped to A. In constrast to the tag option, 
++Each non nucleotide symbol is mapped to A. In contrast to the tag option,
+ nucltag uses less memory for processing and can be expected to be faster.
+ .PP
+ .B D
+-ouptut file name for removed duplicates if rmdup=1. By default the reads and
++output file name for removed duplicates if rmdup=1. By default the reads and
+ read pairs marked as duplicates are discarded when rmdup=1. If the D key is
+ set, then the sequence of reads which is not written to the output file is
+ written to a separate file. The name of this separate file is the value of
+--- biobambam2.orig/src/programs/bammarkduplicates2.1
++++ biobambam2/src/programs/bammarkduplicates2.1
+@@ -96,7 +96,7 @@
+ .IP 0:
+ duplicates will be retained in the output file and have the duplication flag set
+ .IP 1:
+-duplicates will be remove when writing the output file
++duplicates will be removed when writing the output file
+ .PP
+ .B maxreadlength=<[500]>:
+ maximum read length in input. This value can be set higher than the actual
+@@ -138,11 +138,11 @@
+ this option works like the tag option but is restricted to sequences of
+ nucleotides (A,C,G or T) as tags. The length of each tag sequence is not
+ allowed to exceed 15 bases. All tags are required to have the same length.
+-Each non nucleotide symbol is mapped to A. In constrast to the tag option, 
++Each non nucleotide symbol is mapped to A. In contrast to the tag option,
+ nucltag uses less memory for processing and can be expected to be faster.
+ .PP
+ .B D
+-ouptut file name for removed duplicates if rmdup=1. By default the reads and
++output file name for removed duplicates if rmdup=1. By default the reads and
+ read pairs marked as duplicates are discarded when rmdup=1. If the D key is
+ set, then the sequence of reads which is not written to the output file is
+ written to a separate file. The name of this separate file is the value of
+--- biobambam2.orig/src/programs/bammarkduplicatesopt.1
++++ biobambam2/src/programs/bammarkduplicatesopt.1
+@@ -98,7 +98,7 @@
+ .IP 0:
+ duplicates will be retained in the output file and have the duplication flag set
+ .IP 1:
+-duplicates will be remove when writing the output file
++duplicates will be removed when writing the output file
+ .PP
+ .B md5=<0|1>:
+ md5 checksum creation for output file. Valid values are
+@@ -136,11 +136,11 @@
+ this option works like the tag option but is restricted to sequences of
+ nucleotides (A,C,G or T) as tags. The length of each tag sequence is not
+ allowed to exceed 15 bases. All tags are required to have the same length.
+-Each non nucleotide symbol is mapped to A. In constrast to the tag option, 
++Each non nucleotide symbol is mapped to A. In contrast to the tag option,
+ nucltag uses less memory for processing and can be expected to be faster.
+ .PP
+ .B D
+-ouptut file name for removed duplicates if rmdup=1. By default the reads and
++output file name for removed duplicates if rmdup=1. By default the reads and
+ read pairs marked as duplicates are discarded when rmdup=1. If the D key is
+ set, then the sequence of reads which is not written to the output file is
+ written to a separate file. The name of this separate file is the value of
+--- biobambam2.orig/src/programs/bammaskflags.1
++++ biobambam2/src/programs/bammaskflags.1
+@@ -1,6 +1,6 @@
+ .TH BAMMASKFLAGS 1 "July 2013" BIOBAMBAM
+ .SH NAME
+-bammaskflags - remove flags from alignments
++bammaskflags \- remove flags from alignments
+ .SH SYNOPSIS
+ .PP
+ .B bammaskflags
+@@ -24,7 +24,9 @@
+ zlib/gzip level 9 (best) compression
+ .P
+ If libmaus has been compiled with support for igzip (see
+-https://software.intel.com/en-us/articles/igzip-a-high-performance-deflate-compressor-with-optimizations-for-genomic-data)
++.UR https://software.intel.com/en-us/articles/igzip-a-high-performance-deflate-compressor-with-optimizations-for-genomic-data
++Intel article on igzip
++.UE )
+ then an additional valid value is
+ .IP 11:
+ igzip compression
+@@ -34,30 +36,42 @@
+ complement of this number and the flag field of the reads/alignments in the
+ input BAM file). This value can be obtained by adding up the following
+ values of the flags:
+-.IP PAIRED (paired in sequencing):
++.TP
+ 1
+-.IP PROPER_PAIR (mapped as a proper pair):
++PAIRED (paired in sequencing)
++.TP
+ 2
+-.IP UNMAP (unmapped):
++PROPER_PAIR (mapped as a proper pair)
++.TP
+ 4
+-.IP MUNMAP (mate unmapped):
++UNMAP (unmapped)
++.TP
+ 8
+-.IP REVERSE (mapped to the reverse strand):
++MUNMAP (mate unmapped)
++.TP
+ 16
+-.IP MREVERSE (mate mapped to the reverse strand):
++REVERSE (mapped to the reverse strand)
++.TP
+ 32
+-.IP READ1 (first read of pair):
++MREVERSE (mate mapped to the reverse strand)
++.TP
+ 64
+-.IP READ2 (second read of pair):
++READ1 (first read of pair)
++.TP
+ 128
+-.IP SECONDARY (secondary alignment):
++READ2 (second read of pair)
++.TP
+ 256
+-.IP QCFAIL (failed quality control):
++SECONDARY (secondary alignment)
++.TP
+ 512
+-.IP DUP (duplicate):
++QCFAIL (failed quality control)
++.TP
+ 1024
+-.IP SUPPLEMENTARY (supplementary):
++DUP (duplicate)
++.TP
+ 2048
++SUPPLEMENTARY (supplementary)
+ .PP
+ .B tmpfile=<filename>: 
+ prefix for temporary files. By default the temporary files are created in the current directory
+--- biobambam2.orig/src/programs/bammdnm.1
++++ biobambam2/src/programs/bammdnm.1
+@@ -7,7 +7,7 @@
+ [options]
+ .SH DESCRIPTION
+ bammdnm reads a coordinate sorted BAM file from standard input, fills the
+-MD and NM auxilliary fields for each mapped read fragment and writes
++MD and NM auxiliary fields for each mapped read fragment and writes
+ the resulting data to a BAM file on standard output.
+ .PP
+ The following key=value pairs can be given:
+--- biobambam2.orig/src/programs/bamreset.1
++++ biobambam2/src/programs/bamreset.1
+@@ -109,7 +109,7 @@
+ read is marked as supplementary alignment
+ .PP
+ .B resetaux=<0|1>:
+-auxilliary fields (default).
++auxiliary fields (default).
+ .PP
+ .B resetsortorder=<0|1>:
+ set sort order to unknown if resetsortorder=1 (default) and leave as it is
+--- biobambam2.orig/src/programs/bamsort.1
++++ biobambam2/src/programs/bamsort.1
+@@ -220,7 +220,7 @@
+ this option works like the tag option but is restricted to sequences of
+ nucleotides (A,C,G or T) as tags. The length of each tag sequence is not
+ allowed to exceed 15 bases. All tags are required to have the same length.
+-Each non nucleotide symbol is mapped to A. In constrast to the tag option, 
++Each non nucleotide symbol is mapped to A. In contrast to the tag option,
+ nucltag uses less memory for processing and can be expected to be faster.
+ .PP
+ .B M=<stderr>: 
+--- biobambam2.orig/src/programs/bamstreamingmarkduplicates.1
++++ biobambam2/src/programs/bamstreamingmarkduplicates.1
+@@ -11,7 +11,7 @@
+ adddupmarksupport=1, marks duplicate read pairs and reads and writes the
+ resulting file in BAM, SAM or CRAM format. The preprocessing of the file
+ using bamsort with the stated options is mandatory, i.e.
+-bamstreamingmarkduplicates will fail without it. In constrast to
++bamstreamingmarkduplicates will fail without it. In contrast to
+ bammarkduplicates and bammarkduplicates2 the streaming variant
+ bamstreamingmarkduplicates processes the file in a single pass.
+ bamstreamingmarkduplicates cannot handle files containing orphan pair ends
+@@ -129,7 +129,7 @@
+ this option works like the tag option but is restricted to sequences of
+ nucleotides (A,C,G or T) as tags. The length of each tag sequence is not
+ allowed to exceed 15 bases. All tags are required to have the same length.
+-Each non nucleotide symbol is mapped to A. In constrast to the tag option, 
++Each non nucleotide symbol is mapped to A. In contrast to the tag option,
+ nucltag uses less memory for processing and can be expected to be faster.
+ .PP
+ .B filterdupmarktags=<[0]>:
+--- biobambam2.orig/src/programs/fastqtobam.1
++++ biobambam2/src/programs/fastqtobam.1
+@@ -63,7 +63,7 @@
+ additional BAM encoding helper threads.
+ .PP
+ .B PGID=<>
+-read group identifier for reads. By default no read group identifer is set.
++read group identifier for reads. By default no read group identifier is set.
+ The fields CN, DS, DT, FO, KS, LB, PG, PI, PL, PU and SM of the
+ corresponding @RG header line can be set by using the keys RGCN, RGDS, etc.
+ respectively.
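
Note on the bammaskflags.1 hunk in manuals.patch above: the flag values it lists (1, 2, 4, ... 2048) are distinct powers of two, so a mask is built by adding up the values of the flags to remove. The manual describes the result as the flag field combined with the complement of that number. A minimal sketch of this arithmetic follows; the enum and the function name maskFlags are illustrative assumptions and are not taken from biobambam2 or libmaus2.

```cpp
#include <cassert>
#include <cstdint>

// Flag values as listed in the bammaskflags manual page.
enum BamFlag : std::uint32_t {
    PAIRED = 1, PROPER_PAIR = 2, UNMAP = 4, MUNMAP = 8,
    REVERSE = 16, MREVERSE = 32, READ1 = 64, READ2 = 128,
    SECONDARY = 256, QCFAIL = 512, DUP = 1024, SUPPLEMENTARY = 2048
};

// Hypothetical helper: clear every bit of 'mask' in 'flags'
// (bitwise AND of the flag field with the complement of the mask).
inline std::uint32_t maskFlags(std::uint32_t flags, std::uint32_t mask)
{
    std::uint32_t const result = flags & ~mask;
    // None of the masked bits may survive.
    assert((result & mask) == 0);
    return result;
}
```

For example, removing the duplicate flag from a properly paired duplicate read would use a mask of 1024, and removing both DUP and QCFAIL would use 1024 + 512 = 1536.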


=====================================
debian/patches/series
=====================================
@@ -0,0 +1,2 @@
+manuals.patch
+spelling-error-in-binary.patch


=====================================
debian/patches/spelling-error-in-binary.patch
=====================================
@@ -0,0 +1,19 @@
+Description: fix spelling-error-in-binary
+Author: Étienne Mollier <etienne.mollier at mailoo.org>
+Forwarded: no
+Last-Update: 2020-11-18
+---
+This patch header follows DEP-3: http://dep.debian.net/deps/dep3/
+--- biobambam2.orig/src/programs/bamvalidate.cpp
++++ biobambam2/src/programs/bamvalidate.cpp
+@@ -282,8 +282,8 @@
+ 
+ 				std::vector< std::pair<std::string,std::string> > V;
+ 
+-				V.push_back ( std::pair<std::string,std::string> ( "verbose=<["+::biobambam2::Licensing::formatNumber(getDefaultVerbose())+"]>", "print stats at the end of a successfull run" ) );
+-				V.push_back ( std::pair<std::string,std::string> ( "basequalhist=<["+::biobambam2::Licensing::formatNumber(getDefaultBaseQualHist())+"]>", "print base quality histogram at end of a successfull run" ) );
++				V.push_back ( std::pair<std::string,std::string> ( "verbose=<["+::biobambam2::Licensing::formatNumber(getDefaultVerbose())+"]>", "print stats at the end of a successful run" ) );
++				V.push_back ( std::pair<std::string,std::string> ( "basequalhist=<["+::biobambam2::Licensing::formatNumber(getDefaultBaseQualHist())+"]>", "print base quality histogram at end of a successful run" ) );
+ 				V.push_back ( std::pair<std::string,std::string> ( "passthrough=<["+::biobambam2::Licensing::formatNumber(getDefaultPassThrough())+"]>", "write alignments to standard output (default: do not pass through)" ) );
+ 				V.push_back ( std::pair<std::string,std::string> ( "tmpfile=<filename>", "prefix for temporary files, default: create files in current directory (passthrough=1, index=1 only)" ) );
+ 				V.push_back ( std::pair<std::string,std::string> ( "md5=<["+::biobambam2::Licensing::formatNumber(getDefaultMD5())+"]>", "create md5 check sum (default: 0, passthrough=1 only)" ) );


=====================================
release.sh
=====================================
@@ -1,4 +1,5 @@
 #! /bin/bash
+set -euxo pipefail
 
 # update branches
 git checkout experimental


=====================================
removespace.sh
=====================================
@@ -0,0 +1,11 @@
+#! /bin/bash
+for i in `find src -regex .*\\\.[ch]pp` `find src -regex .*\\\.[ch]` ; do
+	ORIG=`cat $i`
+	PATCHED=`perl -p -e "s/(\s*)($)/\n/" < ${i}`
+	
+	if [ "$ORIG" != "$PATCHED" ] ; then
+		echo "${PATCHED}" > ${i}
+		# git add ${i}
+		echo ${i}
+	fi
+done
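
Note on removespace.sh above: the perl substitution replaces trailing whitespace on each line with a single newline. The same transformation is sketched below in C++ for illustration only. The function name stripTrailingWhitespace is an assumption, and the sketch treats only spaces, tabs and carriage returns as whitespace. Unlike the shell script, it does not drop trailing blank lines, which the backtick capture in the script does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: remove trailing spaces, tabs and carriage returns
// from every line of 'text'. Each output line ends with exactly one '\n'.
inline std::string stripTrailingWhitespace(std::string const & text)
{
    std::string out;
    std::size_t pos = 0;
    while ( pos < text.size() )
    {
        // Locate the end of the current line (or the end of the input).
        std::size_t const eol = text.find('\n', pos);
        std::size_t const end = (eol == std::string::npos) ? text.size() : eol;

        // Walk backwards over trailing whitespace.
        std::size_t last = end;
        while ( last > pos &&
                (text[last-1] == ' ' || text[last-1] == '\t' || text[last-1] == '\r') )
            --last;

        out.append(text, pos, last - pos);
        out.push_back('\n');
        pos = (eol == std::string::npos) ? text.size() : eol + 1;
    }
    // Every emitted line was terminated above.
    assert(out.empty() || out.back() == '\n');
    return out;
}
```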


=====================================
src/Makefile.am
=====================================
@@ -152,7 +152,8 @@ EXTRA_PROGRAMS = blastnxmltobam \
 	vcfdiff \
 	bamsimpledepth \
 	bamdepthmerge \
-	bamcountflags
+	bamcountflags \
+	fastaselectreg
 
 populaterefcache_SOURCES = programs/populaterefcache.cpp biobambam2/Licensing.cpp
 populaterefcache_LDADD = ${LIBMAUS2LIBS}
@@ -703,3 +704,8 @@ bamcountflags_SOURCES = programs/bamcountflags.cpp biobambam2/Licensing.cpp
 bamcountflags_LDADD = ${LIBMAUS2LIBS}
 bamcountflags_LDFLAGS = ${AM_CPPFLAGS} ${LIBMAUS2CPPFLAGS} ${LIBMAUS2LDFLAGS} ${AM_LDFLAGS}
 bamcountflags_CPPFLAGS = ${AM_CPPFLAGS} ${LIBMAUS2CPPFLAGS}
+
+fastaselectreg_SOURCES = programs/fastaselectreg.cpp biobambam2/Licensing.cpp
+fastaselectreg_LDADD = ${LIBMAUS2LIBS}
+fastaselectreg_LDFLAGS = ${AM_CPPFLAGS} ${LIBMAUS2CPPFLAGS} ${LIBMAUS2LDFLAGS} ${AM_LDFLAGS}
+fastaselectreg_CPPFLAGS = ${AM_CPPFLAGS} ${LIBMAUS2CPPFLAGS}


=====================================
src/biobambam2/BamBamConfig.hpp.in
=====================================
@@ -22,7 +22,6 @@
 @LIBMAUS2IOLIBDEFINE@
 @BIOBAMBAM_HAVE_XERCES_C@
 @BIOBAMBAM_HAVE_GMP@
-@LIBMAUS2IRODSDEFINE@
 @HAVE_PTHREAD_MUTEX_RECURSIVE_NP@
 @HAVE_PTHREAD_MUTEX_RECURSIVE@
 


=====================================
src/biobambam2/UpdateNumericalIndex.cpp
=====================================
@@ -28,6 +28,13 @@ namespace biobambam2
 
 		if ( libmaus2::util::GetFileSize::fileExists(indexfn) )
 		{
+			std::string const replfn = indexfn + ".repl";
+			libmaus2::aio::OutputStreamInstance::unique_ptr_type prepl(
+				new libmaus2::aio::OutputStreamInstance(
+					replfn
+				)
+			);
+
 			// get index stats
 			uint64_t alcnt, mod, numblocks;
 			{
@@ -59,20 +66,33 @@ namespace biobambam2
 
 					// std::cerr << "replacing index position for " << i << " by " << start.first << "," << start.second << std::endl;
 
+					libmaus2::bambam::BamNumericalIndexGenerator::ReplaceObject(
+						blockid,start.first,start.second
+					).serialise(*prepl);
+
+					#if 0
 					libmaus2::bambam::BamNumericalIndexGenerator::replaceValue(
 						indexfn,
 						blockid,
 						start.first,
 						start.second
 					);
+					#endif
 
 					highestSet = blockid;
 				}
 
+			prepl->flush();
+			prepl.reset();
+
+			libmaus2::bambam::BamNumericalIndexGenerator::replaceValues(indexfn,replfn);
+
+			libmaus2::aio::FileRemoval::removeFile(replfn);
+
 			// std::cerr << "shifting index positions for [" << highestSet+1 << "," << numblocks << ")" << " by " << static_cast<int64_t>(compdata.second.size()) - static_cast<int64_t>(compdata.first) << std::endl;
 
 			// shift values in part we moved as is
-			libmaus2::bambam::BamNumericalIndexGenerator::shiftValues(
+			libmaus2::bambam::BamNumericalIndexGenerator::shiftValuesStreaming(
 				indexfn,
 				highestSet + 1,numblocks,
 				static_cast<int64_t>(compdata.second.size()) - static_cast<int64_t>(compdata.first)
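
Note on the UpdateNumericalIndex.cpp hunks above: instead of calling replaceValue on the index file once per block, the new code serialises each replacement into a side file (indexfn + ".repl"), applies them with a single replaceValues call and then removes the side file. The diff does not show how libmaus2 implements replaceValues, so the following is only a sketch of the general queue-then-apply pattern using standard streams. The Replacement type and applyReplacements function are hypothetical and are not libmaus2 API.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical record: which index slot to overwrite and the new value.
struct Replacement {
    std::uint64_t blockid;
    std::uint64_t value;

    // Append this record to the queue stream as one text line.
    void serialise(std::ostream & out) const { out << blockid << ' ' << value << '\n'; }
};

// Apply every queued replacement in one pass over the queue.
inline void applyReplacements(std::vector<std::uint64_t> & index, std::istream & queued)
{
    std::uint64_t blockid = 0, value = 0;
    while ( queued >> blockid >> value )
    {
        assert(blockid < index.size());
        index[blockid] = value;
    }
}
```

Queueing the records first means the index only has to be opened and rewritten once, whatever the number of replaced blocks. Whether this is the motivation for the upstream change is an assumption.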


=====================================
src/programs/bamcollate2.cpp
=====================================
@@ -1403,29 +1403,14 @@ void bamcollate2(libmaus2::util::ArgInfo const & arginfo)
 	}
 }
 
-#if defined(LIBMAUS2_HAVE_IRODS)
-#include <libmaus2/irods/IRodsInputStreamFactory.hpp>
-#endif
-
 int main(int argc, char * argv[])
 {
 	try
 	{
-		#if defined(LIBMAUS2_HAVE_IRODS)
-                libmaus2::irods::IRodsInputStreamFactory::registerHandler();
-                #endif
-
 		libmaus2::timing::RealTimeClock rtc; rtc.start();
 
 		::libmaus2::util::ArgInfo arginfo(argc,argv);
 
-		#if defined(LIBMAUS2_HAVE_IRODS)
-		// set program name for iRODS identification
-		std::stringstream irods_id;
-		irods_id  << PACKAGE_NAME << ":" << arginfo.getProgFileName(arginfo.progname) << ":" << PACKAGE_VERSION;
-		setenv(SP_OPTION, irods_id.str().c_str(), 1);
-		#endif
-
 		for ( uint64_t i = 0; i < arginfo.restargs.size(); ++i )
 			if (
 				arginfo.restargs[i] == "-v"
@@ -1526,14 +1511,6 @@ int main(int argc, char * argv[])
 
 		bamcollate2(arginfo);
 
-		#if defined(LIBMAUS2_HAVE_IRODS)
-		// need a explicit call to disconnect to avoid atexit deallocation problems in iRODS 4.19+
-    		if (libmaus2::irods::IRodsSystem::defaultIrodsSystem)
-		{
-    	        	(libmaus2::irods::IRodsSystem::getDefaultIRodsSystem())->disconnect();
-		}
-		#endif
-
 		if ( arginfo.getValue<unsigned int>("verbose",getDefaultVerbose()) )
 			std::cerr << "[V] " << libmaus2::util::MemUsage() << " wall clock time " << rtc.formatTime(rtc.getElapsedSeconds()) << std::endl;
 	}


=====================================
src/programs/bamconsensus.cpp
=====================================
@@ -3307,7 +3307,7 @@ struct ReferenceCache
 
 		if ( ! --VuseCnt.at(id) )
 		{
-			Vref.at(id) == RefEntry::shared_ptr_type();
+			Vref.at(id) = RefEntry::shared_ptr_type();
 			std::cerr << "[V] reference cache deallocating " << ptr->name << std::endl;
 		}
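
Note on the bamconsensus.cpp hunk above: the removed line used operator==, which only compares the cache slot with an empty pointer and discards the result, so the slot kept its reference. The added line assigns an empty pointer, which releases it. This matches the ChangeLog entry for 2.0.174 ("Fix wrong operator for deallocating memory in bamconsensus"). A minimal sketch of the difference follows; it assumes std::shared_ptr semantics and uses a hypothetical RefEntry stand-in, not the real types from bamconsensus.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the cached reference entry.
struct RefEntry {
    std::string name;
};

typedef std::vector< std::shared_ptr<RefEntry> > Cache;

// Buggy variant: the comparison result is thrown away, nothing is released.
inline void releaseWithComparison(Cache & cache, std::size_t id)
{
    (void)(cache.at(id) == std::shared_ptr<RefEntry>());
}

// Fixed variant: assigning an empty pointer drops the cache's reference.
inline void releaseWithAssignment(Cache & cache, std::size_t id)
{
    cache.at(id) = std::shared_ptr<RefEntry>();
    assert(!cache.at(id));
}
```

Many compilers can flag the buggy form with an unused-comparison warning (for example -Wunused-value or -Wunused-comparison, depending on the compiler and the types involved).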
 


=====================================
src/programs/bamdifference.cpp
=====================================
@@ -0,0 +1,222 @@
+/**
+    bambam
+    Copyright (C) 2009-2020 German Tischler-Höhle
+    Copyright (C) 2011-2013 Genome Research Limited
+
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation, either version 3 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>..
+**/
+#include <config.h>
+#include <libmaus2/bambam/BamBlockWriterBaseFactory.hpp>
+#include <libmaus2/bambam/BamWriter.hpp>
+#include <libmaus2/bambam/BamHeaderUpdate.hpp>
+#include <libmaus2/util/ArgInfo.hpp>
+#include <biobambam2/Licensing.hpp>
+#include <libmaus2/bambam/BamMultiAlignmentDecoderFactory.hpp>
+#include <libmaus2/bambam/BamPeeker.hpp>
+#include <libmaus2/lz/BgzfDeflateOutputCallbackMD5.hpp>
+#include <libmaus2/bambam/BgzfDeflateOutputCallbackBamIndex.hpp>
+
+static int getDefaultMD5() { return 0; }
+
+static void printVerbose(std::ostream & errstr, uint64_t const c0, uint64_t const c1, uint64_t const k, bool const verbose, uint64_t const mod)
+{
+	if ( verbose && ((c0+c1)%mod==0) )
+	{
+		errstr << "[V] " << c0 << "/" << c1 << "/" << c0+c1 << "/" << k << std::endl;
+	}
+}
+
+/*
+ * compute the difference of two name-sorted alignment files (SAM/BAM/CRAM)
+ */
+int bamintersect(libmaus2::util::ArgParser const & arg)
+{
+	std::ostream & verbstr = std::cerr;
+	static uint64_t const mod = 1024*1024;
+
+	libmaus2::util::ArgParser arg0 = arg;
+	arg0.replaceArg("I",arg0[0]);
+	libmaus2::util::ArgParser arg1 = arg;
+	arg1.replaceArg("I",arg0[1]);
+
+	bool const verbose = arg.argPresent("verbose");
+
+	libmaus2::bambam::BamAlignmentDecoderWrapper::unique_ptr_type decwrapper0(libmaus2::bambam::BamMultiAlignmentDecoderFactory::construct(arg0));
+	libmaus2::bambam::BamAlignmentDecoder & BD0 = decwrapper0->getDecoder();
+	libmaus2::bambam::BamPeeker BP0(BD0);
+	libmaus2::bambam::BamAlignment algn0;
+
+	libmaus2::bambam::BamAlignmentDecoderWrapper::unique_ptr_type decwrapper1(libmaus2::bambam::BamMultiAlignmentDecoderFactory::construct(arg1));
+	libmaus2::bambam::BamAlignmentDecoder & BD1 = decwrapper1->getDecoder();
+	libmaus2::bambam::BamPeeker BP1(BD1);
+	libmaus2::bambam::BamAlignment algn1;
+
+	std::string md5filename;
+
+	std::vector< ::libmaus2::lz::BgzfDeflateOutputCallback * > cbs;
+	::libmaus2::lz::BgzfDeflateOutputCallbackMD5::unique_ptr_type Pmd5cb;
+	if ( arg.getParsedArgOrDefault<uint64_t>("md5",getDefaultMD5()) )
+	{
+		if ( libmaus2::bambam::BamBlockWriterBaseFactory::getMD5FileName(arg) != std::string() )
+			md5filename = libmaus2::bambam::BamBlockWriterBaseFactory::getMD5FileName(arg);
+		else
+			std::cerr << "[V] no filename for md5 given, not creating hash" << std::endl;
+
+		if ( md5filename.size() )
+		{
+			::libmaus2::lz::BgzfDeflateOutputCallbackMD5::unique_ptr_type Tmd5cb(new ::libmaus2::lz::BgzfDeflateOutputCallbackMD5);
+			Pmd5cb = std::move(Tmd5cb);
+			cbs.push_back(Pmd5cb.get());
+		}
+	}
+	std::vector< ::libmaus2::lz::BgzfDeflateOutputCallback * > * Pcbs = 0;
+	if ( cbs.size() )
+		Pcbs = &cbs;
+
+	// construct writer
+	libmaus2::bambam::BamBlockWriterBase::unique_ptr_type Pwriter(libmaus2::bambam::BamBlockWriterBaseFactory::construct(BD0.getHeader(),arg,Pcbs));
+	libmaus2::bambam::BamBlockWriterBase & wr = *Pwriter;
+
+	uint64_t c0 = 0, c1 = 0, k = 0;
+
+	while ( BP0.peekNext(algn0) && BP1.peekNext(algn1) )
+	{
+		char const * name0 = algn0.getName();
+		char const * name1 = algn1.getName();
+		int const r = libmaus2::bambam::StrCmpNum::strcmpnum(name0,name1);
+
+		// name is in file0 but not in file1
+		if ( r < 0 )
+		{
+			std::string const name0 = algn0.getName();
+
+			while ( BP0.peekNext(algn0) && algn0.getName() == name0 )
+			{
+				BP0.getNext(algn0);
+				wr.writeAlignment(algn0);
+
+				++c0;
+				++k;
+
+				printVerbose(verbstr, c0, c1, k, verbose, mod);
+			}
+		}
+		// name is in both files, drop data
+		else if ( r == 0 )
+		{
+			std::string const name = algn0.getName();
+
+			while ( BP0.peekNext(algn0) && algn0.getName() == name )
+			{
+				BP0.getNext(algn0);
+
+				++c0;
+				printVerbose(verbstr, c0, c1, k, verbose, mod);
+			}
+			while ( BP1.peekNext(algn1) && algn1.getName() == name )
+			{
+				BP1.getNext(algn1);
+
+				++c1;
+				printVerbose(verbstr, c0, c1, k, verbose, mod);
+			}
+		}
+		// name is only in file1, drop data
+		else
+		{
+			std::string const name1 = algn1.getName();
+
+			while ( BP1.peekNext(algn1) && algn1.getName() == name1 )
+			{
+				BP1.getNext(algn1);
+				// not written: this name does not occur in file0
+
+				++c1;
+				printVerbose(verbstr, c0, c1, k, verbose,mod);
+			}
+		}
+	}
+
+	// names only in file0 at end
+	while ( BP0.getNext(algn0) )
+	{
+		wr.writeAlignment(algn0);
+
+		++c0;
+		++k;
+
+		printVerbose(verbstr, c0, c1, k, verbose,mod);
+	}
+
+	Pwriter.reset();
+
+	if ( Pmd5cb )
+		Pmd5cb->saveDigestAsFile(md5filename);
+
+	return EXIT_SUCCESS;
+}
+
+int main(int argc, char * argv[])
+{
+	try
+	{
+		std::vector<libmaus2::util::ArgParser::ArgumentDefinition> Vformatcons;
+		Vformatcons.push_back(libmaus2::util::ArgParser::ArgumentDefinition("h","help",false));
+		Vformatcons.push_back(libmaus2::util::ArgParser::ArgumentDefinition("v","version",false));
+		Vformatcons.push_back(libmaus2::util::ArgParser::ArgumentDefinition("","verbose",false));
+
+		std::vector<libmaus2::util::ArgParser::ArgumentDefinition> const Vformatin = libmaus2::bambam::BamAlignmentDecoderInfo::getArgumentDefinitions();
+		std::vector<libmaus2::util::ArgParser::ArgumentDefinition> const Vformatout = libmaus2::bambam::BamBlockWriterBaseFactory::getArgumentDefinitions();
+
+		std::vector<libmaus2::util::ArgParser::ArgumentDefinition> Vformat =
+			libmaus2::util::ArgParser::mergeFormat(libmaus2::util::ArgParser::mergeFormat(Vformatin,Vformatout),Vformatcons);
+
+		libmaus2::util::ArgParser const arg(argc,argv,Vformat);
+
+		if ( arg.argPresent("version") )
+		{
+			std::cerr << ::biobambam2::Licensing::license();
+			return EXIT_SUCCESS;
+		}
+		else if ( arg.argPresent("help") || arg.size() < 2 )
+		{
+			std::cerr << ::biobambam2::Licensing::license();
+			std::cerr << std::endl;
+			std::cerr << "usage: " << arg.progname << " full.bam partial.bam" << std::endl;
+			std::cerr << std::endl;
+			std::cerr << "Argument:" << std::endl;
+			std::cerr << std::endl;
+
+			std::vector< std::pair<std::string,std::string> > V;
+
+			V.push_back ( std::pair<std::string,std::string> ( "--verbose", "print progress report" ) );
+			V.push_back ( std::pair<std::string,std::string> ( "--md5 <["+::biobambam2::Licensing::formatNumber(getDefaultMD5())+"]>", "create md5 check sum (default: 0)" ) );
+			V.push_back ( std::pair<std::string,std::string> ( "--md5filename <filename>", "file name for md5 check sum (default: extend output file name)" ) );
+
+			::biobambam2::Licensing::printMap(std::cerr,V);
+
+			if ( arg.argPresent("help") )
+				return EXIT_SUCCESS;
+			else
+				return EXIT_FAILURE;
+		}
+
+		return bamintersect(arg);
+	}
+	catch(std::exception const & ex)
+	{
+		std::cerr << ex.what() << std::endl;
+		return EXIT_FAILURE;
+	}
+}

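The merge-style loop in bamdifference.cpp above can be sketched on plain strings. This is only an illustration under stated assumptions: records are reduced to their read names, ordering is plain std::string comparison rather than libmaus2's StrCmpNum::strcmpnum, and names found only in the second input are dropped, as the comments in the hunk describe. The function name nameDifference is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the records of `full` whose names do not occur in `partial`.
// Both inputs must be sorted by name (checked by the asserts below).
std::vector<std::string> nameDifference(
	std::vector<std::string> const & full,
	std::vector<std::string> const & partial
)
{
	assert ( std::is_sorted(full.begin(),full.end()) );
	assert ( std::is_sorted(partial.begin(),partial.end()) );

	std::vector<std::string> out;
	std::size_t i = 0, j = 0;

	while ( i < full.size() && j < partial.size() )
	{
		if ( full[i] < partial[j] )
		{
			// name only in full: keep every record carrying it
			std::string const name = full[i];
			while ( i < full.size() && full[i] == name )
				out.push_back(full[i++]);
		}
		else if ( full[i] == partial[j] )
		{
			// name in both inputs: skip the whole group on both sides
			std::string const name = full[i];
			while ( i < full.size() && full[i] == name )
				++i;
			while ( j < partial.size() && partial[j] == name )
				++j;
		}
		else
		{
			// name only in partial: skip it
			std::string const name = partial[j];
			while ( j < partial.size() && partial[j] == name )
				++j;
		}
	}

	// whatever is left in full has no counterpart in partial
	while ( i < full.size() )
		out.push_back(full[i++]);

	return out;
}
```

Because each input is consumed once from front to back, the work is linear in the total number of records, which is why the program requires name-sorted input.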

=====================================
src/programs/bammerge.cpp
=====================================
@@ -38,10 +38,6 @@ static std::string getDefaultSortOrder() { return "coordinate"; }
 static int getDefaultMD5() { return 0; }
 static int getDefaultIndex() { return 0; }
 
-#if defined(LIBMAUS2_HAVE_IRODS)
-#include <libmaus2/irods/IRodsInputStreamFactory.hpp>
-#endif
-
 ::libmaus2::bambam::BamHeader::unique_ptr_type updateHeader(
 	::libmaus2::util::ArgInfo const & arginfo,
 	::libmaus2::bambam::BamHeader const & header
@@ -239,14 +235,6 @@ int bammerge(libmaus2::util::ArgInfo const & arginfo)
 		Pindex->flush(std::string(indexfilename));
 	}
 
-	#if defined(LIBMAUS2_HAVE_IRODS)
-	// need a explicit call to disconnect to avoid atexit deallocation problems in iRODS 4.19+
-    	if (libmaus2::irods::IRodsSystem::defaultIrodsSystem)
-	{
-    	        (libmaus2::irods::IRodsSystem::getDefaultIRodsSystem())->disconnect();
-	}
-	#endif
-
 	return EXIT_SUCCESS;
 }
 
@@ -254,19 +242,8 @@ int main(int argc, char * argv[])
 {
 	try
 	{
-		#if defined(LIBMAUS2_HAVE_IRODS)
-                libmaus2::irods::IRodsInputStreamFactory::registerHandler();
-                #endif
-
 		::libmaus2::util::ArgInfo const arginfo(argc,argv);
 
-		#if defined(LIBMAUS2_HAVE_IRODS)
-		// set program name for iRODS identification
-		std::stringstream irods_id;
-		irods_id  << PACKAGE_NAME << ":" << arginfo.getProgFileName(arginfo.progname) << ":" << PACKAGE_VERSION;
-		setenv(SP_OPTION, irods_id.str().c_str(), 1);
-		#endif
-
 		for ( uint64_t i = 0; i < arginfo.restargs.size(); ++i )
 			if (
 				arginfo.restargs[i] == "-v"


=====================================
src/programs/bamtofastq.cpp
=====================================
@@ -761,7 +761,7 @@ void bamtofastqCollating(
 				}
 
 				AOS[rgfshift + O2map]->write(reinterpret_cast<char const *>(T.begin()),la);
-				filefrags[rgfshift + Omap]++;
+				filefrags[rgfshift + O2map]++;
 
 				combs.orphans2 += 1;
 				cnt += 1;
@@ -1240,29 +1240,14 @@ void bamtofastq(libmaus2::util::ArgInfo const & arginfo)
 	}
 }
 
-#if defined(LIBMAUS2_HAVE_IRODS)
-#include <libmaus2/irods/IRodsInputStreamFactory.hpp>
-#endif
-
 int main(int argc, char * argv[])
 {
 	try
 	{
-		#if defined(LIBMAUS2_HAVE_IRODS)
-		libmaus2::irods::IRodsInputStreamFactory::registerHandler();
-		#endif
-
 		libmaus2::timing::RealTimeClock rtc; rtc.start();
 
 		::libmaus2::util::ArgInfo arginfo(argc,argv);
 
-		#if defined(LIBMAUS2_HAVE_IRODS)
-		// set program name for iRODS identification
-		std::stringstream irods_id;
-		irods_id  << PACKAGE_NAME << ":" << arginfo.getProgFileName(arginfo.progname) << ":" << PACKAGE_VERSION;
-		setenv(SP_OPTION, irods_id.str().c_str(), 1);
-		#endif
-
 		for ( uint64_t i = 0; i < arginfo.restargs.size(); ++i )
 			if (
 				arginfo.restargs[i] == "-v"
@@ -1369,14 +1354,6 @@ int main(int argc, char * argv[])
 
 		bamtofastq(arginfo);
 
-		#if defined(LIBMAUS2_HAVE_IRODS)
-		// need a explicit call to disconnect to avoid atexit deallocation problems in iRODS 4.19+
-    		if (libmaus2::irods::IRodsSystem::defaultIrodsSystem)
-		{
-    	        	(libmaus2::irods::IRodsSystem::getDefaultIRodsSystem())->disconnect();
-		}
-		#endif
-
 		std::cerr << "[V] " << libmaus2::util::MemUsage() << " wall clock time " << rtc.formatTime(rtc.getElapsedSeconds()) << std::endl;
 	}
 	catch(std::exception const & ex)


=====================================
src/programs/blastnxmltobam.cpp
=====================================
@@ -54,7 +54,7 @@ static std::string stripAfterSpace(std::string const & s)
 struct XercesUtf8Transcoder
 {
 	typedef XercesUtf8Transcoder this_type;
-	typedef libmaus2::util::unique_ptr<this_type>::type unique_ptr_type;
+	typedef std::unique_ptr<this_type> unique_ptr_type;
 
 	xercesc::XMLTransService * ts;
 	xercesc::XMLTranscoder * utf8transcoder;
@@ -1136,12 +1136,12 @@ int main(int argc, char * argv[])
 			std::string const reffn = arginfo.restargs.at(0);
 			std::string const queriesfn = arginfo.restargs.at(1);
 
-			libmaus2::util::unique_ptr< std::vector<libmaus2::bambam::CramRange> >::type Pranges;
+			std::unique_ptr< std::vector<libmaus2::bambam::CramRange> > Pranges;
 			std::vector<libmaus2::bambam::CramRange> * ranges = 0;
 
 			if ( arginfo.hasArg("range") )
 			{
-				libmaus2::util::unique_ptr< std::vector<libmaus2::bambam::CramRange> >::type Tranges(
+				std::unique_ptr< std::vector<libmaus2::bambam::CramRange> > Tranges(
 					new std::vector<libmaus2::bambam::CramRange>
 				);
 				Pranges = std::move(Tranges);


=====================================
src/programs/fastaselectreg.cpp
=====================================
@@ -0,0 +1,156 @@
+/*
+    biobambam2
+    Copyright (C) 2020 German Tischler
+
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation, either version 3 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>..
+*/
+
+#include <iostream>
+#include <libmaus2/util/ArgParser.hpp>
+#include <libmaus2/fastx/StreamFastAReader.hpp>
+#include <libmaus2/lz/PlainOrGzipStream.hpp>
+
+#include <biobambam2/BamBamConfig.hpp>
+#include <biobambam2/Licensing.hpp>
+
+#include <regex>
+
+#include <libmaus2/lz/GzipOutputStream.hpp>
+
+static uint64_t getDefaultLineLength()
+{
+	return 80;
+}
+
+/**
+ * read a FastA file (possibly gzipped) from stdin, select sequences with (short)
+ * names matching a given regular expression (argument, POSIX extended regex)
+ * and output selected sequences on stdout (gzipped if -g or --gzip is set).
+ *
+ * Options:
+ * - singleline: output a single line of sequence data per record (i.e. do not wrap)
+ * - longname: do not shorten the sequence name line
+ * - dataonly: output data only, drop FastA headers (lines starting with >)
+ * - toupper: transform all sequence symbols to upper case
+ * - gzip: compress output using gzip
+ * - verbose: print which sequences are kept and which are discarded
+ * - l<len>: wrap sequence lines after this number of symbols (default 80)
+ **/
+int fastaselectreg(libmaus2::util::ArgParser const & arg)
+{
+	libmaus2::lz::PlainOrGzipStream POS(std::cin);
+	libmaus2::fastx::StreamFastAReaderWrapper SFA(POS);
+	libmaus2::fastx::StreamFastAReaderWrapper::pattern_type pattern;
+	uint64_t const linelength = arg.uniqueArgPresent("l") ? arg.getUnsignedNumericArg<uint64_t>("l") : getDefaultLineLength();
+	std::regex reg(arg[0],std::regex_constants::extended);
+
+	bool const singleline = arg.argPresent("s") || arg.argPresent("singleline");
+	bool const longname = arg.argPresent("L") || arg.argPresent("longname");
+	bool const dataonly = arg.argPresent("d") || arg.argPresent("dataonly");
+	bool const up = arg.argPresent("u") || arg.argPresent("toupper");
+	bool const gzip = arg.argPresent("g") || arg.argPresent("gzip");
+	bool const verbose = arg.argPresent("verbose");
+
+	libmaus2::lz::GzipOutputStream::unique_ptr_type gzptr;
+	std::ostream & ostr = std::cout;
+	if ( gzip )
+	{
+		libmaus2::lz::GzipOutputStream::unique_ptr_type tgzptr(new libmaus2::lz::GzipOutputStream(ostr));
+		gzptr = std::move(tgzptr);
+	}
+	std::ostream & OSI = gzptr ? *gzptr : ostr;
+	while ( SFA.getNextPatternUnlocked(pattern) )
+	{
+		std::string & spat = pattern.spattern;
+
+		if ( up )
+			for ( uint64_t i = 0; i < spat.size(); ++i )
+				spat[i] = toupper(spat[i]);
+
+		std::string const shortname = pattern.getShortStringId();
+
+		if ( std::regex_match(shortname,reg) )
+		{
+			if ( verbose )
+				std::cerr << "[K] keeping " << shortname << std::endl;
+
+			if ( !longname )
+				pattern.sid = shortname;
+
+			if ( dataonly )
+				OSI.write(pattern.spattern.c_str(),pattern.spattern.size());
+			else if ( singleline )
+				OSI << pattern;
+			else
+				pattern.printMultiLine(OSI,linelength);
+		}
+		else if ( verbose )
+		{
+			std::cerr << "[K] discarding " << shortname << std::endl;
+		}
+	}
+
+	return EXIT_SUCCESS;
+}
+
+int main(int argc, char * argv[])
+{
+	try
+	{
+		libmaus2::util::ArgParser const arg(argc,argv);
+
+		if (
+			arg.uniqueArgPresent("v") || arg.uniqueArgPresent("version")
+		)
+		{
+			std::cerr << ::biobambam2::Licensing::license();
+			return EXIT_SUCCESS;
+		}
+		else if (
+			arg.uniqueArgPresent("h") || arg.uniqueArgPresent("help") || arg.size() < 1
+		)
+		{
+			std::cerr << ::biobambam2::Licensing::license();
+			std::cerr << std::endl;
+			std::cerr << "usage: " << arg.progname << " <regex> <in.fasta > out.fasta" << std::endl;
+			std::cerr << std::endl;
+			std::cerr << "options:" << std::endl;
+			std::cerr << std::endl;
+
+			std::vector< std::pair<std::string,std::string> > V;
+
+			V.push_back ( std::pair<std::string,std::string> ( "-v/--version", "print version number and quit" ) );
+			V.push_back ( std::pair<std::string,std::string> ( "-h/--help", "print help message and quit" ) );
+			V.push_back ( std::pair<std::string,std::string> ( "-l<cols>", "line length (default: "+libmaus2::util::NumberSerialisation::formatNumber(getDefaultLineLength(),0)+")" ) );
+			V.push_back ( std::pair<std::string,std::string> ( "-s/--singleline", "do not wrap sequence data lines" ) );
+			V.push_back ( std::pair<std::string,std::string> ( "-L/--longname", "do not shorten name" ) );
+			V.push_back ( std::pair<std::string,std::string> ( "-d/--dataonly", "do not print FastA header (data only)" ) );
+			V.push_back ( std::pair<std::string,std::string> ( "-u/--toupper", "convert sequence symbols to upper case" ) );
+			V.push_back ( std::pair<std::string,std::string> ( "-g/--gzip", "compress output" ) );
+
+			::biobambam2::Licensing::printMap(std::cerr,V);
+
+			std::cerr << std::endl;
+			return EXIT_SUCCESS;
+
+		}
+
+		return fastaselectreg(arg);
+	}
+	catch(std::exception const & ex)
+	{
+		std::cerr << ex.what() << std::endl;
+		return EXIT_FAILURE;
+	}
+}

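The selection step in fastaselectreg.cpp above relies on std::regex_match with the POSIX extended grammar, which accepts a name only if the whole name matches the expression. A minimal sketch of that behaviour, assuming plain strings in place of FastA records (the helper selectNames is hypothetical):

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Keeps the names that match `expr` in full; a match on a substring alone
// is not enough for std::regex_match.
std::vector<std::string> selectNames(
	std::vector<std::string> const & names,
	std::string const & expr
)
{
	assert ( ! expr.empty() );

	std::regex const reg(expr,std::regex_constants::extended);
	std::vector<std::string> kept;

	for ( std::string const & name : names )
		if ( std::regex_match(name,reg) )
			kept.push_back(name);

	return kept;
}
```

With the expression chr[0-9]+ this keeps chr1 and chr10 but not chrX or scaffold_chr1. To select by substring, the expression has to cover the rest of the name explicitly, for example .*chr1.*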


View it on GitLab: https://salsa.debian.org/med-team/biobambam2/-/compare/eed8557fd96834ac16ad973c373d3786af6ceee2...bfc5cd7ddba5296ecd53e854866fa356808e0a22


