[med-svn] [Git][med-team/stacks][master] 7 commits: New upstream version 2.60+dfsg

Andreas Tille (@tille) gitlab at salsa.debian.org
Tue Sep 27 11:42:34 BST 2022



Andreas Tille pushed to branch master at Debian Med / stacks


Commits:
cfc9f125 by Nilesh Patra at 2021-11-02T00:07:48+05:30
New upstream version 2.60+dfsg
- - - - -
cd73b73e by Andreas Tille at 2022-09-27T12:20:32+02:00
New upstream version 2.62+dfsg
- - - - -
afc93edc by Andreas Tille at 2022-09-27T12:20:32+02:00
routine-update: New upstream version

- - - - -
642ed1b4 by Andreas Tille at 2022-09-27T12:20:32+02:00
Update upstream source from tag 'upstream/2.62+dfsg'

Update to upstream version '2.62+dfsg'
with Debian dir b394fff46c780a855124eef32d13ab835ec15d1a
- - - - -
15d7ead4 by Andreas Tille at 2022-09-27T12:20:32+02:00
routine-update: Standards-Version: 4.6.1

- - - - -
80c72c78 by Andreas Tille at 2022-09-27T12:37:38+02:00
Refresh patches

- - - - -
3a737b16 by Andreas Tille at 2022-09-27T12:39:31+02:00
Upload to unstable

- - - - -


23 changed files:

- ChangeLog
- Makefile.am
- configure.ac
- debian/changelog
- debian/control
- − debian/patches/2to3.patch
- debian/patches/script-exe-paths
- debian/patches/series
- − scripts/count_fixed_catalog_snps.py
- scripts/denovo_map.pl
- − scripts/integrate_alignments.py
- scripts/stacks-dist-extract
- src/bootstrap.cc
- src/clean.h
- src/export_formats.cc
- src/export_formats.h
- src/ordered.h
- src/populations.cc
- src/populations.h
- src/process_radtags.cc
- src/process_radtags.h
- src/renz.cc
- src/ustacks.cc


Changes:

=====================================
ChangeLog
=====================================
@@ -1,3 +1,28 @@
+Stacks 2.62 - June 28, 2022
+---------------------------
+    Feature: added a '--vcf-all' export to populations, which exports fixed
+             and variable sites to a VCF file. If '--ordered' is also
+             specified, only non-overlapping sites are exported.
+    Feature: improved ustacks logging to include the final number of
+             assembled loci; modified denovo_map.pl to include this in its
+             logging output. Improved logging in the populations program.
+    Feature: added a variant of the PstI cutsite to process_radtags: pstishi
+             leaves only "GCAG" as the cutsite remnant, published in:
+             Shirasawa, Hirakawa, and Isobe, 2016, DNA Research; doi: 10.1093/dnares/dsw004
+    Bugfix: fixed an assert() failure in populations when no population map
+            was specified.
+    Bugfix: updated stacks-dist-extract '--pretty' printing to better handle
+            comments.
+
+Stacks 2.61 - April 19, 2022
+----------------------------
+    Feature: parallelized process_radtags. It can now run on multiple cores (max of 24), resulting
+             in a speedup of 2-3x depending on physical I/O and the number of cores used. Minor
+             improvements to output status messages.
+    Feature: added a '--pretty' print option to the stacks-dist-extract script.
+    Bugfix:  corrected a bug in parsing of the bootstrap archive file; long lines were not properly handled.
+    Feature: added the HhaI restriction enzyme.
+
 Stacks 2.60 - October 26, 2021
 ------------------------------
     Feature: memory usage reduction in populations. Some examples of memory savings:
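
As a usage note for the '--vcf-all' feature above: a hypothetical run such as
"populations -P stacks_out -M popmap.tsv --vcf-all --ordered" (paths are
placeholders) would write both fixed and variable sites to the '.all.vcf'
output named in the export code further below, restricted to non-overlapping
sites because '--ordered' is given.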


=====================================
Makefile.am
=====================================
@@ -85,7 +85,6 @@ kmer_filter_LDADD        = $(LDADD) libclean.a
 populations_LDADD        = $(LDADD) libpop.a
 
 dist_bin_SCRIPTS = scripts/denovo_map.pl scripts/ref_map.pl \
-	scripts/integrate_alignments.py scripts/count_fixed_catalog_snps.py \
 	scripts/stacks-integrate-alignments scripts/stacks-dist-extract scripts/stacks-gdb \
 	scripts/stacks-samtools-tview scripts/stacks-count-reads-per-sample-per-locus \
 	scripts/stacks-hist2d-loci-samples-coverage # scripts/denovo_map.py


=====================================
configure.ac
=====================================
@@ -2,7 +2,7 @@
 # Process this file with autoconf to produce a configure script.
 
 AC_PREREQ(2.59)
-AC_INIT([Stacks], [2.60])
+AC_INIT([Stacks], [2.62])
 AC_CONFIG_AUX_DIR([config])
 AM_INIT_AUTOMAKE([-Wall -Werror foreign parallel-tests subdir-objects])
 AC_CONFIG_SRCDIR([src/ustacks.cc])


=====================================
debian/changelog
=====================================
@@ -1,3 +1,10 @@
+stacks (2.62+dfsg-1) unstable; urgency=medium
+
+  * New upstream version
+  * Standards-Version: 4.6.1 (routine-update)
+
+ -- Andreas Tille <tille at debian.org>  Tue, 27 Sep 2022 12:38:01 +0200
+
 stacks (2.60+dfsg-1) unstable; urgency=medium
 
   * Team upload.


=====================================
debian/control
=====================================
@@ -10,7 +10,7 @@ Build-Depends: debhelper-compat (= 13),
                libsparsehash-dev,
                samtools,
                libhts-dev
-Standards-Version: 4.6.0
+Standards-Version: 4.6.1
 Vcs-Browser: https://salsa.debian.org/med-team/stacks
 Vcs-Git: https://salsa.debian.org/med-team/stacks.git
 Homepage: https://creskolab.uoregon.edu/stacks/


=====================================
debian/patches/2to3.patch deleted
=====================================
@@ -1,164 +0,0 @@
-Author: Andreas Tille <tille at debian.org>
-Last-Update: Wed, 10 Jul 2019 21:58:30 +0200
-Description: Result of 2to3 port to Python3
-
---- a/scripts/count_fixed_catalog_snps.py
-+++ b/scripts/count_fixed_catalog_snps.py
-@@ -1,4 +1,4 @@
--#!/usr/bin/env python
-+#!/usr/bin/python3
- 
- import optparse
- import sys
-@@ -39,7 +39,7 @@
-         batch_id = int(opts.batch_id)
- 
-     if len(path) == 0 or os.path.exists(path) == False:
--        print >> sys.stderr, "You must specify a valid path to Stacks input files."
-+        print("You must specify a valid path to Stacks input files.", file=sys.stderr)
-         p.print_help()
-         sys.exit()
- 
-@@ -62,10 +62,10 @@
-                 pos = entry.find(".matches.tsv")
-             if (pos != -1):
-                 files.append(entry[0:pos])
--        print >> sys.stderr, "Found", len(files), "Stacks samples."
-+        print("Found", len(files), "Stacks samples.", file=sys.stderr)
- 
-     except:
--        print >> sys.stderr, "Unable to read files from Stacks directory, '" + path + "'"
-+        print("Unable to read files from Stacks directory, '" + path + "'", file=sys.stderr)
- 
- 
- def parse_cigar(cigar, components):
-@@ -143,14 +143,14 @@
-             matches[sample_locus] = cat_locus
-         else:
-             if cat_locus != matches[sample_locus]:
--                print >> sys.stderr, "Error: sample locus", sample_locus, "matches more than one catalog locus."
-+                print("Error: sample locus", sample_locus, "matches more than one catalog locus.", file=sys.stderr)
- 
-         if len(cigar) > 0:
-             if sample_locus not in cigars:
-                 cigars[sample_locus] = cigar
-             else:
-                 if cigar != cigars[sample_locus]:
--                    print >> sys.stderr, "Error: sample locus", sample_locus, "has multiple cigar alignments."
-+                    print("Error: sample locus", sample_locus, "has multiple cigar alignments.", file=sys.stderr)
- 
-     fh.close()
- 
-@@ -279,23 +279,23 @@
- #
- i = 1
- for file in files:
--    print >> sys.stderr, "Processing file", str(i), "of", len(files), "['" +  file + "']"
-+    print("Processing file", str(i), "of", len(files), "['" +  file + "']", file=sys.stderr)
-     cnt = count_sample_snps(path, file, sample_snps)
--    print >> sys.stderr, "  Found", cnt, "heterozygous SNPs in sample."
-+    print("  Found", cnt, "heterozygous SNPs in sample.", file=sys.stderr)
-     i += 1
- 
- total_snps = 0
- for locus in sample_snps:
-     for col in sample_snps[locus]:
-         total_snps += 1
--print >> sys.stderr, "Found", total_snps, "variable sites across the population."
-+print("Found", total_snps, "variable sites across the population.", file=sys.stderr)
- 
- #
- # Count all the SNPs found in the catalog.
- #
--print >> sys.stderr, "Processing the catalog"
-+print("Processing the catalog", file=sys.stderr)
- cnt = count_catalog_snps(path, batch_id, catalog_snps)
--print >> sys.stderr, "  Found", cnt, "heterozygous SNPs in the catalog."
-+print("  Found", cnt, "heterozygous SNPs in the catalog.", file=sys.stderr)
- 
- #
- # Count all the SNPs in the catalog but not in any sample: these are the fixed differences cstacks identified.
-@@ -315,7 +315,7 @@
-             fixed_snps += 1
-     for col in sample_snps[locus]:
-         if col not in c:
--            print "Locus:", locus, "col:", col, "Catalog SNPs:", catalog_snps[locus], "Sample SNPs:", sample_snps[locus]
-+            print("Locus:", locus, "col:", col, "Catalog SNPs:", catalog_snps[locus], "Sample SNPs:", sample_snps[locus])
- 
--print >> sys.stderr, "Found", total_snps, "SNPs across all samples and in the catalog."
--print >> sys.stderr, "Found", fixed_snps, "fixed SNPs only in the catalog."
-+print("Found", total_snps, "SNPs across all samples and in the catalog.", file=sys.stderr)
-+print("Found", fixed_snps, "fixed SNPs only in the catalog.", file=sys.stderr)
---- a/scripts/integrate_alignments.py
-+++ b/scripts/integrate_alignments.py
-@@ -1,4 +1,4 @@
--#!/usr/bin/env python
-+#!/usr/bin/python3
- 
- import optparse
- import sys
-@@ -49,7 +49,7 @@
-         batch_id = int(opts.batch_id)
- 
-     if len(out_path) == 0 or os.path.exists(out_path) == False:
--        print >> sys.stderr, "You must specify a valid path to write files to."
-+        print("You must specify a valid path to write files to.", file=sys.stderr)
-         p.print_help()
-         sys.exit()
- 
-@@ -57,12 +57,12 @@
-         out_path += "/"
- 
-     if len(aln_path) == 0 or os.path.exists(aln_path) == False:
--        print >> sys.stderr, "You must specify a valid path to a SAM file containing catalog locus alignments."
-+        print("You must specify a valid path to a SAM file containing catalog locus alignments.", file=sys.stderr)
-         p.print_help()
-         sys.exit()
- 
-     if len(path) == 0 or os.path.exists(path) == False:
--        print >> sys.stderr, "You must specify a valid path to Stacks input files."
-+        print("You must specify a valid path to Stacks input files.", file=sys.stderr)
-         p.print_help()
-         sys.exit()
- 
-@@ -85,10 +85,10 @@
-                 pos = entry.find(".matches.tsv")
-             if (pos != -1):
-                 files.append(entry[0:pos])
--        print >> sys.stderr, "Found", len(files), "Stacks samples."
-+        print("Found", len(files), "Stacks samples.", file=sys.stderr)
- 
-     except:
--        print >> sys.stderr, "Unable to read files from Stacks directory, '", path, "'"
-+        print("Unable to read files from Stacks directory, '", path, "'", file=sys.stderr)
- 
- 
- def parse_catalog_alignments(aln_path, alns):
-@@ -115,7 +115,7 @@
-             alns[locus] = (chr, bp, "+");
- 
-     fh.close()
--    print >> sys.stderr, "Loaded", len(alns), "catalog locus alignments from '", aln_path, "'."
-+    print("Loaded", len(alns), "catalog locus alignments from '", aln_path, "'.", file=sys.stderr)
- 
- 
- def convert_sample(path, file, out_path, alns):
-@@ -414,14 +414,14 @@
- 
- i  = 1
- for file in files:
--    print >> sys.stderr, "Processing file", str(i), "of", len(files), "['" +  file + "']"
-+    print("Processing file", str(i), "of", len(files), "['" +  file + "']", file=sys.stderr)
-     cnt = convert_sample(path, file, out_path, alns)
--    print >> sys.stderr, "  Added alignments for", cnt, "loci."
-+    print("  Added alignments for", cnt, "loci.", file=sys.stderr)
-     i += 1
- 
- #
- # Now process the catalog.
- #
--print >> sys.stderr, "Processing the catalog"
-+print("Processing the catalog", file=sys.stderr)
- cnt = convert_catalog(path, batch_id, out_path, alns)
--print >> sys.stderr, "  Added alignments for", cnt, "catalog loci."
-+print("  Added alignments for", cnt, "catalog loci.", file=sys.stderr)


=====================================
debian/patches/script-exe-paths
=====================================
@@ -9,7 +9,7 @@ Last-Update: 2016-10-21
 This patch header follows DEP-3: http://dep.debian.net/deps/dep3/
 --- a/Makefile.am
 +++ b/Makefile.am
-@@ -104,10 +104,10 @@
+@@ -103,10 +103,10 @@ debug:
  	$(MAKE) all "CXXFLAGS=-g3 -Wall -DDEBUG -Og"
  
  install-data-hook:


=====================================
debian/patches/series
=====================================
@@ -1,4 +1,3 @@
 RisR.patch
 script-exe-paths
 use_debian_packaged_htslib.patch
-2to3.patch


=====================================
scripts/count_fixed_catalog_snps.py deleted
=====================================
@@ -1,321 +0,0 @@
-#!/usr/bin/env python
-
-import optparse
-import sys
-import os
-import gzip
-import re
-
-#
-# Global configuration variables.
-#
-path     = ""
-aln_path = ""
-out_path = ""
-batch_id = -1
-
-def parse_command_line():
-    global path
-    global batch_id
-
-    p = optparse.OptionParser()
-
-    #
-    # Add options.
-    #
-    p.add_option("-p", action="store", dest="path",
-                 help="path to Stacks directory.")
-    p.add_option("-b", action="store", dest="batch_id",
-                 help="Stacks batch ID.")
-
-    #
-    # Parse the command line
-    #
-    (opts, args) = p.parse_args()
-
-    if opts.path != None:
-        path = opts.path
-    if opts.batch_id != None:
-        batch_id = int(opts.batch_id)
-
-    if len(path) == 0 or os.path.exists(path) == False:
-        print >> sys.stderr, "You must specify a valid path to Stacks input files."
-        p.print_help()
-        sys.exit()
-
-    if batch_id < 0:
-        pritn >> sys.stderr, "You must specify the batch ID that was supplied to Stacks."
-        p.print_help()
-        sys.exit()
-
-    if path.endswith("/") == False:
-        path += "/"
-        
-        
-def find_stacks_files(path, files):
-    try:
-        entries = os.listdir(path)
-
-        for entry in entries:
-            pos = entry.find(".matches.tsv.gz")
-            if (pos == -1):
-                pos = entry.find(".matches.tsv")
-            if (pos != -1):
-                files.append(entry[0:pos])
-        print >> sys.stderr, "Found", len(files), "Stacks samples."
-
-    except:
-        print >> sys.stderr, "Unable to read files from Stacks directory, '" + path + "'"
-
-
-def parse_cigar(cigar, components):
-    #
-    # Parse a cigar string, e.g. 48M1D47M or 43M3D52M3D.
-    #
-    start = 0
-    end   = 0
-    dist  = ""
-    for c in cigar:
-        if c.isalpha():
-            dist   = int(cigar[start:end])
-            op     = c.upper();
-            end   += 1
-            start  = end
-            components.append((op, dist))
-        else:
-            end += 1
-
-
-def adjust_snps(cigar, sample_snps):
-    #
-    # Adjust SNP positions according to how this sample was aligned to the catalog
-    # (which is described by the CIGAR string).
-    #
-    if len(sample_snps) == 0:
-        return
-
-    comp = []
-    parse_cigar(cigar, comp)
-
-    index  = 0
-    offset = 0
-    bp     = 0
-    for (op, dist) in comp:
-        if op == 'M' or op == 'I':
-            bp += dist
-            while index < len(sample_snps) and sample_snps[index] < bp:
-                sample_snps[index] += offset
-                index += 1
-        if op == 'D':
-            offset += dist
-
-
-def count_sample_snps(path, file, sample_snps):
-    matches = {}
-    cigars  = {}
-    #
-    # Open the matches file and load the matches to the catalog.
-    #
-    p = path + file + ".matches.tsv.gz"
-    if os.path.exists(p):
-        gzipped = True;
-        fh = gzip.open(p, 'rb')
-    else:
-        gzipped = False;
-        fh = open(path + file + ".matches.tsv", "r")
-
-    for line in fh:
-        line = line.strip("\n")
-
-        if len(line) == 0 or line[0] == "#":
-            continue
-
-        parts = line.split("\t")
-
-        cat_locus    = int(parts[2])
-        sample_locus = int(parts[4])
-        if len(parts) == 9:
-            cigar = parts[8] if len(parts[8]) > 0 else ""
-        else:
-            cigar = ""
-
-        if sample_locus not in matches:
-            matches[sample_locus] = cat_locus
-        else:
-            if cat_locus != matches[sample_locus]:
-                print >> sys.stderr, "Error: sample locus", sample_locus, "matches more than one catalog locus."
-
-        if len(cigar) > 0:
-            if sample_locus not in cigars:
-                cigars[sample_locus] = cigar
-            else:
-                if cigar != cigars[sample_locus]:
-                    print >> sys.stderr, "Error: sample locus", sample_locus, "has multiple cigar alignments."
-
-    fh.close()
-
-    #
-    # Open the SNPs file and record all the SNP positions found in this sample.
-    #
-    if gzipped:
-        fh = gzip.open(path + file + ".snps.tsv.gz", "rb")
-    else:
-        fh = open(path + file + ".snps.tsv", "r")
-
-    snps = {}
-
-    for line in fh:
-        if line[0] == "#":
-            continue
-
-        if len(line) == 0:
-            continue
-
-        parts        = line.split("\t")
-        sample_locus = int(parts[2])
-        col          = int(parts[3])
-        model        = parts[4]
-
-        if model != "E":
-            continue
-
-        if sample_locus not in matches:
-            continue
-
-        if sample_locus not in snps:
-            snps[sample_locus] = []
-        snps[sample_locus].append(col)
-
-    fh.close()
-
-    #
-    # Adjust SNP positions according to the gapped alignments recorded in the CIGAR string.
-    #
-    for sample_locus in snps:
-        if sample_locus in cigars:
-            adjust_snps(cigars[sample_locus], snps[sample_locus])        
-
-    snp_cnt = 0
-
-    #
-    # Transfer this sample's SNPs to the catalog level dictionary.
-    #
-    for sample_locus in snps:
-        cat_locus = matches[sample_locus]
-        
-        for col in snps[sample_locus]:
-            if cat_locus in sample_snps:
-                if col in sample_snps[cat_locus]:
-                    sample_snps[cat_locus][col] += 1
-                    # print >> sys.stderr, "CatLocus:", cat_locus, "; col:", col
-                else:
-                    sample_snps[cat_locus][col]  = 1
-                    snp_cnt  += 1
-            else:
-                sample_snps[cat_locus]      = {}
-                sample_snps[cat_locus][col] = 1
-                snp_cnt  += 1
-        
-    return snp_cnt
-
-
-def count_catalog_snps(path, batch_id, catalog_snps):
-    #
-    # Open the tags file and rewrite it with the alignment coordinates.
-    #
-    p = path + "batch_" + str(batch_id) + ".catalog.snps.tsv.gz"
-    if os.path.exists(p):
-        gzipped = True;
-        fh = gzip.open(path + "batch_" + str(batch_id) + ".catalog.snps.tsv.gz", "rb")
-    else:
-        gzipped = False;
-        fh = open(path + "batch_" + str(batch_id) + ".catalog.snps.tsv", "r")
-
-    snp_cnt = 0
-
-    for line in fh:
-        if line[0] == "#":
-            continue
-
-        if len(line) == 0:
-            continue
-
-        parts     = line.split("\t")
-        cat_locus = int(parts[2])
-        col       = int(parts[3])
-        model     = parts[4]
-
-        if model != "E":
-            continue
-
-        snp_cnt += 1
-
-        if cat_locus not in catalog_snps:
-            catalog_snps[cat_locus] = []
-
-        catalog_snps[cat_locus].append(col)
-
-    fh.close()
-
-    return snp_cnt
-
-# #                                                                                             # #
-# # ------------------------------------------------------------------------------------------- # #
-# #                                                                                             # #
-
-parse_command_line()
-
-files        = []
-catalog_snps = {}
-sample_snps  = {}
-
-#
-# Find all the sample files by looking for the matches files in the Stacks directory.
-#
-find_stacks_files(path, files)
-
-#
-# Count all the SNPs identified in the samples.
-#
-i = 1
-for file in files:
-    print >> sys.stderr, "Processing file", str(i), "of", len(files), "['" +  file + "']"
-    cnt = count_sample_snps(path, file, sample_snps)
-    print >> sys.stderr, "  Found", cnt, "heterozygous SNPs in sample."
-    i += 1
-
-total_snps = 0
-for locus in sample_snps:
-    for col in sample_snps[locus]:
-        total_snps += 1
-print >> sys.stderr, "Found", total_snps, "variable sites across the population."
-
-#
-# Count all the SNPs found in the catalog.
-#
-print >> sys.stderr, "Processing the catalog"
-cnt = count_catalog_snps(path, batch_id, catalog_snps)
-print >> sys.stderr, "  Found", cnt, "heterozygous SNPs in the catalog."
-
-#
-# Count all the SNPs in the catalog but not in any sample: these are the fixed differences cstacks identified.
-#
-total_snps = 0
-fixed_snps = 0
-
-for locus in catalog_snps:
-    if locus not in sample_snps:
-        continue
-    c = {}
-    for col in catalog_snps[locus]:
-        c[col] = 1
-        total_snps += 1
-        if col not in sample_snps[locus]:
-            # print >> sys.stderr, "Locus:", locus, "Catalog SNPs:", catalog_snps[locus], "Sample SNPs:", sample_snps[locus]
-            fixed_snps += 1
-    for col in sample_snps[locus]:
-        if col not in c:
-            print "Locus:", locus, "col:", col, "Catalog SNPs:", catalog_snps[locus], "Sample SNPs:", sample_snps[locus]
-
-print >> sys.stderr, "Found", total_snps, "SNPs across all samples and in the catalog."
-print >> sys.stderr, "Found", fixed_snps, "fixed SNPs only in the catalog."


=====================================
scripts/denovo_map.pl
=====================================
@@ -184,11 +184,14 @@ sub execute_stacks {
             check_return_value($?, $log_fh, $cmd);
 
             #
-            # Pull the depth of coverage from ustacks.
+            # Pull the depth of coverage and number of loci from ustacks.
             #
             my $depthline = (grep(/^Final coverage/, @results))[0];
             my ($depth, $stdev, $max, $nreads, $pctreads) = ($depthline =~ /mean=([\d.]+); stdev=([\d.]+); max=(\d+); n_reads=(\d+)\(([\d.]+)%\)/);
-            push(@depths_of_cov, [$sample->{'file'}, $depth, $stdev, $max, $nreads, $pctreads]);
+
+            $depthline = (grep(/^Final number of stacks built/, @results))[0];
+            my ($locicnt) = ($depthline =~ /Final number of stacks built: ([\d.]+)/);
+            push(@depths_of_cov, [$sample->{'file'}, $depth, $stdev, $max, $nreads, $pctreads, $locicnt]);
         }
 
         $i++;
@@ -733,11 +736,11 @@ sub write_depths_of_cov {
     print $log_fh
         "\nBEGIN cov_per_sample\n",
         "# Depths of Coverage for Processed Samples\n",
-        "sample\tdepth of cov\tmax cov\tnumber reads incorporated\t% reads incorporated\n";
+        "sample\tloci assembled\tdepth of cov\tmax cov\tnumber reads incorporated\t% reads incorporated\n";
 
     foreach $a (@{$depths}) {
-        print STDERR  $a->[0], "; depth: ", $a->[1], "x; max: ", $a->[3], "x; number of reads: ", $a->[4], " (", $a->[5], "%)\n";
-        print $log_fh $a->[0], "\t", $a->[1], "\t", $a->[3], "\t", $a->[4], "\t", $a->[5], "\n";
+        print STDERR  $a->[0], "; loci assembled: ", $a->[6], "; depth: ", $a->[1], "x; max: ", $a->[3], "x; number of reads: ", $a->[4], " (", $a->[5], "%)\n";
+        print $log_fh $a->[0], "\t", $a->[6], "\t", $a->[1], "\t", $a->[3], "\t", $a->[4], "\t", $a->[5], "\n";
     }
 
     print $log_fh
@@ -891,7 +894,7 @@ denovo_map.pl --samples dir --popmap path --out-path dir [--paired [--rm-pcr-dup
 
   Stack assembly options:
     -M: number of mismatches allowed between stacks within individuals (for ustacks).
-    -n: number of mismatches allowed between stacks between individuals (for cstacks; default 1; suggested: set to ustacks -M).
+    -n: number of mismatches allowed between stacks between individuals (for cstacks; default: set to ustacks -M).
 
   SNP model options:
     --var-alpha: significance level at which to call variant sites (for gstacks; default: 0.05).
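
With the logging change above, the cov_per_sample table gains a
'loci assembled' column as its second field; a hypothetical row (values
invented for illustration) would read:

    sample_01    23941    24.52    412    801233    87.4

i.e. sample, loci assembled, depth of cov, max cov, number reads
incorporated, % reads incorporated.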


=====================================
scripts/integrate_alignments.py deleted
=====================================
@@ -1,427 +0,0 @@
-#!/usr/bin/env python
-
-import optparse
-import sys
-import os
-import gzip
-import re
-
-#
-# Global configuration variables.
-#
-path     = ""
-aln_path = ""
-out_path = ""
-batch_id = -1
-
-def parse_command_line():
-    global out_path
-    global aln_path
-    global path
-    global batch_id
-
-    p = optparse.OptionParser()
-
-    #
-    # Add options.
-    #
-    p.add_option("-o", action="store", dest="out_path",
-                 help="write modified Stacks files to this output path.")
-    p.add_option("-a", action="store", dest="aln_path",
-                 help="SAM file containing catalog locus alignments.")
-    p.add_option("-p", action="store", dest="path",
-                 help="path to Stacks directory.")
-    p.add_option("-b", action="store", dest="batch_id",
-                 help="Stacks batch ID.")
-
-    #
-    # Parse the command line
-    #
-    (opts, args) = p.parse_args()
-
-    if opts.out_path != None:
-        out_path = opts.out_path
-    if opts.aln_path != None:
-        aln_path = opts.aln_path
-    if opts.path != None:
-        path = opts.path
-    if opts.batch_id != None:
-        batch_id = int(opts.batch_id)
-
-    if len(out_path) == 0 or os.path.exists(out_path) == False:
-        print >> sys.stderr, "You must specify a valid path to write files to."
-        p.print_help()
-        sys.exit()
-
-    if out_path.endswith("/") == False:
-        out_path += "/"
-
-    if len(aln_path) == 0 or os.path.exists(aln_path) == False:
-        print >> sys.stderr, "You must specify a valid path to a SAM file containing catalog locus alignments."
-        p.print_help()
-        sys.exit()
-
-    if len(path) == 0 or os.path.exists(path) == False:
-        print >> sys.stderr, "You must specify a valid path to Stacks input files."
-        p.print_help()
-        sys.exit()
-
-    if batch_id < 0:
-        pritn >> sys.stderr, "You must specify the batch ID that was supplied to Stacks."
-        p.print_help()
-        sys.exit()
-
-    if path.endswith("/") == False:
-        path += "/"
-        
-        
-def find_stacks_files(path, files):
-    try:
-        entries = os.listdir(path)
-
-        for entry in entries:
-            pos = entry.find(".matches.tsv.gz")
-            if (pos == -1):
-                pos = entry.find(".matches.tsv")
-            if (pos != -1):
-                files.append(entry[0:pos])
-        print >> sys.stderr, "Found", len(files), "Stacks samples."
-
-    except:
-        print >> sys.stderr, "Unable to read files from Stacks directory, '", path, "'"
-
-
-def parse_catalog_alignments(aln_path, alns):
-    fh = open(aln_path, "r")
-
-    for line in fh:
-	line = line.strip("\n")
-
-	if len(line) == 0 or line[0] == "#" or line[0] == "@":
-            continue
-
-	parts = line.split("\t")
-        locus = int(parts[0])
-        chr   = parts[2]
-        bp    = int(parts[3])
-        flag  = int(parts[1])
-
-        #
-        # Check which strand the read is aligned to.
-        #
-        if flag & 16 > 0:
-            alns[locus] = (chr, bp, "-");
-        else:
-            alns[locus] = (chr, bp, "+");
-
-    fh.close()
-    print >> sys.stderr, "Loaded", len(alns), "catalog locus alignments from '", aln_path, "'."
-
-
-def convert_sample(path, file, out_path, alns):
-    matches = {}
-    #
-    # Open the matches file and load the matches to the catalog.
-    #
-    p = path + file + ".matches.tsv.gz"
-    if os.path.exists(p):
-        gzipped = True;
-        fh = gzip.open(p, 'rb')
-    else:
-        gzipped = False;
-        fh = open(path + file + ".matches.tsv", "r")
-
-    for line in fh:
-	line = line.strip("\n")
-
-	if len(line) == 0 or line[0] == "#":
-            continue
-
-	parts = line.split("\t")
-
-        cat_locus    = int(parts[2])
-        sample_locus = int(parts[4])
-        matches[sample_locus] = cat_locus
-
-    fh.close()
-
-    #
-    # Open the tags file and rewrite it with the alignment coordinates.
-    #
-    if gzipped:
-        fh = gzip.open(path + file + ".tags.tsv.gz", "rb")
-    else:
-        fh = open(path + file + ".tags.tsv", "r")
-
-    out_fh = open(out_path + file + ".tags.tsv", "w")
-
-    alns_written = {}
-
-    for line in fh:
-        if line[0] == "#":
-            out_fh.write(line)
-            continue
-
-	if len(line) == 0:
-            continue
-
-	parts        = line.split("\t")
-        sample_locus = int(parts[2])
-        read_type    = parts[6]
-
-        if read_type == "consensus":
-            if sample_locus not in matches:
-                continue;
-            cat_locus = matches[sample_locus]
-            if cat_locus not in alns:
-                continue;
-
-            (chr, bp, strand) = alns[cat_locus]
-
-            if sample_locus in alns_written:
-                alns_written[sample_locus] += 1
-            else:
-                alns_written[sample_locus]  = 1;
-                
-            buf = "\t".join(parts[0:3]) + "\t" + chr + "\t" + str(bp) + "\t" + strand + "\t" + "\t".join(parts[6:])
-            out_fh.write(buf)
-
-        elif sample_locus in alns_written:
-            out_fh.write(line)
-
-    fh.close()
-    out_fh.close()
-
-    #
-    # Open the SNPs, Alleles, and Matches files and rewrite those that had alignment coordinates.
-    #
-    if gzipped:
-        fh = gzip.open(path + file + ".snps.tsv.gz", "rb")
-    else:
-        fh = open(path + file + ".snps.tsv", "r")
-
-    out_fh = open(out_path + file + ".snps.tsv", "w")
-
-    for line in fh:
-        if line[0] == "#":
-            out_fh.write(line)
-            continue
-
-	if len(line) == 0:
-            continue
-
-	parts        = line.split("\t")
-        sample_locus = int(parts[2])
-
-        if sample_locus in alns_written:
-            out_fh.write(line)
-
-    fh.close()
-    out_fh.close()
-
-    if gzipped:
-        fh = gzip.open(path + file + ".alleles.tsv.gz", "rb")
-    else:
-        fh = open(path + file + ".alleles.tsv", "r")
-
-    out_fh = open(out_path + file + ".alleles.tsv", "w")
-
-    for line in fh:
-        if line[0] == "#":
-            out_fh.write(line)
-            continue
-
-	if len(line) == 0:
-            continue
-
-	parts        = line.split("\t")
-        sample_locus = int(parts[2])
-
-        if sample_locus in alns_written:
-            out_fh.write(line)
-
-    fh.close()
-    out_fh.close()
-
-    if gzipped:
-        fh = gzip.open(path + file + ".matches.tsv.gz", "rb")
-    else:
-        fh = open(path + file + ".matches.tsv", "r")
-
-    out_fh = open(out_path + file + ".matches.tsv", "w")
-
-    for line in fh:
-        if line[0] == "#":
-            out_fh.write(line)
-            continue
-
-	if len(line) == 0:
-            continue
-
-	parts        = line.split("\t")
-        sample_locus = int(parts[2])
-
-        if sample_locus in alns_written:
-            out_fh.write(line)
-
-    fh.close()
-    out_fh.close()
-
-    #
-    # If it exists, open the model file and rewrite it with the alignment coordinates.
-    #
-    if gzipped:
-        if os.path.exists(path + file + ".models.tsv.gz") == False:
-            return len(alns_written)
-    elif os.path.exists(path + file + ".models.tsv") == False:
-        return len(alns_written)
-
-    if gzipped:
-        fh = gzip.open(path + file + ".models.tsv.gz", "rb")
-    else:
-        fh = open(path + file + ".models.tsv", "r")
-
-    out_fh = open(out_path + file + ".models.tsv", "w")
-
-    for line in fh:
-        if line[0] == "#":
-            out_fh.write(line)
-            continue
-
-	if len(line) == 0:
-            continue
-
-	parts        = line.split("\t")
-        sample_locus = int(parts[2])
-        read_type    = parts[6]
-
-        if sample_locus in alns_written:
-            if read_type == "consensus":
-                cat_locus = matches[sample_locus]
-                (chr, bp, strand) = alns[cat_locus]
-                buf = "\t".join(parts[0:3]) + "\t" + chr + "\t" + str(bp) + "\t" + strand + "\t" + "\t".join(parts[6:])
-                out_fh.write(buf)
-            else:
-                out_fh.write(line)
-
-    fh.close()
-    out_fh.close()
-
-    return len(alns_written)
-
-
-def convert_catalog(path, batch_id, out_path, alns):
-    #
-    # Open the tags file and rewrite it with the alignment coordinates.
-    #
-    p = path + "batch_" + str(batch_id) + ".catalog.tags.tsv.gz"
-    if os.path.exists(p):
-        gzipped = True;
-        fh = gzip.open(path + "batch_" + str(batch_id) + ".catalog.tags.tsv.gz", "rb")
-    else:
-        gzipped = False;
-        fh = open(path + "batch_" + str(batch_id) + ".catalog.tags.tsv", "r")
-
-    out_fh = open(out_path + "batch_" + str(batch_id) + ".catalog.tags.tsv", "w")
-
-    alns_written = {}
-
-    for line in fh:
-        if line[0] == "#":
-            out_fh.write(line)
-            continue
-
-	if len(line) == 0:
-            continue
-
-	parts     = line.split("\t")
-        cat_locus = int(parts[2])
-
-        if cat_locus not in alns:
-            continue;
-
-        (chr, bp, strand) = alns[cat_locus]
-
-        if cat_locus in alns_written:
-            alns_written[cat_locus] += 1
-        else:
-            alns_written[cat_locus]  = 1;
-               
-        buf = "\t".join(parts[0:3]) + "\t" + chr + "\t" + str(bp) + "\t" + strand + "\t" + "\t".join(parts[6:])
-        out_fh.write(buf)
-
-    fh.close()
-    out_fh.close()
-
-    if gzipped:
-        fh = gzip.open(path + "batch_" + str(batch_id) + ".catalog.snps.tsv.gz", "rb")
-    else:
-        fh = open(path + "batch_" + str(batch_id) + ".catalog.snps.tsv", "r")
-
-    out_fh = open(out_path + "batch_" + str(batch_id) + ".catalog.snps.tsv", "w")
-
-    for line in fh:
-        if line[0] == "#":
-            out_fh.write(line)
-            continue
-
-	if len(line) == 0:
-            continue
-
-	parts     = line.split("\t")
-        cat_locus = int(parts[2])
-
-        if cat_locus in alns_written:
-            out_fh.write(line)
-
-    fh.close()
-    out_fh.close()
-
-    if gzipped:
-        fh = gzip.open(path + "batch_" + str(batch_id) + ".catalog.alleles.tsv.gz", "rb")
-    else:
-        fh = open(path + "batch_" + str(batch_id) + ".catalog.alleles.tsv", "r")
-
-    out_fh = open(out_path + "batch_" + str(batch_id) + ".catalog.alleles.tsv", "w")
-
-    for line in fh:
-        if line[0] == "#":
-            out_fh.write(line)
-            continue
-
-	if len(line) == 0:
-            continue
-
-	parts     = line.split("\t")
-        cat_locus = int(parts[2])
-
-        if cat_locus in alns_written:
-            out_fh.write(line)
-
-    fh.close()
-    out_fh.close()
-
-    return len(alns_written)
-
-
-parse_command_line()
-
-files = []
-alns  = {}
-
-find_stacks_files(path, files)
-parse_catalog_alignments(aln_path, alns)
-
-i  = 1
-for file in files:
-    print >> sys.stderr, "Processing file", str(i), "of", len(files), "['" +  file + "']"
-    cnt = convert_sample(path, file, out_path, alns)
-    print >> sys.stderr, "  Added alignments for", cnt, "loci."
-    i += 1
-
-#
-# Now process the catalog.
-#
-print >> sys.stderr, "Processing the catalog"
-cnt = convert_catalog(path, batch_id, out_path, alns)
-print >> sys.stderr, "  Added alignments for", cnt, "catalog loci."


=====================================
scripts/stacks-dist-extract
=====================================
@@ -3,27 +3,40 @@
 usage="\
 Usage:
   $(basename "$0") DIST_FILE
-  $(basename "$0") DIST_FILE SECTION_NAME
+  $(basename "$0") [--pretty] DIST_FILE SECTION_NAME
 
 (1) Prints available sections/distributions for DIST_FILE.
 (2) Extracts section SECTION_NAME from DIST_FILE.
 "
 
-if { [[ $# -ne 1 ]] && [[ $# -ne 2 ]]; } || [[ "$1" =~ ^(-h|--help|--version)$ ]] ;then
+if { [[ $# -ne 1 ]] && [[ $# -ne 2 ]] && [[ $# -ne 3 ]]; } || [[ "$1" =~ ^(-h|--help|--version)$ ]] ;then
     echo -n "$usage" >&2
     exit 1
 fi
 
-dist_f="$1"
+if [[ "$1" =~ ^--pretty$ ]]; then 
+    dist_f="$2"
+    sect="$3"
+else
+    dist_f="$1"
+    sect="$2"
+fi
+
 [[ -e "$dist_f" ]] || { ls -- "$dist_f"; exit 1; }
 
-if [[ "$2" ]] ;then
-    section="$(echo "$2" | grep -oE '\w+' | tr -d '\n')"
+if [[ $# -ge 2 ]] ;then
+    section="$(echo "$sect" | grep -oE '\w+' | tr -d '\n')"
     if ! grep --quiet "^BEGIN $section\\b" "$dist_f" ;then
         echo "Error: Couldn't find section '$section' in '$dist_f'." >&1
         exit 1
     fi
-    sed -nE "/^END $section($| )/ q; /^BEGIN $section($| )/,$ p" "$dist_f" | sed 1d
+
+    if [[ "$1" =~ ^--pretty$ ]]; then 
+	sed -nE "/^END $section($| )/ q; /^BEGIN $section($| )/,$ p" "$dist_f" | sed 1d | sed -nE "/^#/p"
+	sed -nE "/^END $section($| )/ q; /^BEGIN $section($| )/,$ p" "$dist_f" | sed 1d | grep -v "^#" | column -s "	" -t
+    else
+	sed -nE "/^END $section($| )/ q; /^BEGIN $section($| )/,$ p" "$dist_f" | sed 1d
+    fi
 else
     grep ^BEGIN "$dist_f" | cut -c7-
 fi
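
Usage note for the change above: a hypothetical call such as
"stacks-dist-extract --pretty populations.log.distribs snps_per_loc_postfilters"
(file and section names are placeholders) now prints the section's '#'
comment lines verbatim and pipes the tab-separated body through column(1)
so the fields line up.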


=====================================
src/bootstrap.cc
=====================================
@@ -149,7 +149,8 @@ BSArchive::parse_archive()
 
     cerr << "Parsing bootstrap archive file to obtain values for resampling...";
 
-    char line[max_len];
+    char *line = (char *) malloc(sizeof(char) * max_len);
+    int   size = max_len;
     memset(line, '\0', max_len);
     vector<string> parts;
 
@@ -159,7 +160,7 @@ BSArchive::parse_archive()
     // line for each of those pops.
     //
     size_t lineno = 0;
-    fh.getline(line, max_len);
+    read_line(fh, &line, &size);
     lineno++;
     int pop_cnt = is_integer(line);
     if (pop_cnt < 1) {
@@ -168,7 +169,7 @@ BSArchive::parse_archive()
     }
 
     for (int i = 0; i < pop_cnt; i++) {
-	fh.getline(line, max_len);
+	read_line(fh, &line, &size);
 	lineno++;
 
 	//
@@ -204,7 +205,7 @@ BSArchive::parse_archive()
     //
     // Parse the body of the file. 
     //
-    while (fh.getline(line, max_len)) {
+    while (read_line(fh, &line, &size)) {
 	lineno++;
 	
         size_t len = strlen(line);
@@ -351,6 +352,9 @@ BSArchive::parse_archive()
 		 << this->_nuc_div_stats[this->_div_key[i][j]].size() << " sets of nucleotide-based divergence statistics read.\n";
 	}
     cerr << "\n";
+
+    fh.close();
+    free(line);
     
     return 0;
 }
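
The fix above swaps a fixed-size stack buffer for a heap buffer managed by
read_line(), so archive lines of any length can be parsed. A minimal sketch
of a growable reader with the same calling convention (the real helper ships
with Stacks; this sketch is an assumption, not the actual implementation):

    #include <cstdlib>
    #include <cstring>
    #include <fstream>
    using namespace std;

    // Read one line of arbitrary length into *buf, doubling the buffer as
    // needed; returns false once the stream is exhausted.
    bool read_line_sketch(ifstream &fh, char **buf, int *size)
    {
        int len = 0;
        while (true) {
            fh.getline(*buf + len, *size - len);
            len += (int) strlen(*buf + len);
            if (fh.fail() && !fh.eof()) {
                // Line longer than the buffer: grow it and keep reading.
                *size *= 2;
                *buf   = (char *) realloc(*buf, *size);
                fh.clear();
            } else {
                break;
            }
        }
        return !fh.fail();
    }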


=====================================
src/clean.h
=====================================
@@ -131,7 +131,7 @@ public:
     {
         return (lhs.se == rhs.se && lhs.pe == rhs.pe);
     }
-    friend ofstream& operator<<(ofstream &out, const BarcodePair &bp)
+    friend ostream& operator<<(ostream &out, const BarcodePair &bp)
     {
         if (bp.pe.length() > 0)
             out << bp.se << "-" << bp.pe;
@@ -286,6 +286,41 @@ public:
     }
 };
 
+class ReadComp {
+public:
+    Seq         *s;
+    RawRead     *r;
+    BarcodePair *bc;
+
+    ReadComp(Seq *s, RawRead *r, BarcodePair *bc) {
+	this->s  = s;
+	this->r  = r;
+	this->bc = bc;
+    }
+    void destroy() {
+	delete this->s;
+	delete this->r;
+	delete this->bc;
+    }
+};
+
+class ReadCompPair {
+public:
+    Seq         *s_1;
+    Seq         *s_2;
+    RawRead     *r_1;
+    RawRead     *r_2;
+    BarcodePair *bc;
+
+    ReadCompPair(Seq *s_1, Seq *s_2, RawRead *r_1, RawRead *r_2, BarcodePair *bc) {
+	this->s_1 = s_1;
+	this->s_2 = s_2;
+	this->r_1 = r_1;
+	this->r_2 = r_2;
+	this->bc  = bc;
+    }
+};
+
 int  parse_illumina_v1(const char *);
 int  parse_illumina_v2(const char *);
 int  parse_input_record(Seq *, RawRead *);
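
The operator<< signature fix above is the standard C++ idiom: taking
ostream& lets one overload serve every output stream type, where the old
ofstream& version only compiled against file streams. A minimal
self-contained sketch (stand-in type, not the real BarcodePair):

    #include <iostream>
    #include <sstream>
    #include <string>
    using namespace std;

    struct Pair { string se, pe; };   // stand-in for BarcodePair

    ostream& operator<<(ostream &out, const Pair &bp)
    {
        if (bp.pe.length() > 0)
            out << bp.se << "-" << bp.pe;
        else
            out << bp.se;
        return out;
    }

    int main()
    {
        Pair bp{"ACGT", "TTAA"};
        cout << bp << '\n';    // works with the console stream...
        ostringstream buf;
        buf << bp;             // ...and with any other ostream.
    }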


=====================================
src/export_formats.cc
=====================================
@@ -1,6 +1,6 @@
 // -*-mode:c++; c-style:k&r; c-basic-offset:4;-*-
 //
-// Copyright 2012-2019, Julian Catchen <jcatchen at illinois.edu>
+// Copyright 2012-2022, Julian Catchen <jcatchen at illinois.edu>
 //
 // This file is part of Stacks.
 //
@@ -1207,7 +1207,7 @@ OrderableExport::write_batch(const vector<LocBin*> &loci)
         //
         vector<NucTally *> sites;
         OLocTally<NucTally> ord;
-        ord.order(sites, loci);
+        ord.order(sites, loci, true);
 
         map<size_t, size_t> key;
         map<size_t, size_t>::iterator key_it = key.begin();
@@ -1246,6 +1246,48 @@ OrderableExport::write_batch(const vector<LocBin*> &loci)
     return 0;
 }
 
+
+int
+OrderableAllExport::write_batch(const vector<LocBin*> &loci)
+{
+    if (ordered_export) {
+        //
+        // We need to order nucleotide positions (fixed or variable) to take into account overlapping loci.
+        //
+        vector<NucTally *> sites;
+        OLocTally<NucTally> ord;
+        ord.order(sites, loci, false);
+
+        map<size_t, size_t> key;
+        map<size_t, size_t>::iterator key_it = key.begin();
+
+        for (size_t k = 0; k < loci.size(); k++)
+            key_it = key.insert(key_it, pair<size_t, size_t>(loci[k]->cloc->id, k));
+
+        for (uint pos = 0; pos < sites.size(); pos++) {
+            int loc_id = sites[pos]->loc_id;
+
+            const LocBin *loc = loci[key[loc_id]];
+            size_t        col = sites[pos]->col;
+
+            this->write_site(loc->cloc, loc->s, loc->d, col);
+        }
+
+    } else {
+        for (uint k = 0; k < loci.size(); k++) {
+            const LocBin  *loc  = loci[k];
+            const CSLocus *cloc = loc->cloc;
+
+            for (uint col = 0; col < cloc->len; col++) {
+
+                this->write_site(cloc, loc->s, loc->d, col);
+            }
+        }
+    }
+
+    return 0;
+}
+
 int
 GenePopExport::open(const MetaPopInfo *mpopi)
 {
@@ -2728,6 +2770,151 @@ VcfHapsExport::write_batch(const vector<LocBin*>& loci)
     return 0;
 }
 
+int
+VcfAllSitesExport::open(const MetaPopInfo *mpopi)
+{
+    this->_path = out_path + out_prefix + ".all.vcf";
+    cout << "All fixed and variable sites will be written in VCF format to '" << this->_path << "'\n";
+
+    this->_mpopi = mpopi;
+
+    VcfHeader header;
+    //
+    // Check for the existence of a contigs definition file, include them in the VCF header if found.
+    //
+    const vector<Contig> &contigs = this->_mpopi->contigs();
+    if (contigs.size() > 0)
+	header.add_std_meta_with_contigs(contigs);
+    else
+	header.add_std_meta();
+
+    header.add_meta(VcfMeta::predefs::info_loc_strand);
+    for(size_t s : this->_mpopi->sample_indexes_orig_order())
+        header.add_sample(this->_mpopi->samples()[s].name);
+
+    this->_writer = new VcfWriter(this->_path, move(header));
+
+    return 0;
+}
+
+int
+VcfAllSitesExport::write_site(const CSLocus* cloc,
+			      const LocPopSum* psum,
+			      Datum const*const* d,
+			      size_t col)
+{
+    const LocTally* t = psum->meta_pop();
+
+    // Check that a call was made for this site before proceeding with output.
+    if (!Nt4(t->nucs[col].p_allele).is_acgt())
+	return 0;
+
+    VcfRecord rec;
+    string id = to_string(cloc->id);
+    id += ":";
+    id += to_string(col);
+    if (loci_ordered) {
+	rec.append_chrom(string(cloc->loc.chr()));
+	rec.append_pos(cloc->sort_bp(col));
+	id += ':';
+	id += (cloc->loc.strand == strand_plus ? '+' : '-');
+    } else {
+	rec.append_chrom(to_string(cloc->id));
+	rec.append_pos(col);
+    }
+    rec.append_id(id);
+
+    if (t->nucs[col].fixed == true) {
+	const char ref = t->nucs[col].p_allele;
+
+	rec.append_allele(Nt2(ref));
+	rec.append_qual();
+	rec.append_filters("PASS");
+	rec.append_info(string("NS=") + to_string(t->nucs[col].num_indv));
+	rec.append_info(string("AF=") + "1.0");
+	rec.append_format("GT");
+
+	for (size_t s : _mpopi->sample_indexes_orig_order()) {
+	    stringstream sample;
+
+	    if (d[s] == NULL || col >= uint(d[s]->len) || d[s]->model[col] == 'U') {
+	        // Data does not exist.
+	        sample << "./.";
+	    } else {
+		sample << "0/0";
+	    }
+	    
+	    rec.append_sample(sample.str());
+	}
+	
+    } else {
+	// What is our index into the haplotype for this column?
+	int index = -1;
+	for (size_t n = 0; n < cloc->snps.size(); n++)
+	    if (cloc->snps[n]->col == col) {
+		index = (int) n;
+		break;
+	    }
+	assert(index >= 0);
+	
+	const char ref = t->nucs[col].p_allele;
+	const char alt = t->nucs[col].q_allele;
+	char freq_alt[32];
+	sprintf(freq_alt, "%0.3f", 1 - t->nucs[col].p_freq);
+
+	rec.append_allele(Nt2(cloc->loc.strand == strand_plus ? ref : reverse(ref)));
+	rec.append_allele(Nt2(cloc->loc.strand == strand_plus ? alt : reverse(alt)));
+	rec.append_qual();
+	rec.append_filters("PASS");
+	rec.append_info(string("NS=") + to_string(t->nucs[col].num_indv));
+	rec.append_info(string("AF=") + freq_alt);
+	rec.append_format("GT");
+	rec.append_format("DP");
+	rec.append_format("AD");
+	rec.append_format("GQ");
+	rec.append_format("GL");
+
+	const vector<Nt2> alleles {Nt2(ref), Nt2(alt)};
+	for (size_t s : _mpopi->sample_indexes_orig_order()) {
+	    stringstream sample;
+
+	    if (d[s] == NULL || col >= uint(d[s]->len) || d[s]->model[col] == 'U') {
+	        // Data does not exist.
+	        sample << "./.";
+	    } else {
+	        if (d[s]->model[col] == 'O') {
+	            assert(d[s]->obshap.size() == 1
+	                   || (d[s]->obshap.size() == 2 && d[s]->obshap[0][index] == d[s]->obshap[1][index]));
+	            sample << (d[s]->obshap[0][index] == ref ? "0/0" : "1/1");
+	        } else {
+	            assert(d[s]->model[col] == 'E');
+	            assert(d[s]->obshap.size() == 2 && d[s]->obshap[0][index] != d[s]->obshap[1][index]);
+	            sample << "0/1";
+	        }
+	    
+	        // DP.
+	        sample << ":" << d[s]->snpdata[index].tot_depth;
+	        // AD.
+	        if (d[s]->snpdata[index].nt_depths.sum() > 0)
+	            sample << ":" << d[s]->snpdata[index].nt_depths[Nt2(ref)]
+	                   << "," << d[s]->snpdata[index].nt_depths[Nt2(alt)];
+	        else
+	            sample << ":.";
+	        // GQ.
+	        assert(d[s]->snpdata[index].gq != UINT8_MAX);
+	        sample << ':' << int(d[s]->snpdata[index].gq);
+	        // // GL.
+	        sample << ':' << VcfRecord::util::fmt_gt_gl(alleles, d[s]->snpdata[index].gtliks);
+	    }
+	    rec.append_sample(sample.str());
+	}
+    }
+
+    this->_writer->write_record(rec);
+
+    return 0;
+}
+
 int
 PlinkExport::open(const MetaPopInfo *mpopi)
 {
@@ -2804,7 +2991,7 @@ PlinkExport::write_batch(const vector<LocBin*> &loci)
         //
         vector<NucTally *> sites;
         OLocTally<NucTally> ord;
-        ord.order(sites, loci);
+        ord.order(sites, loci, true);
 
         map<size_t, size_t> key;
         map<size_t, size_t>::iterator key_it = key.begin();


=====================================
src/export_formats.h
=====================================
@@ -1,6 +1,6 @@
 // -*-mode:c++; c-style:k&r; c-basic-offset:4;-*-
 //
-// Copyright 2012-2021, Julian Catchen <jcatchen at illinois.edu>
+// Copyright 2012-2022, Julian Catchen <jcatchen at illinois.edu>
 //
 // This file is part of Stacks.
 //
@@ -60,6 +60,9 @@ class Export {
     static int transpose(ifstream &ifh, vector<string> &transposed);
 };
 
+//
+// For exporting ordered, variable sites.
+//
 class OrderableExport : public Export {
  public:
     OrderableExport() {}
@@ -70,6 +73,19 @@ class OrderableExport : public Export {
     virtual int write_site(const CSLocus* cloc, const LocPopSum* psum, Datum const*const* datums, size_t col, size_t index) = 0;
 };
 
+//
+// For exporting ordered, variable AND fixed sites.
+//
+class OrderableAllExport : public Export {
+ public:
+    OrderableAllExport() {}
+    virtual ~OrderableAllExport() {}
+    int write_batch(const vector<LocBin*>& loci);
+
+ protected:
+    virtual int write_site(const CSLocus* cloc, const LocPopSum* psum, Datum const*const* datums, size_t col) = 0;
+};
+
 class GenPos {
  public:
     uint     id;
@@ -451,6 +467,21 @@ class VcfHapsExport: public Export {
     int write_header() { return 0; }
 };
 
+class VcfAllSitesExport: public OrderableAllExport {
+    const MetaPopInfo*_mpopi;
+    VcfWriter* _writer;
+
+ public:
+    VcfAllSitesExport() : _mpopi(NULL), _writer(NULL) {}
+    ~VcfAllSitesExport() { delete this->_writer; }
+    int open(const MetaPopInfo *mpopi);
+    int write_header() { return 0; }
+
+private:
+    int write_site(const CSLocus* cloc, const LocPopSum* psum, Datum const*const* datums, size_t col);
+
+};
+
 class TreemixExport: public OrderableExport {
     const MetaPopInfo*_mpopi;
     ofstream _writer;


=====================================
src/ordered.h
=====================================
@@ -1,6 +1,6 @@
 // -*-mode:c++; c-style:k&r; c-basic-offset:4;-*-
 //
-// Copyright 2014-2021, Julian Catchen <jcatchen at illinois.edu>
+// Copyright 2014-2022, Julian Catchen <jcatchen at illinois.edu>
 //
 // This file is part of Stacks.
 //
@@ -48,7 +48,7 @@ public:
     Ordered(): incompatible_sites(0), total_sites(0), uniq_sites(0), multiple_sites(0) {}
     virtual ~Ordered() {}
 
-    int init_sites(vector<StatT *> &, map<uint, uint> &, const vector<LocBin *> &);
+    int init_sites(vector<StatT *> &, map<uint, uint> &, const vector<LocBin *> &, bool);
     int init_sites(vector<StatT *> &, map<uint, uint> &, const vector<LocBin *> &, uint);
     int init_sites(vector<StatT *> &, map<uint, uint> &, const vector<LocBin *> &, uint, uint);
     int init_haplotypes(vector<StatT *> &, map<uint, uint> &, const vector<LocBin *> &);
@@ -56,7 +56,7 @@ public:
 
 template<class StatT>
 int
-Ordered<StatT>::init_sites(vector<StatT *> &sites, map<uint, uint> &sites_key, const vector<LocBin *> &sorted_loci)
+Ordered<StatT>::init_sites(vector<StatT *> &sites, map<uint, uint> &sites_key, const vector<LocBin *> &sorted_loci, bool het_sites_only)
 {
     const CSLocus  *loc;
     const LocTally *ltally;
@@ -71,8 +71,12 @@ Ordered<StatT>::init_sites(vector<StatT *> &sites, map<uint, uint> &sites_key, c
         ltally = sorted_loci[pos]->s->meta_pop();
 
         for (uint k = 0; k < loc->len; k++) {
-            if (ltally->nucs[k].allele_cnt == 2)
-                bps.push_back(ltally->nucs[k].bp);
+            if (het_sites_only) {
+		if (ltally->nucs[k].allele_cnt == 2)
+		    bps.push_back(ltally->nucs[k].bp);
+	    } else {
+		bps.push_back(ltally->nucs[k].bp);
+	    }	
         }
     }
 
@@ -390,12 +394,12 @@ public:
         this->log_fh = NULL;
     }
 
-    int order(vector<StatT *> &, const vector<LocBin *> &);
+    int order(vector<StatT *> &, const vector<LocBin *> &, bool);
 };
 
 template<class StatT>
 int
-OLocTally<StatT>::order(vector<StatT *> &sites, const vector<LocBin *> &sorted_loci)
+OLocTally<StatT>::order(vector<StatT *> &sites, const vector<LocBin *> &sorted_loci, bool het_sites_only)
 {
     this->incompatible_sites = 0;
     this->total_sites        = 0;
@@ -403,7 +407,7 @@ OLocTally<StatT>::order(vector<StatT *> &sites, const vector<LocBin *> &sorted_l
 
     map<uint, uint> sites_key;
 
-    this->init_sites(sites, sites_key, sorted_loci);
+    this->init_sites(sites, sites_key, sorted_loci, het_sites_only);
 
     this->uniq_sites = sites_key.size();
 
@@ -419,7 +423,9 @@ OLocTally<StatT>::order(vector<StatT *> &sites, const vector<LocBin *> &sorted_l
         ltally = sorted_loci[pos]->s->meta_pop();
 
         for (uint k = 0; k < loc->len; k++) {
-            if (ltally->nucs[k].allele_cnt != 2) continue;
+            if (het_sites_only &&
+		ltally->nucs[k].allele_cnt != 2)
+		continue;
 
             assert(sites_key.count(ltally->nucs[k].bp) > 0);
             this->total_sites++;
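
The new het_sites_only flag selects between the two export modes; a toy,
self-contained model of the predicate above (Stacks types replaced with
minimal stand-ins; illustrative only):

    #include <cstdio>
    #include <vector>
    using namespace std;

    struct NucSite { long bp; int allele_cnt; };

    // Mirror of the init_sites() loop: keep only biallelic sites when
    // het_sites_only is set, otherwise keep every site (--vcf-all).
    vector<long> collect_bps(const vector<NucSite> &nucs, bool het_sites_only)
    {
        vector<long> bps;
        for (const NucSite &n : nucs) {
            if (het_sites_only && n.allele_cnt != 2)
                continue;
            bps.push_back(n.bp);
        }
        return bps;
    }

    int main()
    {
        vector<NucSite> nucs = {{100, 1}, {101, 2}, {102, 1}};
        printf("het only: %zu sites\n", collect_bps(nucs, true).size());   // 1
        printf("all:      %zu sites\n", collect_bps(nucs, false).size());  // 3
    }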


=====================================
src/populations.cc
=====================================
@@ -1,6 +1,6 @@
 // -*-mode:c++; c-style:k&r; c-basic-offset:4;-*-
 //
-// Copyright 2012-2021, Julian Catchen <jcatchen at illinois.edu>
+// Copyright 2012-2022, Julian Catchen <jcatchen at illinois.edu>
 //
 // This file is part of Stacks.
 //
@@ -131,26 +131,6 @@ try {
     //
     parse_command_line(argc, argv);
 
-    // NucTally *n = new NucTally();
-    // n->optimize();
-    // delete n;
-    //exit(1);
-    
-    // SumStat *s = new SumStat();
-    // s->optimize();
-    // delete s;
-    // exit(1);
-
-    // LocSum *ls = new LocSum(144);
-    // ls->optimize(144);
-    // delete ls;
-    // exit(1);
-
-    // PopPair *p = new PopPair();
-    // p->optimize();
-    // delete p;
-    // exit(1);
-
     //
     // Open and initialize the log file.
     //
@@ -518,6 +498,11 @@ BatchLocusProcessor::init(string in_path, string pmap_path)
     else
         this->init_stacks_loci(in_path, pmap_path);
 
+    //
+    // Initialize the CatalogDists object to record the number of present/absent sample/locus/snp counts.
+    //
+    this->_dists.init(this->_mpopi);
+
     //
     // Check that the number of samples does not exceed the maximum we can handle
     // without overflowing our internal data structures.
@@ -642,8 +627,8 @@ BatchLocusProcessor::init_stacks_loci(string in_path, string pmap_path)
 
     // Create the population map or check that all samples have data.
     if (pmap_path.empty()) {
-        cout << "No population map specified, using all samples...\n";
         this->_mpopi->init_names(this->cloc_reader().header().samples());
+
     } else {
         size_t n_samples_before = this->_mpopi->n_samples();
         this->_mpopi->intersect_with(this->cloc_reader().header().samples());
@@ -744,7 +729,7 @@ BatchLocusProcessor::next_batch_stacks_loci(ostream &log_fh)
         //
         // Apply locus & SNP filters.
         //
-        this->_dists.accumulate_pre_filtering(loc->sample_cnt, loc->cloc);
+        this->_dists.accumulate_pre_filtering(loc);
         this->_loc_filter.whitelist_snp_filter(*loc);
         if (this->_loc_filter.apply_filters_stacks(*loc, log_fh, *this->_mpopi)) {
             delete loc;
@@ -937,7 +922,7 @@ BatchLocusProcessor::next_batch_external_loci(ostream &log_fh)
         //
         // Apply filters.
         //
-        this->_dists.accumulate_pre_filtering(loc->sample_cnt, loc->cloc);
+        this->_dists.accumulate_pre_filtering(loc);
         if (this->_loc_filter.apply_filters_external(*loc, log_fh, *this->_mpopi)) {
             delete loc;
             continue;
@@ -1623,27 +1608,93 @@ LocusFilter::filter_haps(LocBin& loc, const MetaPopInfo& mpopi, ostream &log_fh)
 }
 
 int
-CatalogDists::accumulate_pre_filtering(const size_t sample_cnt, const CSLocus *loc)
+CatalogDists::init(const MetaPopInfo *mpopi)
 {
-    size_t missing;
+    assert(mpopi != NULL);
+
+    this->_pop_cnt    = mpopi->pops().size();
+    this->_sample_cnt = mpopi->n_samples();
+
+    assert(this->_pop_cnt > 0);
+    assert(this->_sample_cnt > 0);
+
+    this->_pop_order = new size_t [this->_pop_cnt];
+    this->_samples   = new size_t [this->_sample_cnt];
+
+    this->_pre_loci_pres_per_sample  = new size_t [this->_sample_cnt];
+    this->_pre_loci_miss_per_sample  = new size_t [this->_sample_cnt];
+	
+    this->_sites_pres_per_sample = new size_t [this->_sample_cnt];
+    this->_sites_miss_per_sample = new size_t [this->_sample_cnt];
+    this->_loci_pres_per_sample  = new size_t [this->_sample_cnt];
+    this->_loci_miss_per_sample  = new size_t [this->_sample_cnt];
+
+    memset(this->_pre_loci_pres_per_sample,  0, this->_sample_cnt * sizeof(size_t));
+    memset(this->_pre_loci_miss_per_sample,  0, this->_sample_cnt * sizeof(size_t));
+
+    memset(this->_sites_pres_per_sample, 0, this->_sample_cnt * sizeof(size_t));
+    memset(this->_sites_miss_per_sample, 0, this->_sample_cnt * sizeof(size_t));
+    memset(this->_loci_pres_per_sample,  0, this->_sample_cnt * sizeof(size_t));
+    memset(this->_loci_miss_per_sample,  0, this->_sample_cnt * sizeof(size_t));
+
+    this->_pre_total_loci = 0;
+    this->_total_loci     = 0;
+    this->_total_sites    = 0;
+
+    size_t pop_sthg = 0;
+
+    for (size_t i_pop = 0; i_pop < mpopi->pops().size(); ++i_pop) {
+	const Pop& pop = mpopi->pops()[i_pop];
+
+	for (uint i = pop.first_sample; i <= pop.last_sample; i++)
+	    this->_samples[i] = pop_sthg;
+	this->_pop_order[pop_sthg] = i_pop;
 
-    if (this->_pre_valid.count(loc->cnt) == 0)
-        this->_pre_valid[loc->cnt] = 1;
+	pop_sthg++;
+    }
+    return 0;
+}
+
+int
+CatalogDists::accumulate_pre_filtering(const LocBin *locbin)
+{
+    Datum **d;
+    size_t  missing;
+
+    const CSLocus *cloc = locbin->cloc;
+    const size_t   sample_cnt = locbin->sample_cnt;
+
+    if (this->_pre_valid.count(cloc->cnt) == 0)
+        this->_pre_valid[cloc->cnt] = 1;
     else
-        this->_pre_valid[loc->cnt]++;
+        this->_pre_valid[cloc->cnt]++;
 
-    missing = sample_cnt - loc->cnt;
+    missing = sample_cnt - cloc->cnt;
 
     if (this->_pre_absent.count(missing) == 0)
         this->_pre_absent[missing] = 1;
     else
         this->_pre_absent[missing]++;
 
-    if (this->_pre_snps_per_loc.count(loc->snps.size()) == 0)
-        this->_pre_snps_per_loc[loc->snps.size()] = 1;
+    if (this->_pre_snps_per_loc.count(cloc->snps.size()) == 0)
+        this->_pre_snps_per_loc[cloc->snps.size()] = 1;
     else
-        this->_pre_snps_per_loc[loc->snps.size()]++;
+        this->_pre_snps_per_loc[cloc->snps.size()]++;
+
+    //
+    // Record which individuals are missing loci and/or SNPs.
+    //
+    d = locbin->d;
 
+    for (uint j = 0; j < this->_sample_cnt; j++) {
+	if (d[j] == NULL) {
+	    this->_pre_loci_miss_per_sample[j]++;
+	    continue;
+	}
+	this->_pre_loci_pres_per_sample[j]++;
+    }
+    this->_pre_total_loci++;
+    
     return 0;
 }
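
For context, the per-sample tally introduced above reduces to a simple
presence/absence count over each locus's Datum array, plus a running locus
total from which each sample's missing-data frequency is later computed. A
minimal standalone sketch of that bookkeeping (the bool mask is a stand-in
for Stacks' Datum pointers, not the real types):

    #include <cstddef>
    #include <iostream>
    #include <vector>

    int main() {
        const size_t sample_cnt = 4;
        std::vector<size_t> pres(sample_cnt, 0), miss(sample_cnt, 0);
        size_t total_loci = 0;

        // One flag per sample and locus; false plays the role of a NULL
        // Datum pointer (no data for that sample at that locus).
        std::vector<std::vector<bool>> loci = {
            {true, true, false, true},
            {true, false, false, true},
        };

        for (const auto &locus : loci) {
            for (size_t j = 0; j < sample_cnt; j++)
                locus[j] ? pres[j]++ : miss[j]++;
            total_loci++;
        }

        for (size_t j = 0; j < sample_cnt; j++)
            std::cout << "sample " << j << ": " << pres[j] << " present, "
                      << miss[j] << " missing, frequency missing "
                      << (double) miss[j] / total_loci << "\n";
        return 0;
    }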
 
@@ -1735,7 +1786,7 @@ CatalogDists::write_results(ostream &log_fh, const MetaPopInfo *mpopi)
 
     begin_section("missing_samples_per_loc_prefilters");
     log_fh << "# Distribution of missing samples for each catalog locus prior to filtering.\n"
-           << "# Absent samples at locus\tCount\n";
+           << "Absent samples at locus\tCount\n";
     for (cnt_it = this->_pre_absent.begin(); cnt_it != this->_pre_absent.end(); cnt_it++)
         log_fh << cnt_it->first << "\t" << cnt_it->second << "\n";
     end_section();
@@ -1756,7 +1807,7 @@ CatalogDists::write_results(ostream &log_fh, const MetaPopInfo *mpopi)
 
     begin_section("missing_samples_per_loc_postfilters");
     log_fh << "# Distribution of missing samples for each catalog locus after filtering.\n"
-           << "# Absent samples at locus\tCount\n";
+           << "Absent samples at locus\tCount\n";
     for (cnt_it = this->_post_absent.begin(); cnt_it != this->_post_absent.end(); cnt_it++)
         log_fh << cnt_it->first << "\t" << cnt_it->second << "\n";
     end_section();
@@ -1769,7 +1820,19 @@ CatalogDists::write_results(ostream &log_fh, const MetaPopInfo *mpopi)
     end_section();
 
     const vector<Sample> &s = mpopi->samples();
-    
+
+    begin_section("loci_per_sample_prefilters");
+    log_fh << "# Number of loci per individual sample (before filtering).\n"
+           << "sample\tn_loci\tpresent_loci\tmissing_loci\tfrequency_missing\n";
+    for (uint i = 0; i < this->_sample_cnt; i++)
+        log_fh << s[i].name << "\t"
+               << this->_pre_total_loci << "\t"
+               << this->_pre_loci_pres_per_sample[i] << "\t"
+               << this->_pre_loci_miss_per_sample[i] << "\t"
+               << fixed << setprecision(4)
+               << (double) this->_pre_loci_miss_per_sample[i] / (double) this->_pre_total_loci << "\n";
+    end_section();
+
     begin_section("loci_per_sample");
     log_fh << "# Number of loci per individual sample (after filtering).\n"
            << "sample\tn_loci\tpresent_loci\tmissing_loci\tfrequency_missing\n";
@@ -3574,6 +3637,7 @@ parse_command_line(int argc, char* argv[])
             {"quiet",             no_argument,       NULL, 'q'},
             {"verbose",           no_argument,       NULL, 'd'},
             {"vcf",               no_argument,       NULL, 1004},
+	    {"vcf-all",           no_argument,       NULL, 1009},
             {"fasta-loci",        no_argument,       NULL, 1006},
             {"fasta-samples",     no_argument,       NULL, 'J'},
             {"fasta-samples-raw", no_argument,       NULL, 'F'},
@@ -3829,6 +3893,9 @@ parse_command_line(int argc, char* argv[])
             add_export<VcfExport>();
             add_export<VcfHapsExport>();
             break;
+	case 1009: // --vcf-all
+	    add_export<VcfAllSitesExport>();
+	    break;
         case 1006: // --fasta-loci
             add_export<FastaLociExport>();
             break;
@@ -4135,11 +4202,10 @@ void help() {
          << "  --fasta-loci: output locus consensus sequences in FASTA format.\n"
          << "  --fasta-samples: output the sequences of the two haplotypes of each (diploid) sample, for each locus, in FASTA format.\n"
          << "  --vcf: output SNPs and haplotypes in Variant Call Format (VCF).\n"
+	 << "  --vcf-all: output all sites in Variant Call Format (VCF).\n"
          << "  --genepop: output SNPs and haplotypes in GenePop format.\n"
          << "  --structure: output results in Structure format.\n"
          << "  --radpainter: output results in fineRADstructure/RADpainter format.\n"
-         // << "  --phase*: output genotypes in PHASE format.\n"
-         // << "  --fastphase*: output genotypes in fastPHASE format.\n"
          << "  --plink: output genotypes in PLINK format.\n"
          << "  --hzar: output genotypes in Hybrid Zone Analysis using R (HZAR) format.\n"
          << "  --phylip: output nucleotides that are fixed-within, and variant among populations in Phylip format for phylogenetic tree construction.\n"


=====================================
src/populations.h
=====================================
@@ -106,8 +106,10 @@ class CatalogDists {
     map<size_t, size_t> _pre_snps_per_loc, _post_snps_per_loc;
     
     size_t _pop_cnt, _sample_cnt;
+    size_t _pre_total_loci;
     size_t _total_sites, _total_loci;
     size_t *_pop_order, *_samples;
+    size_t *_pre_loci_pres_per_sample,  *_pre_loci_miss_per_sample;
     size_t *_sites_pres_per_sample, *_sites_miss_per_sample;
     size_t *_loci_pres_per_sample,  *_loci_miss_per_sample;
 
@@ -118,55 +120,27 @@ public:
         this->_pop_order  = NULL; // The array order of each population.
         this->_samples    = NULL; // Which population each sample belongs to.
 
+        this->_pre_loci_pres_per_sample  = NULL; // Array of samples tallying # loci present
+        this->_pre_loci_miss_per_sample  = NULL; // Array of samples tallying # loci missing
+
         this->_sites_pres_per_sample = NULL; // Array of samples tallying # variant sites present
         this->_sites_miss_per_sample = NULL; // Array of samples tallying # variant sites missing
         this->_loci_pres_per_sample  = NULL; // Array of samples tallying # loci present
         this->_loci_miss_per_sample  = NULL; // Array of samples tallying # loci missing
 
-        this->_total_loci  = 0;
-        this->_total_sites = 0;
-    }
-    CatalogDists(const MetaPopInfo *mpopi) {
-        assert(mpopi != NULL);
-
-        this->_pop_cnt    = mpopi->pops().size();
-        this->_sample_cnt = mpopi->n_samples();
-
-        assert(this->_pop_cnt > 0);
-        assert(this->_sample_cnt > 0);
-
-        this->_pop_order = new size_t [this->_pop_cnt];
-        this->_samples   = new size_t [this->_sample_cnt];
-        this->_sites_pres_per_sample = new size_t [this->_sample_cnt];
-        this->_sites_miss_per_sample = new size_t [this->_sample_cnt];
-        this->_loci_pres_per_sample  = new size_t [this->_sample_cnt];
-        this->_loci_miss_per_sample  = new size_t [this->_sample_cnt];
-
-        memset(this->_sites_pres_per_sample, 0, this->_sample_cnt * sizeof(size_t));
-        memset(this->_sites_miss_per_sample, 0, this->_sample_cnt * sizeof(size_t));
-        memset(this->_loci_pres_per_sample,  0, this->_sample_cnt * sizeof(size_t));
-        memset(this->_loci_miss_per_sample,  0, this->_sample_cnt * sizeof(size_t));
-        
-        this->_total_loci  = 0;
-        this->_total_sites = 0;
-
-        size_t pop_sthg = 0;
-
-        for (size_t i_pop = 0; i_pop < mpopi->pops().size(); ++i_pop) {
-            const Pop& pop = mpopi->pops()[i_pop];
-
-            for (uint i = pop.first_sample; i <= pop.last_sample; i++)
-                this->_samples[i] = pop_sthg;
-            this->_pop_order[pop_sthg] = i_pop;
-
-            pop_sthg++;
-        }
+        this->_pre_total_loci = 0;
+        this->_total_loci     = 0;
+        this->_total_sites    = 0;
     }
     ~CatalogDists() {
         if (this->_pop_order != NULL)
             delete [] this->_pop_order;
         if (this->_samples != NULL)
             delete [] this->_samples;
+        if (this->_pre_loci_pres_per_sample != NULL)
+            delete [] this->_pre_loci_pres_per_sample;
+        if (this->_pre_loci_miss_per_sample != NULL)
+            delete [] this->_pre_loci_miss_per_sample;
         if (this->_sites_pres_per_sample != NULL)
             delete [] this->_sites_pres_per_sample;
         if (this->_sites_miss_per_sample != NULL)
@@ -180,7 +154,8 @@ public:
     CatalogDists(const CatalogDists &) = delete;
     CatalogDists(CatalogDists &&) = delete;
 
-    int accumulate_pre_filtering(const size_t, const CSLocus *);
+    int init(const MetaPopInfo *mpopi);
+    int accumulate_pre_filtering(const LocBin *);
     int accumulate(const vector<LocBin *> &);
     int write_results(ostream &log_fh, const MetaPopInfo *mpopi);
 };
@@ -332,15 +307,15 @@ public:
     BatchLocusProcessor():
 	_sig_hwe_dev(NULL),
         _input_mode(InputMode::stacks2), _user_supplied_whitelist(false), _batch_size(0), _batch_num(0),
-        _mpopi(NULL), _next_loc(NULL), _unordered_bp(0) {}
+        _mpopi(NULL), _next_loc(NULL), _dists(), _unordered_bp(0) {}
     BatchLocusProcessor(InputMode mode, size_t batch_size, MetaPopInfo *popi):
 	_sig_hwe_dev(NULL),
         _input_mode(mode), _user_supplied_whitelist(false), _batch_size(batch_size), _batch_num(0),
-        _mpopi(popi), _next_loc(NULL), _dists(popi), _unordered_bp(0) {}
+        _mpopi(popi), _next_loc(NULL), _dists(), _unordered_bp(0) {}
     BatchLocusProcessor(InputMode mode, size_t batch_size):
 	_sig_hwe_dev(NULL),
         _input_mode(mode), _user_supplied_whitelist(false), _batch_size(batch_size), _batch_num(0),
-        _mpopi(NULL), _next_loc(NULL), _unordered_bp(0) {}
+        _mpopi(NULL), _next_loc(NULL), _dists(), _unordered_bp(0) {}
     ~BatchLocusProcessor() {
         for (uint i = 0; i < this->_loci.size(); i++)
             delete this->_loci[i];
@@ -428,10 +403,8 @@ int     tabulate_haplotypes(map<int, CSLocus *> &, PopMap<CSLocus> *);
 int     tabulate_locus_haplotypes(CSLocus *, Datum **, int);
 int     create_genotype_map(CSLocus *, Datum **, int);
 int     call_population_genotypes(CSLocus *, Datum **, int);
-int     translate_genotypes(map<string, string> &, map<string, map<string, string> > &, map<int, CSLocus *> &, PopMap<CSLocus> *, map<int, string> &, set<int> &); // This function doesn't exist (March 24, 2016)
+
 int     correct_fst_bonferroni_win(vector<PopPair *> &);
-int     bootstrap_fst_approximate_dist(vector<double> &, vector<int>  &, double *, int *, map<int, vector<double> > &); // not used (March 23, 2016)
-int     bootstrap_popstats_approximate_dist(vector<double> &, vector<double> &, vector<int>  &, double *, int *, int, map<int, vector<double> > &, map<int, vector<double> > &); // not used (March 23, 2016)
 double  bootstrap_approximate_pval(int, double, map<int, vector<double> > &);
 
 bool hap_compare(const pair<string,int>&, const pair<string,int>&);
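
A note on the refactor visible in this header: the CatalogDists constructor
that took a MetaPopInfo is gone because the sample set is only final after
it has been intersected with the catalog header (see the next_batch hunks
above), so the member is now default-constructed and sized later via
init(). A minimal sketch of that two-phase pattern, with illustrative names
rather than the real Stacks interfaces:

    #include <cassert>
    #include <cstddef>
    #include <cstring>

    class Tally {
        size_t  _sample_cnt;
        size_t *_pres, *_miss;
    public:
        Tally() : _sample_cnt(0), _pres(NULL), _miss(NULL) {}
        ~Tally() { delete [] _pres; delete [] _miss; }
        int init(size_t sample_cnt) {
            assert(sample_cnt > 0);
            _sample_cnt = sample_cnt;
            _pres = new size_t[_sample_cnt];
            _miss = new size_t[_sample_cnt];
            memset(_pres, 0, _sample_cnt * sizeof(size_t));
            memset(_miss, 0, _sample_cnt * sizeof(size_t));
            return 0;
        }
    };

    int main() {
        Tally dists;            // can be a member long before the
        size_t n_samples = 12;  // population map is final...
        dists.init(n_samples);  // ...and sized once it is.
        return 0;
    }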


=====================================
src/process_radtags.cc
=====================================
@@ -1,6 +1,6 @@
 // -*-mode:c++; c-style:k&r; c-basic-offset:4;-*-
 //
-// Copyright 2011-2021, Julian Catchen <jcatchen at illinois.edu>
+// Copyright 2011-2022, Julian Catchen <jcatchen at illinois.edu>
 //
 // This file is part of Stacks.
 //
@@ -68,7 +68,12 @@ int      barcode_dist_2  = -1;
 double   win_size        = 0.15;
 uint     score_limit     = 10;
 uint     len_limit       = 0;
-uint     num_threads     = 1;
+
+uint     num_threads        = 1;
+int      num_input_threads  = 1;
+int      num_output_threads = 0;
+int      num_worker_threads = 0;
+int      max_records_queued = 0;
 
 //
 // How to shift FASTQ-encoded quality scores from ASCII down to raw scores
@@ -159,44 +164,79 @@ int main (int argc, char* argv[]) {
     else
         open_files(files, barcodes, pair_1_fhs, pair_2_fhs, rem_1_fhs, rem_2_fhs, counters);
 
+    //
+    // Set the number of OpenMP parallel threads to execute -- minimum of 3 threads required.
+    //
+    #ifdef _OPENMP
+    if (num_threads >= 3) {	
+	if (num_threads > max_threads) {
+	    num_threads = max_threads;
+	    cerr << "Warning: too many threads requested; reducing thread count to " << num_threads << " to avoid degrading performance.\n\n";
+	}
+	omp_set_num_threads(num_threads);
+	allocate_threads();
+    } else if (num_threads > 1) {
+	cerr << "Warning: minimum number of threads required is 3; reverting to single-threaded mode.\n";
+    }
+    #endif
+    
     int result = 1;
 
     for (uint i = 0; i < files.size(); i++) {
         cerr << "Processing file " << i+1 << " of " << files.size() << " [" << files[i].first.c_str() << "]\n";
 
-        counters[files[i].first]["total"]        = 0;
-        counters[files[i].first]["ill_filtered"] = 0;
-        counters[files[i].first]["adapter"]      = 0;
-        counters[files[i].first]["low_quality"]  = 0;
-        counters[files[i].first]["noradtag"]     = 0;
-        counters[files[i].first]["ambiguous"]    = 0;
-        counters[files[i].first]["retained"]     = 0;
-        counters[files[i].first]["recovered"]    = 0;
-	counters[files[i].first]["transposed"]   = 0;
+	initialize_counters(counters[files[i].first]);
+
+	if (num_threads >= 3) {
+	    if (paired) {
+		if (out_file_type == FileT::gzfastq || out_file_type == FileT::gzfasta)
+                    result = process_paired_reads_parallel(files[i].first, files[i].second,
+							   se_bc, pe_bc,
+							   pair_1_gzfhs, pair_2_gzfhs, rem_1_gzfhs, rem_2_gzfhs,
+							   counters[files[i].first], barcode_log);
+		else
+		    result = process_paired_reads_parallel(files[i].first, files[i].second,
+							   se_bc, pe_bc,
+							   pair_1_fhs, pair_2_fhs, rem_1_fhs, rem_2_fhs,
+							   counters[files[i].first], barcode_log);
+	    } else {
+		if (out_file_type == FileT::gzfastq || out_file_type == FileT::gzfasta)
+                    result = process_reads_parallel(files[i].first,
+						    se_bc, pe_bc,
+						    pair_1_gzfhs,
+						    counters[files[i].first], barcode_log);
+		else
+		    result = process_reads_parallel(files[i].first,
+						    se_bc, pe_bc,
+						    pair_1_fhs,
+						    counters[files[i].first], barcode_log);
+	    }
 
-        if (paired) {
-            if (out_file_type == FileT::gzfastq || out_file_type == FileT::gzfasta)
+	} else {
+	    if (paired) {
+		if (out_file_type == FileT::gzfastq || out_file_type == FileT::gzfasta)
                     result = process_paired_reads(files[i].first, files[i].second,
-                                              se_bc, pe_bc,
-                                              pair_1_gzfhs, pair_2_gzfhs, rem_1_gzfhs, rem_2_gzfhs,
-                                              counters[files[i].first], barcode_log);
-            else
-                result = process_paired_reads(files[i].first, files[i].second,
-                                              se_bc, pe_bc,
-                                              pair_1_fhs, pair_2_fhs, rem_1_fhs, rem_2_fhs,
-                                              counters[files[i].first], barcode_log);
-        } else {
-            if (out_file_type == FileT::gzfastq || out_file_type == FileT::gzfasta)
+						  se_bc, pe_bc,
+						  pair_1_gzfhs, pair_2_gzfhs, rem_1_gzfhs, rem_2_gzfhs,
+						  counters[files[i].first], barcode_log);
+		else
+		    result = process_paired_reads(files[i].first, files[i].second,
+						  se_bc, pe_bc,
+						  pair_1_fhs, pair_2_fhs, rem_1_fhs, rem_2_fhs,
+						  counters[files[i].first], barcode_log);
+	    } else {
+		if (out_file_type == FileT::gzfastq || out_file_type == FileT::gzfasta)
                     result = process_reads(files[i].first,
-                                       se_bc, pe_bc,
-                                       pair_1_gzfhs,
-                                       counters[files[i].first], barcode_log);
-            else
-                result = process_reads(files[i].first,
-                                       se_bc, pe_bc,
-                                       pair_1_fhs,
-                                       counters[files[i].first], barcode_log);
-        }
+					   se_bc, pe_bc,
+					   pair_1_gzfhs,
+					   counters[files[i].first], barcode_log);
+		else
+		    result = process_reads(files[i].first,
+					   se_bc, pe_bc,
+					   pair_1_fhs,
+					   counters[files[i].first], barcode_log);
+	    }
+	}
 
         cerr <<        "  "
              << counters[files[i].first]["total"] << " total reads; ";
@@ -217,7 +257,7 @@ int main (int argc, char* argv[]) {
         }
     }
 
-    cerr << "Closing files, flushing buffers...\n";
+    cerr << "Closing files, flushing buffers...";
     if (out_file_type == FileT::gzfastq || out_file_type == FileT::gzfasta) {
         close_file_handles(pair_1_gzfhs);
         if (paired) {
@@ -233,13 +273,102 @@ int main (int argc, char* argv[]) {
             close_file_handles(pair_2_fhs);
         }
     }
-
+    cerr << "done.\n";
+    
     print_results(argc, argv, barcodes, counters, barcode_log);
 
+    cerr << "process_radtags is done.\n";
+    
     return 0;
     IF_NDEBUG_CATCH_ALL_EXCEPTIONS
 }
 
+int
+allocate_threads()
+{
+    if (num_threads <= 4)
+	num_output_threads = 1;
+    else if (num_threads <= 6)
+	num_output_threads = 2;
+    else if (num_threads <= 8)
+	num_output_threads = 3;
+    else
+	num_output_threads = 4;
+
+    num_worker_threads = num_threads - num_input_threads - num_output_threads;
+
+    max_records_queued = 1000 * num_worker_threads;
+    
+    cerr << "Setting number of threads to " << num_threads << "; "
+	 << num_input_threads << " reader thread; "
+	 << num_output_threads << " writer threads; "
+	 << num_worker_threads << " worker threads.\n";
+
+    return 0;
+}
+
+int
+initialize_counters(map<string, long> &counter)
+{
+    counter["total"]        = 0;
+    counter["ill_filtered"] = 0;
+    counter["adapter"]      = 0;
+    counter["low_quality"]  = 0;
+    counter["noradtag"]     = 0;
+    counter["ambiguous"]    = 0;
+    counter["retained"]     = 0;
+    counter["recovered"]    = 0;
+    counter["transposed"]   = 0;
+
+    return 0;
+}
+
+int
+consolidate_counters(map<string, long> &counter,
+		     vector<map<string, long>> &local_counters)
+{
+    for (uint j = 0; j < local_counters.size(); j++) {
+    	map<string, long> &cntr = local_counters[j];
+
+	counter["total"]        += cntr["total"];
+	counter["ill_filtered"] += cntr["ill_filtered"];
+	counter["adapter"]      += cntr["adapter"];
+	counter["low_quality"]  += cntr["low_quality"];
+	counter["noradtag"]     += cntr["noradtag"];
+	counter["ambiguous"]    += cntr["ambiguous"];
+	counter["retained"]     += cntr["retained"];
+	counter["recovered"]    += cntr["recovered"];
+	counter["transposed"]   += cntr["transposed"];
+    }
+
+    return 0;
+}
+
+int
+consolidate_barcode_logs(map<BarcodePair, map<string, long>> &barcode_log,
+			 vector<map<BarcodePair, map<string, long>>> &local_barcode_logs)
+{
+    for (uint j = 0; j < local_barcode_logs.size(); j++) {
+    	map<BarcodePair, map<string, long>> &blog = local_barcode_logs[j];
+
+    	for (auto it = blog.begin(); it != blog.end(); it++) {
+	    BarcodePair bc = it->first;
+	    if (barcode_log.count(bc) == 0) {
+		barcode_log[bc]["noradtag"] = 0;
+		barcode_log[bc]["total"]    = 0;
+		barcode_log[bc]["low_qual"] = 0;
+		barcode_log[bc]["retained"] = 0;
+	    }
+	    barcode_log[bc]["noradtag"] += it->second["noradtag"];
+	    barcode_log[bc]["total"]    += it->second["total"];
+	    barcode_log[bc]["low_qual"] += it->second["low_qual"];
+	    barcode_log[bc]["retained"] += it->second["retained"];
+	}
+    }
+
+    return 0;
+}
+
 template <typename fhType>
 int
 process_paired_reads(string prefix_1,
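
To make the allocate_threads() split concrete, the following standalone
program reproduces the arithmetic above for a few --threads values (one
reader thread is always reserved; the rest are divided between writers and
workers, and the queue cap is 1000 records per worker):

    #include <iostream>

    int main() {
        for (int num_threads : {3, 4, 6, 8, 16, 24}) {
            int readers = 1;
            int writers = num_threads <= 4 ? 1
                        : num_threads <= 6 ? 2
                        : num_threads <= 8 ? 3 : 4;
            int workers = num_threads - readers - writers;
            std::cout << num_threads << " threads -> " << readers
                      << " reader, " << writers << " writer(s), "
                      << workers << " worker(s); queue cap "
                      << 1000 * workers << " records\n";
        }
        return 0;
    }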
@@ -287,28 +416,28 @@ process_paired_reads(string prefix_1,
         discard_fh_1 = new ofstream(path_1.c_str(), ifstream::out);
 
         if (discard_fh_1->fail()) {
-            cerr << "Error opening discard output file '" << path_1 << "'\n";
+            cerr << "Error opening single-end discard output file '" << path_1 << "'\n";
             exit(1);
         }
 
         path_2 = out_path + prefix_2 + ".discards";
         discard_fh_2 = new ofstream(path_2.c_str(), ifstream::out);
 
-        if (discard_fh_1->fail()) {
-            cerr << "Error opening discard output file '" << path_2 << "'\n";
+        if (discard_fh_2->fail()) {
+            cerr << "Error opening paired-end discard output file '" << path_2 << "'\n";
             exit(1);
         }
     }
 
     //
-    // Read in the first record, initializing the Seq object s. Then
-    // initialize the Read object r, then loop, using the same objects.
+    // Read in the first pair of records, initializing the Seq objects s_1 and s_2. Then
+    // initialize the Read objects r_1 and r_2, and loop, reusing the same objects.
     //
     Seq *s_1 = fh_1->next_seq();
     Seq *s_2 = fh_2->next_seq();
     if (s_1 == NULL || s_2 == NULL) {
-        cerr << "Attempting to read first pair of input records, unable to allocate "
-             << "Seq object (Was the correct input type specified?).\n";
+        cerr << "Error: a failure occurred while attempting to read first pair of input sequences "
+             << "(was the correct input type specified?).\n";
         exit(1);
     }
 
@@ -329,12 +458,21 @@ process_paired_reads(string prefix_1,
     if (max_bc_size_1 == 0)
         bc.set(prefix_1, prefix_2);
 
-    long i = 1;
+    long i        = 1;
+    int  line_len = 33;
+    char numstr[id_len];
 
-    cerr << "  Processing RAD-Tags...";
+    cerr << "  Processing pairs of RAD-Tags...";
     do {
-        if (i % 1000000 == 0)
-            cerr << i/1000000 << "M...";
+        if (i % 1000000 == 0) {
+	    sprintf(numstr, "%ldM...", i/1000000);
+	    line_len += strlen(numstr);
+            cerr << numstr;
+	    if (line_len > 80) {
+		cerr << "\n";
+		line_len = 0;
+	    }
+	}
 
         parse_input_record(s_1, r_1);
         parse_input_record(s_2, r_2);
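
The progress logging above now wraps near 80 columns instead of emitting an
unbounded "1M...2M..." stream; line_len starts at 33, the width of the
"  Processing pairs of RAD-Tags..." lead-in (24 in the single-end variant).
A small sketch of the same wrapping logic:

    #include <cstdio>
    #include <cstring>

    int main() {
        int  line_len = 33;  // width of "  Processing pairs of RAD-Tags..."
        char numstr[32];
        for (long m = 1; m <= 300; m++) {         // once per million records
            snprintf(numstr, sizeof(numstr), "%ldM...", m);
            line_len += (int) strlen(numstr);
            fputs(numstr, stderr);
            if (line_len > 80) {                  // wrap near 80 columns
                fputs("\n", stderr);
                line_len = 0;
            }
        }
        fputs("\n", stderr);
        return 0;
    }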
@@ -454,7 +592,7 @@ process_paired_reads(string prefix_1,
                 write_fasta(discard_fh_2, s_2);
 
         if (!result_1 || !result_2) {
-            cerr << "Error writing to discard file for '" << bc.str() << "'\n";
+            cerr << "Error writing to discard file\n";
             return_val = -1;
             break;
         }
@@ -481,6 +619,449 @@ process_paired_reads(string prefix_1,
     return return_val;
 }
 
+template <typename fhType>
+int
+process_paired_reads_parallel(string prefix_1,
+			      string prefix_2,
+			      set<string> &se_bc, set<string> &pe_bc,
+			      map<BarcodePair, fhType *> &pair_1_fhs,
+			      map<BarcodePair, fhType *> &pair_2_fhs,
+			      map<BarcodePair, fhType *> &rem_1_fhs,
+			      map<BarcodePair, fhType *> &rem_2_fhs,
+			      map<string, long> &counter,
+			      map<BarcodePair, map<string, long> > &barcode_log) {
+    Input    *fh_1=NULL, *fh_2=NULL;
+    RawRead   *r_1=NULL,  *r_2=NULL;
+    ofstream *discard_fh_1=NULL, *discard_fh_2=NULL;
+
+    int return_val = 1;
+
+    string path_1 = in_path_1 + prefix_1;
+    string path_2 = in_path_2 + prefix_2;
+
+    if (interleaved)
+        cerr << "  Reading data from:\n  " << path_1 << "\n";
+    else
+        cerr << "  Reading data from:\n  " << path_1 << " and\n  " << path_2 << "\n";
+
+    if (in_file_type == FileT::fastq) {
+        fh_1 = new Fastq(path_1.c_str());
+        fh_2 = interleaved ? fh_1 : new Fastq(path_2.c_str());
+    } else if (in_file_type == FileT::gzfastq) {
+        fh_1 = new GzFastq(path_1.c_str());
+        fh_2 = interleaved ? fh_1 : new GzFastq(path_2.c_str());
+    } else if (in_file_type == FileT::bam) {
+        fh_1 = new BamUnAln(path_1.c_str());
+        fh_2 = fh_1;
+    } else if (in_file_type == FileT::bustard) {
+        fh_1 = new Bustard(path_1.c_str());
+        fh_2 = interleaved ? fh_1 : new Bustard(path_2.c_str());
+    }
+
+    //
+    // Open a file for recording discarded reads
+    //
+    if (discards) {
+        path_1 = out_path + prefix_1 + ".discards";
+        discard_fh_1 = new ofstream(path_1.c_str(), ifstream::out);
+
+        if (discard_fh_1->fail()) {
+            cerr << "Error opening single-end discard output file '" << path_1 << "'\n";
+            exit(1);
+        }
+
+        path_2 = out_path + prefix_2 + ".discards";
+        discard_fh_2 = new ofstream(path_2.c_str(), ifstream::out);
+
+        if (discard_fh_2->fail()) {
+            cerr << "Error opening paired-end discard output file '" << path_2 << "'\n";
+            exit(1);
+        }
+    }
+
+    //
+    // Set len_limit so that if we encounter reads already shorter than truncate_seq limit
+    // Set len_limit so that reads already shorter than the truncate_seq limit
+    // will be discarded.
+    if (truncate_seq > 0)
+        len_limit = truncate_seq;
+
+    //
+    // The reader thread reads records from the current input file and distributes them across a set
+    // of input buffer queues (one per worker thread). Each worker thread processes reads from its
+    // input queue and stores the results in a corresponding output buffer queue. The writer threads
+    // pull processed reads from the output queues and write them to disk.
+    //
+    vector<queue<pair<Seq *, Seq *>>> input_buffer(num_worker_threads, queue<pair<Seq *, Seq *>>());
+    vector<queue<ReadCompPair>>      output_buffer(num_worker_threads, queue<ReadCompPair>());
+
+    //
+    // Create a set of locks, to protect the input and output queues, one for each worker thread.
+    //
+    omp_lock_t inpbuf_lock[num_worker_threads];
+    omp_lock_t outbuf_lock[num_worker_threads];
+    for (int j = 0; j < num_worker_threads; j++) {
+	omp_init_lock(&inpbuf_lock[j]);
+	omp_init_lock(&outbuf_lock[j]);
+    }
+
+    map<BarcodePair, int> output_thread_map;
+    uint k = 0;
+    for (auto it = pair_1_fhs.begin(); it != pair_1_fhs.end(); it++) {
+	output_thread_map[it->first] = (k % num_output_threads) + num_input_threads;
+	k++;
+    }
+    
+    //
+    // Create a set of local counters for each worker thread. These will have to be
+    // consolidated after processing the file.
+    //
+    vector<map<BarcodePair, map<string, long>>> local_barcode_log;
+    vector<map<string, long>> local_counters;
+    for (int j = 0; j < num_worker_threads; j++) {
+	local_barcode_log.push_back(map<BarcodePair, map<string, long>>());
+	local_counters.push_back(map<string, long>());
+    }
+
+    //
+    // Read in the first pair of records, initializing the Seq objects s_1 and s_2. Then
+    // initialize the Read objects r_1 and r_2, and loop, reusing the same objects.
+    //
+    Seq *s_1 = fh_1->next_seq();
+    Seq *s_2 = fh_2->next_seq();
+    if (s_1 == NULL || s_2 == NULL) {
+        cerr << "Error: a failure occurred while attempting to read first pair of input sequences "
+             << "(was the correct input type specified?).\n";
+        exit(1);
+    }
+
+    long records_read         = 0;
+    long records_processed    = 0;
+    long records_written      = 0;
+    bool stop                 = false;
+    uint seq_init_len_1       = strlen(s_1->seq);
+    uint seq_init_len_2       = strlen(s_2->seq);
+
+    int  logger_thread = num_input_threads;
+    long prev_log      = 0;
+    int  line_len      = 33;
+    char numstr[id_len];
+    
+    cerr << "  Processing pairs of RAD-Tags...";
+
+    #pragma omp parallel shared(records_read, records_processed, records_written, stop, inpbuf_lock, outbuf_lock)
+    {
+	int thread_no = omp_get_thread_num();
+	
+	if (thread_no == 0) {
+	    //
+	    // Thread 0 is our reader thread. It reads records and distributes them round-robin across the worker input queues.
+	    //
+	    uint worker_thread_index = 0;
+
+	    do {
+		omp_set_lock(&inpbuf_lock[worker_thread_index]);
+		input_buffer[worker_thread_index].push(make_pair(s_1, s_2));
+		omp_unset_lock(&inpbuf_lock[worker_thread_index]);
+		
+		worker_thread_index++;
+		worker_thread_index = worker_thread_index % num_worker_threads;
+		records_read++;
+
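+		//
+		// Throttle the input reading thread to control memory usage.
+		//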
+		while (records_read - records_written > max_records_queued) {
+		    #pragma omp taskyield
+                    #pragma omp flush(records_written)		    
+		}
+	    } while ((s_1 = fh_1->next_seq()) != NULL &&
+		     (s_2 = fh_2->next_seq()) != NULL);
+	    
+	    stop = true;
+
+	} else if (thread_no > 0 && thread_no < (num_input_threads + num_output_threads)) {
+	    //
+	    // These are our writer threads. Each one writes processed reads to the appropriate output file(s).
+	    //
+	    uint worker_thread_index = 0;
+	    RawRead      *r_1, *r_2;
+	    Seq          *s_1, *s_2;
+	    BarcodePair *bc;
+
+	    while (stop == false || records_written < records_read) {
+		
+		//
+		// Attempt to get the lock on this output buffer. If the lock is already held by another thread
+		// move to the next output buffer.
+		//
+		if (omp_test_lock(&outbuf_lock[worker_thread_index]) == false) {
+		    worker_thread_index++;
+		    worker_thread_index = worker_thread_index % num_worker_threads;
+		    continue;
+		} else {
+		    if (output_buffer[worker_thread_index].size() == 0) {
+			omp_unset_lock(&outbuf_lock[worker_thread_index]);
+			worker_thread_index++;
+			worker_thread_index = worker_thread_index % num_worker_threads;
+			continue;
+		    }
+
+		    ReadCompPair &rc = output_buffer[worker_thread_index].front();
+
+		    //
+		    // Does this barcode combination belong to this output thread? If not, move to the next output buffer.
+		    //
+		    if (output_thread_map.count(*rc.bc) > 0 &&
+			output_thread_map.at(*rc.bc)   != thread_no) {
+			omp_unset_lock(&outbuf_lock[worker_thread_index]);
+			worker_thread_index++;
+			worker_thread_index = worker_thread_index % num_worker_threads;
+			continue;
+		    }
+
+		    //
+		    // Found a record to output for this writer thread.
+		    //
+		    r_1 = rc.r_1;
+		    r_2 = rc.r_2;
+		    s_1 = rc.s_1;
+		    s_2 = rc.s_2;
+		    bc  = rc.bc;
+		    output_buffer[worker_thread_index].pop();
+
+		    omp_unset_lock(&outbuf_lock[worker_thread_index]);
+		}
+		
+		int result_1 = 1;
+		int result_2 = 1;
+
+		if (r_1->retain && r_2->retain) {
+		    if (retain_header) {
+			result_1 = (out_file_type == FileT::fastq || out_file_type == FileT::gzfastq) ?
+			    write_fastq(pair_1_fhs[*bc], s_1, r_1) :
+			    write_fasta(pair_1_fhs[*bc], s_1, r_1);
+			result_2 = (out_file_type == FileT::fastq || out_file_type == FileT::gzfastq) ?
+			    write_fastq(pair_2_fhs[*bc], s_2, r_2) :
+			    write_fasta(pair_2_fhs[*bc], s_2, r_2);
+		    } else {
+			result_1 = (out_file_type == FileT::fastq || out_file_type == FileT::gzfastq) ?
+			    write_fastq(pair_1_fhs[*bc], r_1, overhang) :
+			    write_fasta(pair_1_fhs[*bc], r_1, overhang);
+			result_2 = (out_file_type == FileT::fastq || out_file_type == FileT::gzfastq) ?
+			    write_fastq(pair_2_fhs[*bc], r_2, overhang) :
+			    write_fasta(pair_2_fhs[*bc], r_2, overhang);
+		    }
+		} else if (r_1->retain && !r_2->retain) {
+		    //
+		    // Write to the remainder file.
+		    //
+		    if (retain_header)
+			result_1 = (out_file_type == FileT::fastq || out_file_type == FileT::gzfastq) ?
+			    write_fastq(rem_1_fhs[*bc], s_1, r_1) :
+			    write_fasta(rem_1_fhs[*bc], s_1, r_1);
+		    else
+			result_1 = (out_file_type == FileT::fastq || out_file_type == FileT::gzfastq) ?
+			    write_fastq(rem_1_fhs[*bc], r_1, overhang) :
+			    write_fasta(rem_1_fhs[*bc], r_1, overhang);
+
+		} else if (!r_1->retain && r_2->retain) {
+		    //
+		    // Write to the remainder file.
+		    //
+		    if (retain_header)
+			result_2 = (out_file_type == FileT::fastq || out_file_type == FileT::gzfastq) ?
+			    write_fastq(rem_2_fhs[*bc], s_2, r_2) :
+			    write_fasta(rem_2_fhs[*bc], s_2, r_2);
+		    else
+			result_2 = (out_file_type == FileT::fastq || out_file_type == FileT::gzfastq) ?
+			    write_fastq(rem_2_fhs[*bc], r_2, overhang) :
+			    write_fasta(rem_2_fhs[*bc], r_2, overhang);
+		}
+
+		if (!result_1 || !result_2) {
+		    cerr << "Error writing to output file for '" << bc->str() << "'\n";
+		    return_val = -1;
+		    break;
+		}
+
+		if (discards && !r_1->retain)
+		    result_1 = (out_file_type == FileT::fastq || out_file_type == FileT::gzfastq) ?
+			write_fastq(discard_fh_1, s_1) :
+			write_fasta(discard_fh_1, s_1);
+		if (discards && !r_2->retain)
+		    result_2 = (out_file_type == FileT::fastq || out_file_type == FileT::gzfastq) ?
+			write_fastq(discard_fh_2, s_2) :
+			write_fasta(discard_fh_2, s_2);
+
+		if (!result_1 || !result_2) {
+		    cerr << "Error writing to discard file for '" << bc->str() << "'\n";
+		    return_val = -1;
+		    break;
+		}
+
+		delete r_1;
+		delete r_2;
+		delete s_1;
+		delete s_2;
+		delete bc;
+
+		worker_thread_index++;
+		worker_thread_index = worker_thread_index % num_worker_threads;
+
+		#pragma omp atomic
+		records_written++;
+
+		if (thread_no == logger_thread) {
+		    if (int(records_written / 1000000) > prev_log) {
+			prev_log = int(records_written/1000000);
+			sprintf(numstr, "%dM...", int(records_written/1000000));
+			line_len += strlen(numstr);
+			cerr << numstr;
+			if (line_len > 80) {
+			    cerr << "\n";
+			    line_len = 0;
+			}
+		    }
+		}
+	    }
+	    
+	} else {
+	    //
+	    // All other threads process individual reads.
+	    //
+	    Seq         *s_1, *s_2;
+	    RawRead     *r_1, *r_2;
+	    BarcodePair *bc;
+	    uint                       thread_index         = thread_no - num_input_threads - num_output_threads;
+	    queue<pair<Seq *, Seq *>> &thread_input_buffer  = input_buffer[thread_index];
+	    omp_lock_t                &thread_inpbuf_lock   = inpbuf_lock[thread_index];
+	    queue<ReadCompPair>       &thread_output_buffer = output_buffer[thread_index];
+	    omp_lock_t                &thread_outbuf_lock   = outbuf_lock[thread_index];
+
+	    map<BarcodePair, map<string, long>> &blog = local_barcode_log[thread_index];
+	    map<string, long> &cntr = local_counters[thread_index];
+
+	    while (stop == false || records_processed < records_read) {
+		
+		while (stop == false && thread_input_buffer.size() == 0) {
+		    #pragma omp flush(stop)
+		}
+
+		if (stop && thread_input_buffer.size() == 0)
+		    break;
+		    
+		omp_set_lock(&thread_inpbuf_lock);
+		s_1 = thread_input_buffer.front().first;
+		s_2 = thread_input_buffer.front().second;
+		thread_input_buffer.pop();
+		omp_unset_lock(&thread_inpbuf_lock);
+		
+		r_1 = new RawRead(seq_init_len_1, 1, min_bc_size_1, win_size);
+		r_2 = new RawRead(seq_init_len_2, 2, min_bc_size_2, win_size);
+
+		parse_input_record(s_1, r_1);
+		parse_input_record(s_2, r_2);
+		cntr["total"] += 2;
+
+		//
+		// If a BestRAD protocol was used, check which read the barcode/restriction cutsite is on
+		// and transpose the reads if necessary.
+		//
+		if (bestrad)
+		    if (check_for_transposed_reads(r_1, r_2, renz_1)) {
+			transpose_reads(&r_1, &r_2);
+			cntr["transposed"]++;
+		    }
+
+		bc = new BarcodePair();
+		//
+		// If no barcodes were specified, set the barcode object to be the input file name so
+		// that reads are written to an output file of the same name as the input file.
+		//
+		if (max_bc_size_1 == 0)
+		    bc->set(prefix_1, prefix_2);
+		
+		switch (barcode_type) {
+		case inline_null:
+		case index_null:
+		case null_index:
+		    bc->set(r_1->se_bc);
+		    break;
+		case index_inline:
+		case inline_index:
+		case inline_inline:
+		case index_index:
+		    bc->set(r_1->se_bc, r_2->pe_bc);
+		    break;
+		case null_null:
+		default:
+		    break;
+		}
+
+		process_barcode(r_1, r_2, *bc, pair_1_fhs, se_bc, pe_bc, blog, cntr);
+
+		//
+		// Adjust the size of the read to accommodate truncating the sequence and variable
+		// barcode lengths. With standard Illumina data we want to output constant length
+		// barcode lengths. With standard Illumina data we want to output constant-length
+		// reads even as the barcode size may change. Other technologies, like IonTorrent,
+		// need to be truncated uniformly.
+		if (truncate_seq > 0) {
+		    if (truncate_seq + r_1->inline_bc_len <= r_1->len)
+			r_1->set_len(truncate_seq + r_1->inline_bc_len);
+		    if (truncate_seq + r_2->inline_bc_len <= r_2->len)
+			r_2->set_len(truncate_seq + r_2->inline_bc_len);
+		} else {
+		    if (barcode_type == inline_null || barcode_type == inline_inline || barcode_type == inline_index)
+			r_1->set_len(r_1->len - (max_bc_size_1 - r_1->inline_bc_len));
+		    if (barcode_type == index_inline || barcode_type == inline_inline)
+			r_2->set_len(r_2->len - (max_bc_size_2 - r_2->inline_bc_len));
+		}
+
+		if (r_1->retain)
+		    process_singlet(r_1, renz_1, false, blog[*bc], cntr);
+		if (r_2->retain)
+		    process_singlet(r_2, renz_2, true, blog[*bc], cntr);
+
+		//
+		// Store the result in the output buffer for writing.
+		//
+		omp_set_lock(&thread_outbuf_lock);
+		thread_output_buffer.push(ReadCompPair(s_1, s_2, r_1, r_2, bc));
+		omp_unset_lock(&thread_outbuf_lock);
+		
+		#pragma omp atomic
+		records_processed++;
+
+                #pragma omp flush(records_read, stop)
+	    }
+	}
+    }
+    cerr << "\n";
+
+    for (int j = 0; j < num_worker_threads; j++) {
+	omp_destroy_lock(&inpbuf_lock[j]);
+	omp_destroy_lock(&outbuf_lock[j]);
+    }
+
+    //
+    // Consolidate counters and barcode logs from the worker threads.
+    //
+    consolidate_barcode_logs(barcode_log, local_barcode_log);
+    consolidate_counters(counter, local_counters);
+
+    if (discards) {
+        delete discard_fh_1;
+        delete discard_fh_2;
+    }
+
+    delete fh_1;
+    if (interleaved == false) delete fh_2;
+
+    delete r_1;
+    delete r_2;
+
+    return return_val;
+}
+
 template <typename fhType>
 int
 process_reads(string prefix,
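
Both parallel drivers share the same pipeline shape: one reader distributes
records round-robin across per-worker input queues, workers clean and score
reads into per-worker output queues, and writer threads drain those queues
(skipping records whose barcode the output_thread_map assigns to another
writer). The compressed, self-contained sketch below shows that scheme with
a single writer and a trivial integer payload standing in for reads; the
lock discipline and stop flag mirror the code above, though the memory-model
details are simplified:

    #include <omp.h>
    #include <cstdio>
    #include <queue>
    #include <vector>

    int main() {
        const int n_workers = 2, n_items = 20;
        std::vector<std::queue<int>> in_q(n_workers), out_q(n_workers);
        std::vector<omp_lock_t> in_lk(n_workers), out_lk(n_workers);
        for (int j = 0; j < n_workers; j++) {
            omp_init_lock(&in_lk[j]);
            omp_init_lock(&out_lk[j]);
        }
        long read = 0, written = 0;
        bool stop = false;

        omp_set_num_threads(n_workers + 2);  // 1 reader + 1 writer + workers
        #pragma omp parallel shared(read, written, stop)
        {
            int t = omp_get_thread_num();
            if (t == 0) {                              // reader thread
                for (int i = 0; i < n_items; i++) {
                    int w = i % n_workers;             // round-robin fill
                    omp_set_lock(&in_lk[w]);
                    in_q[w].push(i);
                    omp_unset_lock(&in_lk[w]);
                    #pragma omp atomic
                    read++;
                }
                stop = true;
            } else if (t == 1) {                       // writer thread
                int w = 0;
                while (!stop || written < read) {
                    omp_set_lock(&out_lk[w]);
                    if (!out_q[w].empty()) {
                        int v = out_q[w].front();
                        out_q[w].pop();
                        omp_unset_lock(&out_lk[w]);
                        printf("%d\n", v);             // "write" the record
                        written++;
                    } else {
                        omp_unset_lock(&out_lk[w]);
                    }
                    w = (w + 1) % n_workers;
                    #pragma omp flush(stop, read)
                }
            } else {                                   // worker threads
                int w = t - 2;
                while (true) {
                    #pragma omp flush(stop)
                    bool done = stop;  // sample stop *before* the queue:
                    omp_set_lock(&in_lk[w]);
                    if (in_q[w].empty()) {             // if stop preceded
                        omp_unset_lock(&in_lk[w]);     // this check, the
                        if (done) break;               // queue is drained
                        continue;
                    }
                    int v = in_q[w].front();
                    in_q[w].pop();
                    omp_unset_lock(&in_lk[w]);
                    omp_set_lock(&out_lk[w]);
                    out_q[w].push(v * v);              // "process" the record
                    omp_unset_lock(&out_lk[w]);
                }
            }
        }
        for (int j = 0; j < n_workers; j++) {
            omp_destroy_lock(&in_lk[j]);
            omp_destroy_lock(&out_lk[j]);
        }
        return 0;
    }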
@@ -488,7 +1069,7 @@ process_reads(string prefix,
               map<BarcodePair, fhType *> &pair_1_fhs,
               map<string, long> &counter,
               map<BarcodePair, map<string, long> > &barcode_log) {
-    Input *fh=NULL;
+    Input   *fh=NULL;
     RawRead  *r;
     ofstream *discard_fh=NULL;
 
@@ -518,24 +1099,24 @@ process_reads(string prefix,
         }
     }
 
+    //
+    // Set len_limit so that reads already shorter than the truncate_seq limit
+    // will be discarded.
+    //
+    if (truncate_seq > 0)
+        len_limit = truncate_seq;
+
     //
     // Read in the first record, initializing the Seq object s. Then
     // initialize the Read object r, then loop, using the same objects.
     //
     Seq *s = fh->next_seq();
     if (s == NULL) {
-        cerr << "Attempting to read first input record, unable to allocate "
-             << "Seq object (Was the correct input type specified?).\n";
+        cerr << "Error: a failure occurred while attempting to read the first input sequence "
+             << "(was the correct input type specified?).\n";
         exit(1);
     }
 
-    //
-    // Set len_limit so that if we encounter reads already shorter than truncate_seq limit
-    // they will be discarded.
-    //
-    if (truncate_seq > 0)
-        len_limit = truncate_seq;
-
     r = new RawRead(strlen(s->seq), 1, min_bc_size_1, win_size);
 
     BarcodePair bc;
@@ -548,11 +1129,21 @@ process_reads(string prefix,
 
     //cerr << "Length: " << r->len << "; Window length: " << r->win_len << "; Stop position: " << r->stop_pos << "\n";
 
-    long i = 1;
+    long i        = 1;
+    int  line_len = 24;
+    char numstr[id_len];
+    
     cerr << "  Processing RAD-Tags...";
     do {
-        if (i % 1000000 == 0)
-            cerr << i/1000000 << "M...";
+        if (i % 1000000 == 0) {
+	    sprintf(numstr, "%ldM...", i/1000000);
+	    line_len += strlen(numstr);
+            cerr << numstr;
+	    if (line_len > 80) {
+		cerr << "\n";
+		line_len = 0;
+	    }
+	}
 
         counter["total"]++;
 
@@ -577,7 +1168,7 @@ process_reads(string prefix,
             if (truncate_seq + r->inline_bc_len <= r->len)
                 r->set_len(truncate_seq + r->inline_bc_len);
         } else {
-            if (barcode_type == inline_null || barcode_type == inline_inline ||        barcode_type == inline_index)
+            if (barcode_type == inline_null || barcode_type == inline_inline || barcode_type == inline_index)
                 r->set_len(r->len - (max_bc_size_1 - r->inline_bc_len));
         }
 
@@ -632,6 +1223,354 @@ process_reads(string prefix,
     return return_val;
 }
 
+template <typename fhType>
+int
+process_reads_parallel(string prefix,
+		       set<string> &se_bc, set<string> &pe_bc,
+		       map<BarcodePair, fhType *> &pair_1_fhs,
+		       map<string, long> &counter,
+		       map<BarcodePair, map<string, long>> &barcode_log)
+{
+    Input    *fh         = NULL;
+    ofstream *discard_fh = NULL;
+
+    int return_val = 1;
+
+    string path = in_path_1 + prefix;
+
+    if (in_file_type == FileT::fastq)
+        fh = new Fastq(path.c_str());
+    else if (in_file_type == FileT::gzfastq)
+        fh = new GzFastq(path.c_str());
+    else if (in_file_type == FileT::bam)
+        fh = new BamUnAln(path.c_str());
+    else if (in_file_type == FileT::bustard)
+        fh = new Bustard(path.c_str());
+
+    //
+    // Open a file for recording discarded reads
+    //
+    if (discards) {
+        path = out_path + prefix + ".discards";
+        discard_fh = new ofstream(path.c_str(), ifstream::out);
+
+        if (discard_fh->fail()) {
+            cerr << "Error opening discard output file '" << path << "'\n";
+            exit(1);
+        }
+    }
+
+    //
+    // Set len_limit so that reads already shorter than the truncate_seq limit
+    // will be discarded.
+    //
+    if (truncate_seq > 0)
+        len_limit = truncate_seq;
+
+    //
+    // The reader thread reads records from the current input file and distributes them across a set
+    // of input buffer queues (one per worker thread). Each worker thread processes reads from its
+    // input queue and stores the results in a corresponding output buffer queue. The writer threads
+    // pull processed reads from the output queues and write them to disk.
+    //
+    vector<queue<Seq *>>     input_buffer(num_worker_threads, queue<Seq *>());
+    vector<queue<ReadComp>> output_buffer(num_worker_threads, queue<ReadComp>());
+
+    //
+    // Create a set of locks, to protect the input and output queues, one for each worker thread.
+    //
+    omp_lock_t inpbuf_lock[num_worker_threads];
+    omp_lock_t outbuf_lock[num_worker_threads];
+    for (int j = 0; j < num_worker_threads; j++) {
+	omp_init_lock(&inpbuf_lock[j]);
+	omp_init_lock(&outbuf_lock[j]);
+    }
+
+    map<BarcodePair, int> output_thread_map;
+    uint k = 0;
+    for (auto it = pair_1_fhs.begin(); it != pair_1_fhs.end(); it++) {
+	output_thread_map[it->first] = (k % num_output_threads) + num_input_threads;
+	k++;
+    }
+    
+    //
+    // Create a set of local counters for each worker thread. These will have to be
+    // consolidated after processing the file.
+    //
+    vector<map<BarcodePair, map<string, long>>> local_barcode_log;
+    vector<map<string, long>> local_counters;
+    for (int j = 0; j < num_worker_threads; j++) {
+	local_barcode_log.push_back(map<BarcodePair, map<string, long>>());
+	local_counters.push_back(map<string, long>());
+    }
+    
+    //
+    // Read in the first record, initializing the Seq object s, checking that we can parse the file.
+    //
+    Seq *s = fh->next_seq();
+    if (s == NULL) {
+        cerr << "Error: a failure occurred while attempting to read the first input sequence "
+             << "(was the correct input type specified?).\n";
+        exit(1);
+    }
+
+    long records_read         = 0;
+    long records_processed    = 0;
+    long records_written      = 0;
+    bool stop                 = false;
+    uint seq_init_len         = strlen(s->seq);
+
+    int  logger_thread = num_input_threads;
+    long prev_log      = 0;
+    int  line_len      = 24;
+    char numstr[id_len];
+    
+    cerr << "  Processing RAD-Tags...";
+
+    #pragma omp parallel shared(records_read, records_processed, records_written, stop, inpbuf_lock, outbuf_lock)
+    {
+	int thread_no = omp_get_thread_num();
+	
+	if (thread_no == 0) {
+	    //
+	    // Thread 0 is our reader thread. It reads records and distributes them round-robin across the worker input queues.
+	    //
+	    uint worker_thread_index = 0;
+
+	    do {
+		omp_set_lock(&inpbuf_lock[worker_thread_index]);
+		input_buffer[worker_thread_index].push(s);
+		omp_unset_lock(&inpbuf_lock[worker_thread_index]);
+		
+		worker_thread_index++;
+		worker_thread_index = worker_thread_index % num_worker_threads;
+		records_read++;
+
+		//
+		// Throttle the input reading thread to control memory usage.
+		//
+		while (records_read - records_written > max_records_queued) {
+		    #pragma omp taskyield
+                    #pragma omp flush(records_written)		    
+		}
+	    } while ((s = fh->next_seq()) != NULL);
+	    
+	    stop = true;
+
+	} else if (thread_no > 0 && thread_no < (num_input_threads + num_output_threads)) {
+	    //
+	    // These are our writer threads. Each one writes processed reads to the appropriate output file(s).
+	    //
+	    uint worker_thread_index = 0;
+	    RawRead      *r;
+	    Seq          *s;
+	    BarcodePair *bc;
+
+	    while (stop == false || records_written < records_read) {
+		
+		//
+		// Attempt to get the lock on this output buffer. If the lock is already held by another thread
+		// move to the next output buffer.
+		//
+		if (omp_test_lock(&outbuf_lock[worker_thread_index]) == false) {
+		    worker_thread_index++;
+		    worker_thread_index = worker_thread_index % num_worker_threads;
+		    continue;
+		} else {
+		    if (output_buffer[worker_thread_index].size() == 0) {
+			omp_unset_lock(&outbuf_lock[worker_thread_index]);
+			worker_thread_index++;
+			worker_thread_index = worker_thread_index % num_worker_threads;
+			continue;
+		    }
+
+		    ReadComp &rc = output_buffer[worker_thread_index].front();
+
+		    //
+		    // Does this barcode combination belong to this output thread? If not, move to the next output buffer.
+		    //
+		    if (output_thread_map.count(*rc.bc) > 0 &&
+			output_thread_map.at(*rc.bc)   != thread_no) {
+			omp_unset_lock(&outbuf_lock[worker_thread_index]);
+			worker_thread_index++;
+			worker_thread_index = worker_thread_index % num_worker_threads;
+			continue;
+		    }
+
+		    //
+		    // Found a record to output for this writer thread.
+		    //
+		    r  = rc.r;
+		    s  = rc.s;
+		    bc = rc.bc;
+		    output_buffer[worker_thread_index].pop();
+
+		    omp_unset_lock(&outbuf_lock[worker_thread_index]);
+		}
+		
+		int result = 1;
+
+		if (r->retain) {
+		    if (retain_header)
+		    	result = (out_file_type == FileT::fastq || out_file_type == FileT::gzfastq) ?
+		    	    write_fastq(pair_1_fhs[*bc], s, r) :
+		    	    write_fasta(pair_1_fhs[*bc], s, r);
+		    else
+			result = (out_file_type == FileT::fastq || out_file_type == FileT::gzfastq) ?
+			    write_fastq(pair_1_fhs[*bc], r, overhang) :
+			    write_fasta(pair_1_fhs[*bc], r, overhang);
+		}
+
+		if (!result) {
+		    cerr << "Error writing to output file for '" << bc->str() << "'\n";
+		    return_val = -1;
+		    break;
+		}
+
+		if (discards && !r->retain) {
+		    #pragma omp critical
+		    result = out_file_type == FileT::fastq || out_file_type == FileT::gzfastq ?
+			write_fastq(discard_fh, s) :
+			write_fasta(discard_fh, s);
+		}
+		
+		if (!result) {
+		    cerr << "Error writing to discard file for '" << bc->str() << "'\n";
+		    return_val = -1;
+		    break;
+		}
+
+		delete r;
+		delete s;
+		delete bc;
+
+		worker_thread_index++;
+		worker_thread_index = worker_thread_index % num_worker_threads;
+
+		#pragma omp atomic
+		records_written++;
+
+		if (thread_no == logger_thread) {
+		    if (int(records_written / 1000000) > prev_log) {
+			prev_log = int(records_written/1000000);
+			sprintf(numstr, "%dM...", int(records_written/1000000));
+			line_len += strlen(numstr);
+			cerr << numstr;
+			if (line_len > 80) {
+			    cerr << "\n";
+			    line_len = 0;
+			}
+		    }
+		}
+	    }
+	    
+	} else {
+	    //
+	    // All other threads process individual reads.
+	    //
+	    Seq         *s;
+	    RawRead     *r;
+	    BarcodePair *bc;
+	    uint             thread_index         = thread_no - num_input_threads - num_output_threads;
+	    queue<Seq *>    &thread_input_buffer  = input_buffer[thread_index];
+	    omp_lock_t      &thread_inpbuf_lock   = inpbuf_lock[thread_index];
+	    queue<ReadComp> &thread_output_buffer = output_buffer[thread_index];
+	    omp_lock_t      &thread_outbuf_lock   = outbuf_lock[thread_index];
+
+	    map<BarcodePair, map<string, long>> &blog = local_barcode_log[thread_index];
+	    map<string, long> &cntr = local_counters[thread_index];
+
+	    while (stop == false || records_processed < records_read) {
+		
+		while (stop == false && thread_input_buffer.size() == 0) {
+		    #pragma omp flush(stop)
+		}
+
+		if (stop && thread_input_buffer.size() == 0)
+		    break;
+		    
+		omp_set_lock(&thread_inpbuf_lock);
+		s = thread_input_buffer.front();
+		thread_input_buffer.pop();
+		omp_unset_lock(&thread_inpbuf_lock);
+		
+		r = new RawRead(seq_init_len, 1, min_bc_size_1, win_size);
+
+		cntr["total"]++;
+		
+		parse_input_record(s, r);
+
+		bc = new BarcodePair();
+		//
+		// If no barcodes were specified, set the barcode object to be the input file name so
+		// that reads are written to an output file of the same name as the input file.
+		//
+		if (max_bc_size_1 == 0)
+		    bc->set(prefix);
+		
+		if (barcode_type == inline_null ||
+		    barcode_type == index_null)
+		    bc->set(r->se_bc);
+		else if (barcode_type == index_inline ||
+			 barcode_type == inline_index)
+		    bc->set(r->se_bc, r->pe_bc);
+
+		process_barcode(r, NULL, *bc, pair_1_fhs, se_bc, pe_bc, blog, cntr);
+
+		//
+		// Adjust the size of the read to accommodate truncating the sequence and variable
+		// barcode lengths. With standard Illumina data we want to output constant-length
+		// reads even as the barcode size may change. Other technologies, like IonTorrent,
+		// need to be truncated uniformly.
+		//
+		if (truncate_seq > 0) {
+		    if (truncate_seq + r->inline_bc_len <= r->len)
+			r->set_len(truncate_seq + r->inline_bc_len);
+		} else {
+		    if (barcode_type == inline_null || barcode_type == inline_inline || barcode_type == inline_index)
+			r->set_len(r->len - (max_bc_size_1 - r->inline_bc_len));
+		}
+
+		if (r->retain)
+		    process_singlet(r, renz_1, false, blog[*bc], cntr);
+
+		//
+		// Store the result in the output buffer for writing.
+		//
+		omp_set_lock(&thread_outbuf_lock);
+		thread_output_buffer.push(ReadComp(s, r, bc));
+		omp_unset_lock(&thread_outbuf_lock);
+		
+		#pragma omp atomic
+		records_processed++;
+
+                #pragma omp flush(records_read, stop)
+	    }
+	}
+    }
+    cerr << "\n";
+
+    for (int j = 0; j < num_worker_threads; j++) {
+	omp_destroy_lock(&inpbuf_lock[j]);
+	omp_destroy_lock(&outbuf_lock[j]);
+    }
+
+    //
+    // Consolidate counters and barcode logs from the worker threads.
+    //
+    consolidate_barcode_logs(barcode_log, local_barcode_log);
+    consolidate_counters(counter, local_counters);
+    
+    if (discards) delete discard_fh;
+
+    //
+    // Close the file and delete the Input object.
+    //
+    delete fh;
+
+    return return_val;
+}
+
 inline
 int
 process_singlet(RawRead *href,
@@ -925,8 +1864,6 @@ print_results(int argc, char **argv,
         return 0;
     }
 
-    cerr << "Outputing details to log: '" << log_path << "'\n\n";
-
     init_log(log, argc, argv);
 
     log << "\n"
@@ -965,7 +1902,7 @@ print_results(int argc, char **argv,
             << it->second["total"]       << "\n";
     }
     log << "END per_file_raw_read_counts" << "\n";
-    
+
     map<string, long> c;
     c["total"]        = 0;
     c["low_quality"]  = 0;
@@ -995,7 +1932,8 @@ print_results(int argc, char **argv,
              << " (" << as_percentage((double) n / c["total"]) << ")\n";
     };
 
-    cerr << c["total"] << " total sequences\n";
+    cerr << "\n"
+	 << c["total"] << " total sequences\n";
     if (bestrad)
 	print_nreads(c["transposed"], "reads were transposed [based on the BestRAD flag]");
     if (filter_illumina)
@@ -1007,19 +1945,32 @@ print_results(int argc, char **argv,
     print_nreads(c["noradtag"], "RAD cutsite not found drops");
     print_nreads(c["retained"], "retained reads");
 
+    if (max_bc_size_1 > 0)
+	cerr << "\n"
+	     << "Details logged: '" << log_path << "'\n"
+	     << "For a summary, execute one or more:\n"
+	     << "For a summary, execute one or more of:\n"
+	     << "  stacks-dist-extract --pretty " << log_path << " per_barcode_raw_read_counts\n";
+    else
+	cerr << "\n"
+	     << "Details logged: '" << log_path << "'\n"
+	     << "For a summary, execute:\n"
+	     << "  stacks-dist-extract " << log_path << " total_raw_read_counts\n";
+
+    
     log  << "\n"
 	 << "BEGIN total_raw_read_counts" << "\n"
          << "Total Sequences\t"      << c["total"]       << "\n";
     if (bestrad)
-        log << "Transposed reads\t" << c["transposed"] << "\n";
+        log << "Transposed reads\t" << c["transposed"] << "\t" << as_percentage((double) c["transposed"] / c["total"]) << "\n";
     if (filter_illumina)
-        log << "Failed Illumina filtered reads\t" << c["ill_filtered"] << "\n";
+        log << "Failed Illumina filtered reads\t" << c["ill_filtered"] << "\t" << as_percentage((double) c["ill_filtered"] / c["total"]) << "\n";
     if (filter_adapter)
-        log << "Reads containing adapter sequence\t" << c["adapter"] << "\n";
-    log << "Barcode Not Found\t"     << c["ambiguous"]   << "\n"
-        << "Low Quality\t"           << c["low_quality"] << "\n"
-        << "RAD Cutsite Not Found\t" << c["noradtag"]    << "\n"
-        << "Retained Reads\t"        << c["retained"]    << "\n"
+        log << "Reads containing adapter sequence\t" << c["adapter"] << "\t" << as_percentage((double) c["adapter"] / c["total"]) << "\n";
+    log << "Barcode Not Found\t"     << c["ambiguous"]   << "\t" << as_percentage((double) c["ambiguous"] / c["total"]) << "\n"
+        << "Low Quality\t"           << c["low_quality"] << "\t" << as_percentage((double) c["low_quality"] / c["total"]) << "\n"
+        << "RAD Cutsite Not Found\t" << c["noradtag"]    << "\t" << as_percentage((double) c["noradtag"] / c["total"]) << "\n"
+        << "Retained Reads\t"        << c["retained"]    << "\t" << as_percentage((double) c["retained"] / c["total"]) << "\n"
 	<< "END total_raw_read_counts" << "\n";
 
     if (max_bc_size_1 == 0) return 0;
@@ -1045,7 +1996,9 @@ print_results(int argc, char **argv,
     log << "Total\t"
         << "RAD Cutsite Not Found\t"
         << "Low Quality\t"
-        << "Retained Reads\n";
+        << "Retained Reads\t"
+	<< "Pct Retained\t"
+	<< "Pct Total Reads\n";
 
     set<BarcodePair> barcode_list;
 
@@ -1062,7 +2015,9 @@ print_results(int argc, char **argv,
             log << barcode_log[barcodes[i]]["total"]    << "\t"
                 << barcode_log[barcodes[i]]["noradtag"] << "\t"
                 << barcode_log[barcodes[i]]["low_qual"] << "\t"
-                << barcode_log[barcodes[i]]["retained"] << "\n";
+                << barcode_log[barcodes[i]]["retained"] << "\t"
+		<< as_percentage((double) barcode_log[barcodes[i]]["retained"] / barcode_log[barcodes[i]]["total"]) << "\t"
+		<< as_percentage((double) barcode_log[barcodes[i]]["retained"] / c["total"]) << "\n";
     }
 
     log << "END per_barcode_raw_read_counts" << "\n"
@@ -1127,13 +2082,13 @@ int parse_command_line(int argc, char* argv[]) {
             {"inline-index",         no_argument, NULL, 'Z'}, {"inline_index",         no_argument, NULL, 'Z'},
             {"barcode-dist-1", required_argument, NULL, 'B'}, {"barcode_dist_1", required_argument, NULL, 'B'},
             {"barcode-dist-2", required_argument, NULL, 'C'}, {"barcode_dist_2", required_argument, NULL, 'C'},
-            {"infile-type",    required_argument, NULL, 'i'}, {"infile_type",    required_argument, NULL, 'i'},
-            {"outfile-type",   required_argument, NULL, 'y'}, {"outfile_type",   required_argument, NULL, 'y'},
+            {"in-type",        required_argument, NULL, 'i'}, {"infile_type",    required_argument, NULL, 'i'},
+            {"out-type",       required_argument, NULL, 'y'}, {"outfile_type",   required_argument, NULL, 'y'},
             {"file",           required_argument, NULL, 'f'},
             {"file-p1",        required_argument, NULL, '1'}, {"file_p1",        required_argument, NULL, '1'},
             {"file-p2",        required_argument, NULL, '2'}, {"file_p2",        required_argument, NULL, '2'},
-            {"path",           required_argument, NULL, 'p'},
-            {"outpath",        required_argument, NULL, 'o'},
+            {"in-path",        required_argument, NULL, 'p'},
+            {"out-path",       required_argument, NULL, 'o'},
             {"truncate",       required_argument, NULL, 't'},
             {"renz-1",         required_argument, NULL, 'e'}, {"renz_1",         required_argument, NULL, 'e'},
             {"renz-2",         required_argument, NULL, 'z'}, {"renz_2",         required_argument, NULL, 'z'},
@@ -1145,6 +2100,7 @@ int parse_command_line(int argc, char* argv[]) {
             {"adapter-1",      required_argument, NULL, 'A'}, {"adapter_1",      required_argument, NULL, 'A'},
             {"adapter-2",      required_argument, NULL, 'G'}, {"adapter_2",      required_argument, NULL, 'G'},
             {"adapter-mm",     required_argument, NULL, 'T'}, {"adapter_mm",     required_argument, NULL, 'T'},
+	    {"threads",        required_argument, NULL, 1001},
             {0, 0, 0, 0}
         };
 
@@ -1215,6 +2171,9 @@ int parse_command_line(int argc, char* argv[]) {
         case 1000:
             bestrad = true;
             break;
+	case 1001:
+	    num_threads = is_integer(optarg);
+	    break;
         case 'B':
             barcode_dist_1 = is_integer(optarg);
             break;
@@ -1434,28 +2393,27 @@ void version() {
 
 void help() {
     cerr << "process_radtags " << VERSION << "\n"
-         << "process_radtags -p in_dir [--paired [--interleaved]] [-b barcode_file] -o out_dir -e enz [-c] [-q] [-r] [-t len]\n"
-         << "process_radtags -f in_file [-b barcode_file] -o out_dir -e enz [-c] [-q] [-r] [-t len]\n"
-         << "process_radtags -1 pair_1 -2 pair_2 [-b barcode_file] -o out_dir -e enz [-c] [-q] [-r] [-t len]\n"
+	 << "process_radtags -p in_dir [-P] [-b barcode_file] -o out_dir -e enz [--threads num] [-c] [-q] [-r] [-t len]\n"
+         << "process_radtags -f in_file [-b barcode_file] -o out_dir -e enz [--threads num] [-c] [-q] [-r] [-t len]\n"
+         << "process_radtags --in-path in_dir [--paired] [--barcodes barcode_file] --out-path out_dir --renz-1 enz [--renz-2 enz] [--threads num] [-c] [-q] [-r] [-t len]\n"
+         << "process_radtags -1 pair_1 -2 pair_2 [-b barcode_file] -o out_dir -e enz [--threads num] [-c] [-q] [-r] [-t len]\n"
          << "\n"
-         << "  p: path to a directory of files.\n"
-         << "  P,--paired: files contained within the directory are paired.\n"
-         << "  I,--interleaved: specify that the paired-end reads are interleaved in single files.\n"
-         << "  i: input file type, either 'fastq', 'gzfastq' (gzipped fastq), 'bam', or 'bustard' (default: guess, or gzfastq if unable to).\n"
-         << "  b: path to a file containing barcodes for this run, omit to ignore any barcoding.\n"
-         << "  o: path to output the processed files.\n"
-         << "  f: path to the input file if processing single-end sequences.\n"
-         << "  1: first input file in a set of paired-end sequences.\n"
-         << "  2: second input file in a set of paired-end sequences.\n"
-         << "  c,--clean: clean data, remove any read with an uncalled base ('N').\n"
-         << "  q,--quality: discard reads with low quality (phred) scores.\n"
-         << "  r,--rescue: rescue barcodes and RAD-Tags.\n"
-         << "  t: truncate final read length to this value.\n"
-         << "  D: capture discarded reads to a file.\n"
-         << "  E: specify how quality scores are encoded, 'phred33' (Illumina 1.8+/Sanger, default) or 'phred64' (Illumina 1.3-1.5).\n"
-         << "  w: set the size of the sliding window as a fraction of the read length, between 0 and 1 (default 0.15).\n"
-         << "  s: set the score limit. If the average score within the sliding window drops below this value, the read is discarded (default 10).\n"
-         << "  y: output type, either 'fastq', 'gzfastq', 'fasta', or 'gzfasta' (default: match input type).\n"
+         << "  -p,--in-path: path to a directory of files.\n"
+         << "  -P,--paired: files contained within the directory are paired.\n"
+         << "  -I,--interleaved: specify that the paired-end reads are interleaved in single files.\n"
+         << "  -i,--in-type: input file type, either 'fastq', 'gzfastq' (gzipped fastq), 'bam', or 'bustard' (default: guess, or gzfastq if unable to).\n"
+         << "  -b,--barcodes: path to a file containing barcodes for this run, omit to ignore any barcoding or for already demultiplexed data.\n"
+         << "  -o,--out-path: path to output the processed files.\n"
+         << "  -f: path to the input file if processing single-end sequences.\n"
+         << "  -1: first input file in a set of paired-end sequences.\n"
+         << "  -2: second input file in a set of paired-end sequences.\n"
+	 << "  --threads: number of threads to run.\n"
+         << "  -c,--clean: clean data, remove any read with an uncalled base ('N').\n"
+         << "  -q,--quality: discard reads with low quality (phred) scores.\n"
+         << "  -r,--rescue: rescue barcodes and RAD-Tag cut sites.\n"
+         << "  -t,--truncate: truncate final read length to this value.\n"
+         << "  -D,--discards: capture discarded reads to a file.\n"
+         << "  -y,--out-type: output type, either 'fastq', 'gzfastq', 'fasta', or 'gzfasta' (default: match input type).\n"
          << "\n"
          << "  Barcode options:\n"
          << "    --inline-null:   barcode is inline with sequence, occurs only on single-end read (default).\n"
@@ -1467,8 +2425,8 @@ void help() {
          << "    --index-inline:  barcode occurs in FASTQ header (Illumina i5 or i7 read) and is inline with single-end sequence (for single-end data) on paired-end read (for paired-end data).\n"
          << "\n"
          << "  Restriction enzyme options:\n"
-         << "    -e <enz>, --renz-1 <enz>: provide the restriction enzyme used (cut site occurs on single-end read)\n"
-         << "    --renz-2 <enz>: if a double digest was used, provide the second restriction enzyme used (cut site occurs on the paired-end read).\n"
+         << "    -e [enz], --renz-1 [enz]: provide the restriction enzyme used (cut site occurs on single-end read)\n"
+         << "    --renz-2 [enz]: if a double digest was used, provide the second restriction enzyme used (cut site occurs on the paired-end read).\n"
          << "    Currently supported enzymes include:\n"
          << "      ";
 
@@ -1502,6 +2460,9 @@ void help() {
          << "  Advanced options:\n"
          << "    --filter-illumina: discard reads that have been marked by Illumina's chastity/purity filter as failing.\n"
          << "    --disable-rad-check: disable checking if the RAD cut site is intact.\n"
+	 << "    --encoding: specify how quality scores are encoded, 'phred33' (Illumina 1.8+/Sanger, default) or 'phred64' (Illumina 1.3-1.5).\n"
+         << "    --window-size: set the size of the sliding window as a fraction of the read length, between 0 and 1 (default 0.15).\n"
+         << "    --score-limit: set the phred score limit. If the average score within the sliding window drops below this value, the read is discarded (default 10).\n"
          << "    --len-limit <limit>: specify a minimum sequence length (useful if your data has already been trimmed).\n"
          << "    --barcode-dist-1: the number of allowed mismatches when rescuing single-end barcodes (default 1).\n"
          << "    --barcode-dist-2: the number of allowed mismatches when rescuing paired-end barcodes (defaults to --barcode-dist-1).\n";

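The new columns above divide each category count by c["total"] and feed the
fraction to as_percentage(). As a point of reference, a minimal sketch of a
helper with that shape (the name is taken from the hunks above, but its
rounding and formatting are assumptions, not confirmed by this diff):

    #include <iomanip>
    #include <sstream>
    #include <string>

    // Hypothetical stand-in for the as_percentage() helper called in the
    // logging hunks above; the actual Stacks version may format differently.
    std::string as_percentage(double frac)
    {
        std::ostringstream s;
        s << std::fixed << std::setprecision(1) << frac * 100.0 << "%";
        return s.str();
    }

Callers would also need to guard the denominator (c["total"] > 0) before
dividing, since a zero total would otherwise print "inf%" or "nan%".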

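The --threads option added above has no single-letter alias, so it uses the
common getopt_long() pattern of returning a sentinel value outside the ASCII
range (here 1001) from the long-options table. A self-contained sketch of
that pattern, with strtol() standing in for the is_integer() helper the diff
calls:

    #include <getopt.h>
    #include <cstdio>
    #include <cstdlib>

    int main(int argc, char **argv)
    {
        static struct option longopts[] = {
            {"threads", required_argument, NULL, 1001},
            {0, 0, 0, 0}
        };
        int c;
        long num_threads = 1;
        while ((c = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
            switch (c) {
            case 1001:
                // The real code calls is_integer(optarg) to validate input.
                num_threads = strtol(optarg, NULL, 10);
                break;
            }
        }
        printf("threads: %ld\n", num_threads);
        return 0;
    }

Invoked as './a.out --threads 8', this prints 'threads: 8'.
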
=====================================
src/process_radtags.h
=====================================
@@ -1,6 +1,6 @@
 // -*-mode:c++; c-style:k&r; c-basic-offset:4;-*-
 //
-// Copyright 2010-2018, Julian Catchen <jcatchen at illinois.edu>
+// Copyright 2010-2022, Julian Catchen <jcatchen at illinois.edu>
 //
 // This file is part of Stacks.
 //
@@ -32,6 +32,8 @@
 #include <sstream>
 
 #include <vector>
+#include <deque>
+#include <queue>
 #include <map>
 #include <set>
 #include <utility>
@@ -51,12 +53,22 @@
 void help( void );
 void version( void );
 int  parse_command_line(int, char **);
+int  allocate_threads();
+int  initialize_counters(map<string, long> &);
+int  consolidate_counters(map<string, long> &, vector<map<string, long>> &);
+int  consolidate_barcode_logs(map<BarcodePair, map<string, long>> &, vector<map<BarcodePair, map<string, long>>> &);
+
 template<typename fhType>
 int  process_reads(string,
                    set<string> &, set<string> &,
                    map<BarcodePair, fhType *> &,
                    map<string, long> &, map<BarcodePair, map<string, long> > &);
 template<typename fhType>
+int  process_reads_parallel(string,
+			    set<string> &, set<string> &,
+			    map<BarcodePair, fhType *> &,
+			    map<string, long> &, map<BarcodePair, map<string, long> > &);
+template<typename fhType>
 int  process_paired_reads(string, string,
                           set<string> &, set<string> &,
                           map<BarcodePair, fhType *> &,
@@ -64,6 +76,14 @@ int  process_paired_reads(string, string,
                           map<BarcodePair, fhType *> &,
                           map<BarcodePair, fhType *> &,
                           map<string, long> &, map<BarcodePair, map<string, long> > &);
+template<typename fhType>
+int  process_paired_reads_parallel(string, string,
+                          set<string> &, set<string> &,
+                          map<BarcodePair, fhType *> &,
+                          map<BarcodePair, fhType *> &,
+                          map<BarcodePair, fhType *> &,
+                          map<BarcodePair, fhType *> &,
+                          map<string, long> &, map<BarcodePair, map<string, long> > &);
 int  process_singlet(RawRead *,
                      string, bool,
                      map<string, long> &, map<string, long> &);


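The declarations above outline the shape of the new parallel path: each
worker thread accumulates into its own counter map, and the per-thread
results are folded into the global tallies once the threads join. A
speculative sketch of what consolidate_counters() could look like under that
reading (the real implementation lives in process_radtags.cc and may differ):

    #include <map>
    #include <string>
    #include <vector>

    using std::map;
    using std::string;
    using std::vector;

    // Fold per-thread counter maps into the global totals; assumes the
    // worker threads have already joined, so no locking is needed here.
    int consolidate_counters(map<string, long> &totals,
                             vector<map<string, long>> &per_thread)
    {
        for (const auto &m : per_thread)
            for (const auto &kv : m)
                totals[kv.first] += kv.second;
        return 0;
    }
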
=====================================
src/renz.cc
=====================================
@@ -58,6 +58,8 @@ const char *ecoT22I[] = {"TGCAT",             // A/TGCAT, EcoT22I
                          "ATGCA"};
 const char *haeIII[]  = {"CC",                // GG/CC, HaeIII
                          "GG"};
+const char *hhaI[]    = {"GCG",               // GCG/C, HhaI
+			 "CGC"};
 const char *hinP1I[]  = {"CGC",               // G/CGC, HinP1I
                          "GCG"};
 const char *hindIII[] = {"AGCTT",             // A/AGCTT, HindIII
@@ -110,6 +112,9 @@ const char *pspXI[]   = {"TCGAGC", "TCGAGG", "TCGAGT", // VC/TCGAGB, PspXI
                          "GCTCGA", "CCTCGA", "ACTCGA"};
 const char *pstI[]    = {"TGCAG",             // CTGCA/G, PstI
                          "CTGCA"};
+const char *pstIshi[] = {"GCAG",              // CTGCA/G, PstI;
+                         "CTGC"};             //   Modified so that "CT" is part of the Illumina sequencing primer and not in the final sequence.
+                                              //   As published in Shirasawa, Hirakawa, and Isobe, 2016, DNA Research; doi: 10.1093/dnares/dsw004
 const char *rsaI[]    = {"AC",                // GT/AC, RsaI
                          "GT"};
 const char *sacI[]    = {"AGCTC",             // GAGCT/C, SacI
@@ -138,6 +143,7 @@ initialize_renz(map<string, const char **> &renz, map<string, int> &renz_cnt, ma
 
     renz["sbfI"]    = sbfI;    // CCTGCA/GG, SbfI
     renz["pstI"]    = pstI;    // CTGCA/G, PstI
+    renz["pstIshi"] = pstIshi; // CTGCA/G, PstI-SHI
     renz["notI"]    = notI;    // GC/GGCCGC, NotI
     renz["ecoRI"]   = ecoRI;   // G/AATTC, EcoRI
     renz["nspI"]    = nspI;
@@ -184,6 +190,7 @@ initialize_renz(map<string, const char **> &renz, map<string, int> &renz_cnt, ma
     renz["csp6I"]   = csp6I;   // G/TAC, Csp6I
     renz["bsaHI"]   = bsaHI;   // GR/CGYC, BsaHI
     renz["hpaII"]   = hpaII;   // C/CGG, HpaII
+    renz["hhaI"]    = hhaI;    // GCG/C, HhaI
     renz["ncoI"]    = ncoI;    // C/CATGG, NcoI
     renz["apaLI"]   = apaLI;   // G/TGCAC, ApaLI
     renz["hinP1I"]  = hinP1I;  // G/CGC, HinP1I
@@ -192,10 +199,11 @@ initialize_renz(map<string, const char **> &renz, map<string, int> &renz_cnt, ma
     renz["pacI"]    = pacI;    // AAT/TAATT, PacI
     renz["pspXI"]   = pspXI;   // VC/TCGAGB, PspXI
     renz["hpyCH4IV"] = hpyCH4IV; // A/CGT, HpyCH4IV
-    renz["ngoMIV"]   = ngoMIV;   // G/CCGGC, NgoMIV    
+    renz["ngoMIV"]   = ngoMIV;   // G/CCGGC, NgoMIV
 
     renz_cnt["sbfI"]    = 1;
     renz_cnt["pstI"]    = 1;
+    renz_cnt["pstIshi"] = 1;
     renz_cnt["notI"]    = 1;
     renz_cnt["ecoRI"]   = 1;
     renz_cnt["nspI"]    = 2;
@@ -242,6 +250,7 @@ initialize_renz(map<string, const char **> &renz, map<string, int> &renz_cnt, ma
     renz_cnt["csp6I"]   = 1;
     renz_cnt["bsaHI"]   = 2;
     renz_cnt["hpaII"]   = 1;
+    renz_cnt["hhaI"]    = 1;
     renz_cnt["ncoI"]    = 1;
     renz_cnt["apaLI"]   = 1;
     renz_cnt["hinP1I"]  = 1;
@@ -254,6 +263,7 @@ initialize_renz(map<string, const char **> &renz, map<string, int> &renz_cnt, ma
     
     renz_len["sbfI"]    = 6;
     renz_len["pstI"]    = 5;
+    renz_len["pstIshi"] = 4;
     renz_len["notI"]    = 6;
     renz_len["ecoRI"]   = 5;
     renz_len["nspI"]    = 5;
@@ -300,6 +310,7 @@ initialize_renz(map<string, const char **> &renz, map<string, int> &renz_cnt, ma
     renz_len["csp6I"]   = 3;
     renz_len["bsaHI"]   = 4;
     renz_len["hpaII"]   = 3;
+    renz_len["hhaI"]    = 3;
     renz_len["ncoI"]    = 5;
     renz_len["apaLI"]   = 5;
     renz_len["hinP1I"]  = 3;


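For orientation, the enzyme registry keeps three parallel tables: renz[name]
points at the expected cut-site remnant strings (each remnant paired with its
reverse form), renz_cnt[name] gives the number of alternative remnants per
read, and renz_len[name] their length. A hedged illustration of how such
tables are typically consulted (this helper is invented for the example, not
taken from Stacks):

    #include <cstring>

    // Return true if the read begins with any of the expected cut-site
    // remnants, e.g. "GCG" (cnt 1, len 3) for the new hhaI entry above.
    bool starts_with_cutsite(const char *seq, const char **remnants,
                             int cnt, int len)
    {
        for (int i = 0; i < cnt; i++)
            if (strncmp(seq, remnants[i], len) == 0)
                return true;
        return false;
    }
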
=====================================
src/ustacks.cc
=====================================
@@ -1,6 +1,6 @@
 // -*-mode:c++; c-style:k&r; c-basic-offset:4;-*-
 //
-// Copyright 2010-2021, Julian Catchen <jcatchen at illinois.edu>
+// Copyright 2010-2022, Julian Catchen <jcatchen at illinois.edu>
 //
 // This file is part of Stacks.
 //
@@ -244,9 +244,10 @@ try{
     }
 
     //
-    // Report how many reads were used.
+    // Report how many loci were built, how many reads were used, and final coverage.
     //
-    cerr << "Final coverage: ";
+    cerr << "Final number of stacks built: " << merged.size() << "\n"
+	 << "Final coverage: ";
     report_cov();
     cerr << "\n";
 
@@ -259,7 +260,7 @@ try{
     //
     // Write output files.
     //
-    cerr << "Writing tags, SNPs, and alleles files...\n";
+    cerr << "Writing tags, SNPs, and alleles files to '" << prefix_path << "'...";
     write_results(merged, unique, remainders);
     cerr << "done.\n";
 
@@ -2753,8 +2754,6 @@ load_seq_ids(vector<char *> &seq_ids)
     else if (in_file_type == FileT::gzfastq)
         fh = new GzFastq(in_file.c_str());
 
-    cerr << "Refetching read IDs...";
-
     char *id;
     Seq c;
     c.id   = new char[id_len];



View it on GitLab: https://salsa.debian.org/med-team/stacks/-/compare/3e0a1f35bd7d5ab2033ea7b3f9b51e9ffdd5eba8...3a737b16f64920e7f9c47ed9fa4fbd190a90231c
