[med-svn] [Git][med-team/bbmap][manpages] Add new manpages and move all manpages into debian/mans subdir
Andreas Tille
gitlab at salsa.debian.org
Thu Apr 4 13:39:07 BST 2019
Andreas Tille pushed to branch manpages at Debian Med / bbmap
Commits:
3758d4c6 by Andreas Tille at 2019-04-04T12:14:22Z
Add new manpages and move all manpages into debian/mans subdir
- - - - -
8 changed files:
- debian/createmanpages
- debian/manpages
- + debian/mans/bbduk.sh.1
- debian/bbmap.sh.1 → debian/mans/bbmap.sh.1
- + debian/mans/bbnorm.sh.1
- debian/bloomfilter.sh.1 → debian/mans/bloomfilter.sh.1
- + debian/mans/dedupe.sh.1
- + debian/mans/reformat.sh.1
Changes:
=====================================
debian/createmanpages
=====================================
@@ -1,5 +1,5 @@
#!/bin/sh
-MANDIR=debian
+MANDIR=debian/mans
mkdir -p $MANDIR
VERSION=`dpkg-parsechangelog | awk '/^Version:/ {print $2}' | sed -e 's/^[0-9]*://' -e 's/-.*//' -e 's/[+~]dfsg$//'`
@@ -24,6 +24,30 @@ help2man --no-info --no-discard-stderr --help-option=" " \
--version-string="$VERSION" ${progname} > $MANDIR/${progname}.1
echo $AUTHOR >> $MANDIR/${progname}.1
+progname=bbnorm.sh
+help2man --no-info --no-discard-stderr --help-option=" " \
+ --name="Kmer-based error-correction and normalization tool" \
+ --version-string="$VERSION" ${progname} > $MANDIR/${progname}.1
+echo $AUTHOR >> $MANDIR/${progname}.1
+
+progname=dedupe.sh
+help2man --no-info --no-discard-stderr --help-option=" " \
+ --name="Simplifies assemblies by removing duplicate or contained" \
+ --version-string="$VERSION" ${progname} > $MANDIR/${progname}.1
+echo $AUTHOR >> $MANDIR/${progname}.1
+
+progname=reformat.sh
+help2man --no-info --no-discard-stderr --help-option=" " \
+ --name="Reformats reads between fasta/fastq/scarf/fasta+qual/sam, interleaved/paired, and ASCII-33/64" \
+ --version-string="$VERSION" ${progname} > $MANDIR/${progname}.1
+echo $AUTHOR >> $MANDIR/${progname}.1
+
+progname=bbduk.sh
+help2man --no-info --no-discard-stderr --help-option=" " \
+ --name="Filters, trims, or masks reads with kmer matches to an artifact/contaminant file" \
+ --version-string="$VERSION" ${progname} > $MANDIR/${progname}.1
+echo $AUTHOR >> $MANDIR/${progname}.1
+
echo "$MANDIR/*.1" > debian/manpages
cat <<EOT
=====================================
debian/manpages
=====================================
@@ -1 +1 @@
-debian/*.1
+debian/mans/*.1
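The `VERSION` line in `debian/createmanpages` derives the upstream version from the changelog. It drops an epoch (`N:`), the Debian revision (everything from the first `-`), and a trailing `+dfsg` or `~dfsg` repack suffix. The sketch below wraps the same `sed` chain in a function and feeds it a literal version string instead of `dpkg-parsechangelog` output. The function name `upstream_version` and the sample strings are illustrative and are not part of the package.

```sh
#!/bin/sh
# Hypothetical helper: applies the same sed expressions as the VERSION
# line in debian/createmanpages to a Debian version string.
#   s/^[0-9]*://      drop an epoch such as "1:"
#   s/-.*//           drop the Debian revision (from the first hyphen on)
#   s/[+~]dfsg$//     drop a trailing +dfsg / ~dfsg repack suffix
upstream_version() {
    printf '%s\n' "$1" | sed -e 's/^[0-9]*://' -e 's/-.*//' -e 's/[+~]dfsg$//'
}

upstream_version '38.43+dfsg-1'     # 38.43
upstream_version '1:38.43+dfsg-1'   # 38.43 (epoch removed)
upstream_version '38.43-2'          # 38.43
```

Because `s/-.*//` cuts at the first hyphen, an upstream version that itself contained a hyphen would be truncated. That case does not arise for `38.43`.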
=====================================
debian/mans/bbduk.sh.1
=====================================
@@ -0,0 +1,488 @@
+.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.8.
+.TH BBDUK.SH "1" "April 2019" "bbduk.sh 38.43" "User Commands"
+.SH NAME
+bbduk.sh \- Filters, trims, or masks reads with kmer matches to an artifact/contaminant file
+.SH SYNOPSIS
+.B bbduk.sh
+\fI\,in=<input file> out=<output file> ref=<contaminant files>\/\fR
+.SH AUTHOR
+Written by Brian Bushnell
+Last modified March 21, 2019
+.PP
+Description: Compares reads to the kmers in a reference dataset, optionally
+allowing an edit distance. Splits the reads into two outputs \- those that
+match the reference, and those that don't. Can also trim (remove) the matching
+parts of the reads rather than binning the reads.
+Please read bbmap/docs/guides/BBDukGuide.txt for more information.
+.PP
+Input may be stdin or a fasta or fastq file, compressed or uncompressed.
+If you pipe via stdin/stdout, please include the file type; e.g. for gzipped
+fasta input, set in=stdin.fa.gz
+.PP
+Input parameters:
+in=<file> Main input. in=stdin.fq will pipe from stdin.
+in2=<file> Input for 2nd read of pairs in a different file.
+ref=<file,file> Comma\-delimited list of reference files.
+.TP
+In addition to filenames, you may also use the keywords:
+adapters, artifacts, phix, lambda, pjet, mtst, kapa
+.PP
+literal=<seq,seq> Comma\-delimited list of literal reference sequences.
+touppercase=f (tuc) Change all bases to upper\-case.
+interleaved=auto (int) t/f overrides interleaved autodetection.
+qin=auto Input quality offset: 33 (Sanger), 64, or auto.
+reads=\-1 If positive, quit after processing X reads or pairs.
+copyundefined=f (cu) Process non\-AGCT IUPAC reference bases by making all
+.TP
+possible unambiguous copies.
+Intended for short motifs
+.IP
+or adapter barcodes, as time/memory use is exponential.
+.PP
+samplerate=1 Set lower to only process a fraction of input reads.
+samref=<file> Optional reference fasta for processing sam files.
+.PP
+Output parameters:
+out=<file> (outnonmatch) Write reads here that do not contain
+.TP
+kmers matching the database.
+\&'out=stdout.fq' will pipe
+.IP
+to standard out.
+.PP
+out2=<file> (outnonmatch2) Use this to write 2nd read of pairs to a
+.IP
+different file.
+.PP
+outm=<file> (outmatch) Write reads here that fail filters. In default
+.TP
+kfilter mode, this means any read with a matching kmer.
+In any mode, it also includes reads that fail filters such
+as minlength, mingc, maxgc, entropy, etc. In other words,
+it includes all reads that do not go to 'out'.
+.PP
+outm2=<file> (outmatch2) Use this to write 2nd read of pairs to a
+.IP
+different file.
+.PP
+outs=<file> (outsingle) Use this to write singleton reads whose mate
+.IP
+was trimmed shorter than minlen.
+.PP
+stats=<file> Write statistics about which contaminants were detected.
+refstats=<file> Write statistics on a per\-reference\-file basis.
+rpkm=<file> Write RPKM for each reference sequence (for RNA\-seq).
+dump=<file> Dump kmer tables to a file, in fasta format.
+duk=<file> Write statistics in duk's format. *DEPRECATED*
+nzo=t Only write statistics about ref sequences with nonzero hits.
+overwrite=t (ow) Grant permission to overwrite files.
+showspeed=t (ss) 'f' suppresses display of processing speed.
+ziplevel=2 (zl) Compression level; 1 (min) through 9 (max).
+fastawrap=70 Length of lines in fasta output.
+qout=auto Output quality offset: 33 (Sanger), 64, or auto.
+statscolumns=3 (cols) Number of columns for stats output, 3 or 5.
+.IP
+5 includes base counts.
+.PP
+rename=f Rename reads to indicate which sequences they matched.
+refnames=f Use names of reference files rather than scaffold IDs.
+trd=f Truncate read and ref names at the first whitespace.
+ordered=f Set to true to output reads in same order as input.
+maxbasesout=\-1 If positive, quit after writing approximately this many
+.IP
+bases to out (outu/outnonmatch).
+.PP
+maxbasesoutm=\-1 If positive, quit after writing approximately this many
+.IP
+bases to outm (outmatch).
+.PP
+json=f Print to screen in json format.
+.PP
+Histogram output parameters:
+bhist=<file> Base composition histogram by position.
+qhist=<file> Quality histogram by position.
+qchist=<file> Count of bases with each quality value.
+aqhist=<file> Histogram of average read quality.
+bqhist=<file> Quality histogram designed for box plots.
+lhist=<file> Read length histogram.
+phist=<file> Polymer length histogram.
+gchist=<file> Read GC content histogram.
+ihist=<file> Insert size histogram, for paired reads in mapped sam.
+gcbins=100 Number of gchist bins. Set to 'auto' to use read length.
+maxhistlen=6000 Set an upper bound for histogram lengths; higher uses
+.TP
+more memory.
+The default is 6000 for some histograms
+.IP
+and 80000 for others.
+.PP
+Histograms for mapped sam/bam files only:
+histbefore=t Calculate histograms from reads before processing.
+ehist=<file> Errors\-per\-read histogram.
+qahist=<file> Quality accuracy histogram of error rates versus quality
+.IP
+score.
+.PP
+indelhist=<file> Indel length histogram.
+mhist=<file> Histogram of match, sub, del, and ins rates by position.
+idhist=<file> Histogram of read count versus percent identity.
+idbins=100 Number of idhist bins. Set to 'auto' to use read length.
+varfile=<file> Ignore substitution errors listed in this file when
+.TP
+calculating error rates.
+Can be generated with
+.IP
+CallVariants.
+.PP
+vcf=<file> Ignore substitution errors listed in this VCF file
+.IP
+when calculating error rates.
+.PP
+ignorevcfindels=t Also ignore indels listed in the VCF.
+.PP
+Processing parameters:
+k=27 Kmer length used for finding contaminants. Contaminants
+.TP
+shorter than k will not be found.
+k must be at least 1.
+.PP
+rcomp=t Look for reverse\-complements of kmers in addition to
+.IP
+forward kmers.
+.PP
+maskmiddle=t (mm) Treat the middle base of a kmer as a wildcard, to
+.IP
+increase sensitivity in the presence of errors.
+.PP
+minkmerhits=1 (mkh) Reads need at least this many matching kmers
+.IP
+to be considered as matching the reference.
+.PP
+minkmerfraction=0.0 (mkf) A read needs at least this fraction of its total
+.TP
+kmers to hit a ref, in order to be considered a match.
+If this and minkmerhits are set, the greater is used.
+.PP
+mincovfraction=0.0 (mcf) A read needs at least this fraction of its total
+.TP
+bases to be covered by ref kmers to be considered a match.
+If specified, mcf overrides mkh and mkf.
+.PP
+hammingdistance=0 (hdist) Maximum Hamming distance for ref kmers (subs only).
+.IP
+Memory use is proportional to (3*K)^hdist.
+.PP
+qhdist=0 Hamming distance for query kmers; impacts speed, not memory.
+editdistance=0 (edist) Maximum edit distance from ref kmers (subs
+.TP
+and indels).
+Memory use is proportional to (8*K)^edist.
+.PP
+hammingdistance2=0 (hdist2) Sets hdist for short kmers, when using mink.
+qhdist2=0 Sets qhdist for short kmers, when using mink.
+editdistance2=0 (edist2) Sets edist for short kmers, when using mink.
+forbidn=f (fn) Forbids matching of read kmers containing N.
+.TP
+By default, these will match a reference 'A' if
+hdist>0 or edist>0, to increase sensitivity.
+.PP
+removeifeitherbad=t (rieb) Paired reads get sent to 'outmatch' if either is
+.TP
+a match (or either is trimmed shorter than minlen).
+Set to false to require both.
+.PP
+trimfailures=f Instead of discarding failed reads, trim them to 1bp.
+.IP
+This makes the statistics a bit odd.
+.PP
+findbestmatch=f (fbm) If multiple matches, associate read with sequence
+.TP
+sharing most kmers.
+Reduces speed.
+.PP
+skipr1=f Don't do kmer\-based operations on read 1.
+skipr2=f Don't do kmer\-based operations on read 2.
+ecco=f For overlapping paired reads only. Performs error\-correction with BBMerge prior to kmer operations.
+recalibrate=f (recal) Recalibrate quality scores. Requires calibration
+.IP
+matrices generated by CalcTrueQuality.
+.PP
+sam=<file,file> If recalibration is desired, and matrices have not already
+.IP
+been generated, BBDuk will create them from the sam file.
+.PP
+amino=f Run in amino acid mode. Some features have not been
+.TP
+tested, but kmer\-matching works fine.
+Maximum k is 12.
+.PP
+Speed and Memory parameters:
+threads=auto (t) Set number of threads to use; default is number of
+.IP
+logical processors.
+.PP
+prealloc=f Preallocate memory in table. Allows faster table loading
+.IP
+and more efficient memory usage, for a large reference.
+.PP
+monitor=f Kill this process if it crashes. monitor=600,0.01 would
+.IP
+kill after 600 seconds under 1% usage.
+.PP
+minrskip=1 (mns) Force minimal skip interval when indexing reference
+.TP
+kmers.
+1 means use all, 2 means use every other kmer, etc.
+.PP
+maxrskip=1 (mxs) Restrict maximal skip interval when indexing
+.TP
+reference kmers. Normally all are used for scaffolds<100kb,
+but with longer scaffolds, up to maxrskip\-1 are skipped.
+.PP
+rskip= Set both minrskip and maxrskip to the same value.
+.IP
+If not set, rskip will vary based on sequence length.
+.PP
+qskip=1 Skip query kmers to increase speed. 1 means use all.
+speed=0 Ignore this fraction of kmer space (0\-15 out of 16) in both
+.TP
+reads and reference.
+Increases speed and reduces memory.
+.PP
+Note: Do not use more than one of 'speed', 'qskip', and 'rskip'.
+.PP
+Trimming/Filtering/Masking parameters:
+Note \- if ktrim, kmask, and ksplit are unset, the default behavior is kfilter.
+All kmer processing modes are mutually exclusive.
+Reads only get sent to 'outm' purely based on kmer matches in kfilter mode.
+.PP
+ktrim=f Trim reads to remove bases matching reference kmers.
+.TP
+Values:
+f (don't trim),
+r (trim to the right),
+l (trim to the left)
+.PP
+kmask= Replace bases matching ref kmers with another symbol.
+.TP
+Allows any non\-whitespace character, and processes short
+kmers on both ends if mink is set. 'kmask=lc' will
+convert masked bases to lowercase.
+.PP
+maskfullycovered=f (mfc) Only mask bases that are fully covered by kmers.
+ksplit=f For single\-ended reads only. Reads will be split into
+.TP
+pairs around the kmer.
+If the kmer is at the end of the
+.TP
+read, it will be trimmed instead.
+Singletons will go to
+.TP
+out, and pairs will go to outm.
+Do not use ksplit with
+.IP
+other operations such as quality\-trimming or filtering.
+.PP
+mink=0 Look for shorter kmers at read tips down to this length,
+.TP
+when k\-trimming or masking.
+0 means disabled. Enabling
+.IP
+this will disable maskmiddle.
+.PP
+qtrim=f Trim read ends to remove bases with quality below trimq.
+.TP
+Performed AFTER looking for kmers.
+Values:
+.TP
+rl (trim both ends),
+f (neither end),
+r (right end only),
+l (left end only),
+w (sliding window).
+.PP
+trimq=6 Regions with average quality BELOW this will be trimmed,
+.TP
+if qtrim is set to something other than f.
+Can be a
+.IP
+floating\-point number like 7.3.
+.PP
+trimclip=f Trim soft\-clipped bases from sam files.
+minlength=10 (ml) Reads shorter than this after trimming will be
+.TP
+discarded.
+Pairs will be discarded if both are shorter.
+.PP
+mlf=0 (minlengthfraction) Reads shorter than this fraction of
+.IP
+original length after trimming will be discarded.
+.PP
+maxlength= Reads longer than this after trimming will be discarded.
+.IP
+Pairs will be discarded only if both are longer.
+.PP
+minavgquality=0 (maq) Reads with average quality (after trimming) below
+.IP
+this will be discarded.
+.PP
+maqb=0 If positive, calculate maq from this many initial bases.
+minbasequality=0 (mbq) Reads with any base below this quality (after
+.IP
+trimming) will be discarded.
+.PP
+maxns=\-1 If non\-negative, reads with more Ns than this
+.IP
+(after trimming) will be discarded.
+.PP
+mcb=0 (minconsecutivebases) Discard reads without at least
+.IP
+this many consecutive called bases.
+.PP
+ottm=f (outputtrimmedtomatch) Output reads trimmed to shorter
+.IP
+than minlength to outm rather than discarding.
+.PP
+tp=0 (trimpad) Trim this much extra around matching kmers.
+tbo=f (trimbyoverlap) Trim adapters based on where paired
+.IP
+reads overlap.
+.PP
+strictoverlap=t Adjust sensitivity for trimbyoverlap mode.
+minoverlap=14 Require this many bases of overlap for detection.
+mininsert=40 Require insert size of at least this for overlap.
+.IP
+Should be reduced to 16 for small RNA sequencing.
+.PP
+tpe=f (trimpairsevenly) When kmer right\-trimming, trim both
+.IP
+reads to the minimum length of either.
+.PP
+forcetrimleft=0 (ftl) If positive, trim bases to the left of this position
+.IP
+(exclusive, 0\-based).
+.PP
+forcetrimright=0 (ftr) If positive, trim bases to the right of this position
+.IP
+(exclusive, 0\-based).
+.PP
+forcetrimright2=0 (ftr2) If positive, trim this many bases on the right end.
+forcetrimmod=0 (ftm) If positive, right\-trim length to be equal to zero,
+.IP
+modulo this number.
+.PP
+restrictleft=0 If positive, only look for kmer matches in the
+.IP
+leftmost X bases.
+.PP
+restrictright=0 If positive, only look for kmer matches in the
+.IP
+rightmost X bases.
+.PP
+mingc=0 Discard reads with GC content below this.
+maxgc=1 Discard reads with GC content above this.
+gcpairs=t Use average GC of paired reads.
+.IP
+Also affects gchist.
+.PP
+tossjunk=f Discard reads with invalid characters as bases.
+swift=f Trim Swift sequences: Trailing C/T/N R1, leading G/A/N R2.
+.PP
+Header\-parsing parameters \- these require Illumina headers:
+chastityfilter=f (cf) Discard reads with id containing ' 1:Y:' or ' 2:Y:'.
+barcodefilter=f Remove reads with unexpected barcodes if barcodes is set,
+.TP
+or barcodes containing 'N' otherwise.
+A barcode must be
+.TP
+the last part of the read header.
+Values:
+.TP
+t:
+Remove reads with bad barcodes.
+.TP
+f:
+Ignore barcodes.
+.IP
+crash: Crash upon encountering bad barcodes.
+.PP
+barcodes= Comma\-delimited list of barcodes or files of barcodes.
+xmin=\-1 If positive, discard reads with a lesser X coordinate.
+ymin=\-1 If positive, discard reads with a lesser Y coordinate.
+xmax=\-1 If positive, discard reads with a greater X coordinate.
+ymax=\-1 If positive, discard reads with a greater Y coordinate.
+.PP
+Polymer trimming:
+trimpolya=0 If greater than 0, trim poly\-A or poly\-T tails of
+.IP
+at least this length on either end of reads.
+.PP
+trimpolygleft=0 If greater than 0, trim poly\-G prefixes of at least this
+.TP
+length on the left end of reads.
+Does not trim poly\-C.
+.PP
+trimpolygright=0 If greater than 0, trim poly\-G tails of at least this
+.TP
+length on the right end of reads.
+Does not trim poly\-C.
+.PP
+trimpolyg=0 This sets both left and right at once.
+filterpolyg=0 If greater than 0, remove reads with a poly\-G prefix of
+.IP
+at least this length (on the left).
+.PP
+Note: there are also equivalent poly\-C flags.
+.PP
+Polymer tracking:
+pratio=base,base 'pratio=G,C' will print the ratio of G to C polymers.
+plen=20 Length of homopolymers to count.
+.PP
+Entropy/Complexity parameters:
+entropy=\-1 Set between 0 and 1 to filter reads with entropy below
+.TP
+that value.
+Higher is more stringent.
+.PP
+entropywindow=50 Calculate entropy using a sliding window of this length.
+entropyk=5 Calculate entropy using kmers of this length.
+minbasefrequency=0 Discard reads with a minimum base frequency below this.
+entropymask=f Values:
+.TP
+f:
+Discard low\-entropy sequences.
+.TP
+t:
+Mask low\-entropy parts of sequences with N.
+.IP
+lc: Change low\-entropy parts of sequences to lowercase.
+.PP
+entropymark=f Mark each base with its entropy value. This is on a scale
+.TP
+of 0\-41 and is reported as quality scores, so the output
+should be fastq or fasta+qual.
+.PP
+Cardinality estimation:
+cardinality=f (loglog) Count unique kmers using the LogLog algorithm.
+cardinalityout=f (loglogout) Count unique kmers in output reads.
+loglogk=31 Use this kmer length for counting.
+loglogbuckets=1999 Use this many buckets for counting.
+.PP
+Java Parameters:
+.PP
+\fB\-Xmx\fR This will set Java's memory usage, overriding autodetection.
+.TP
+\fB\-Xmx20g\fR will
+specify 20 gigs of RAM, and \fB\-Xmx200m\fR will specify 200 megs.
+The max is typically 85% of physical memory.
+.PP
+\fB\-eoom\fR This flag will cause the process to exit if an
+.TP
+out\-of\-memory exception occurs.
+Requires Java 8u92+.
+.PP
+\fB\-da\fR Disable assertions.
+.PP
+Please contact Brian Bushnell at bbushnell at lbl.gov if you encounter any problems.
+.SH AUTHOR
+This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.
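The bbduk.sh help text gives the memory scaling for mismatch tolerance as (3*K)^hdist for `hammingdistance` and (8*K)^edist for `editdistance`. The sketch below works that arithmetic through for the default `k=27`. It evaluates only the stated formula and does not model BBDuk's actual table sizing. The function name `mem_factor` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: relative memory factor (m*K)^d from the bbduk.sh
# help text, where m is 3 for hdist and 8 for edist.
# POSIX $(( )) has no exponent operator, so multiply in a loop.
mem_factor() {   # args: m k d
    m=$1 k=$2 d=$3
    base=$((m * k))
    result=1
    i=0
    while [ "$i" -lt "$d" ]; do
        result=$((result * base))
        i=$((i + 1))
    done
    echo "$result"
}

mem_factor 3 27 1   # hdist=1: 81
mem_factor 3 27 2   # hdist=2: 6561
mem_factor 8 27 1   # edist=1: 216
```

By this formula, raising hdist from 1 to 2 at k=27 multiplies the memory factor by another 81.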
=====================================
debian/bbmap.sh.1 → debian/mans/bbmap.sh.1
=====================================
=====================================
debian/mans/bbnorm.sh.1
=====================================
@@ -0,0 +1,139 @@
+.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.8.
+.TH BBNORM.SH "1" "April 2019" "bbnorm.sh 38.43" "User Commands"
+.SH NAME
+bbnorm.sh \- Kmer-based error-correction and normalization tool
+.SH SYNOPSIS
+.B bbnorm.sh
+\fI\,in=<input> out=<reads to keep> outt=<reads to toss> hist=<histogram output>\/\fR
+.SH AUTHOR
+Written by Brian Bushnell
+Last modified October 19, 2017
+.PP
+Description: Normalizes read depth based on kmer counts.
+Can also error\-correct, bin reads by kmer depth, and generate a kmer depth histogram.
+However, Tadpole has superior error\-correction to BBNorm.
+Please read bbmap/docs/guides/BBNormGuide.txt for more information.
+.PP
+Input parameters:
+in=null Primary input. Use in2 for paired reads in a second file
+in2=null Second input file for paired reads in two files
+extra=null Additional files to use for input (generating hash table) but not for output
+fastareadlen=2^31 Break up FASTA reads longer than this. Can be useful when processing scaffolded genomes
+tablereads=\-1 Use at most this many reads when building the hashtable (\fB\-1\fR means all)
+kmersample=1 Process every nth kmer, and skip the rest
+readsample=1 Process every nth read, and skip the rest
+interleaved=auto May be set to true or false to override autodetection of the input file as paired interleaved.
+qin=auto ASCII offset for input quality. May be 33 (Sanger), 64 (Illumina), or auto.
+.PP
+Output parameters:
+out=<file> File for normalized or corrected reads. Use out2 for paired reads in a second file
+outt=<file> (outtoss) File for reads that were excluded from primary output
+reads=\-1 Only process this number of reads, then quit (\fB\-1\fR means all)
+sampleoutput=t Use sampling on output as well as input (not used if sample rates are 1)
+keepall=f Set to true to keep all reads (e.g. if you just want error correction).
+zerobin=f Set to true if you want kmers with a count of 0 to go in the 0 bin instead of the 1 bin in histograms.
+.TP
+Default is false, to prevent confusion about how there can be 0\-count kmers.
+The reason is that based on the 'minq' and 'minprob' settings, some kmers may be excluded from the bloom filter.
+.PP
+tmpdir= This will specify a directory for temp files (only needed for multipass runs). If null, they will be written to the output directory.
+usetempdir=t Allows enabling/disabling of temporary directory; if disabled, temp files will be written to the output directory.
+qout=auto ASCII offset for output quality. May be 33 (Sanger), 64 (Illumina), or auto (same as input).
+rename=f Rename reads based on their kmer depth.
+.PP
+Hashing parameters:
+k=31 Kmer length (values under 32 are most efficient, but arbitrarily high values are supported)
+bits=32 Bits per cell in bloom filter; must be 2, 4, 8, 16, or 32. Maximum kmer depth recorded is 2^cbits. Automatically reduced to 16 in 2\-pass.
+.IP
+Large values decrease accuracy for a fixed amount of memory, so use the lowest number you can that will still capture highest\-depth kmers.
+.PP
+hashes=3 Number of times each kmer is hashed and stored. Higher is slower.
+.IP
+Higher is MORE accurate if there is enough memory, and LESS accurate if there is not enough memory.
+.PP
+prefilter=f True is slower, but generally more accurate; filters out low\-depth kmers from the main hashtable. The prefilter is more memory\-efficient because it uses 2\-bit cells.
+prehashes=2 Number of hashes for prefilter.
+prefilterbits=2 (pbits) Bits per cell in prefilter.
+prefiltersize=0.35 Fraction of memory to allocate to prefilter.
+buildpasses=1 More passes can sometimes increase accuracy by iteratively removing low\-depth kmers
+minq=6 Ignore kmers containing bases with quality below this
+minprob=0.5 Ignore kmers with overall probability of correctness below this
+threads=auto (t) Spawn exactly X hashing threads (default is number of logical processors). Total active threads may exceed X due to I/O threads.
+rdk=t (removeduplicatekmers) When true, a kmer's count will only be incremented once per read pair, even if that kmer occurs more than once.
+.PP
+Normalization parameters:
+fixspikes=f (fs) Do a slower, high\-precision bloom filter lookup of kmers that appear to have an abnormally high depth due to collisions.
+target=100 (tgt) Target normalization depth. NOTE: All depth parameters control kmer depth, not read depth.
+.TP
+For kmer depth Dk, read depth Dr, read length R, and kmer size K:
+Dr=Dk*(R/(R\-K+1))
+.PP
+maxdepth=\-1 (max) Reads will not be downsampled when below this depth, even if they are above the target depth.
+mindepth=5 (min) Kmers with depth below this number will not be included when calculating the depth of a read.
+minkmers=15 (mgkpr) Reads must have at least this many kmers over min depth to be retained. Aka 'mingoodkmersperread'.
+percentile=54.0 (dp) Read depth is by default inferred from the 54th percentile of kmer depth, but this may be changed to any number 1\-100.
+uselowerdepth=t (uld) For pairs, use the depth of the lower read as the depth proxy.
+deterministic=t (dr) Generate random numbers deterministically to ensure identical output between multiple runs. May decrease speed with a huge number of threads.
+passes=2 (p) 1 pass is the basic mode. 2 passes (default) allow greater accuracy, error detection, and better control of output depth.
+.PP
+Error detection parameters:
+hdp=90.0 (highdepthpercentile) Position in sorted kmer depth array used as proxy of a read's high kmer depth.
+ldp=25.0 (lowdepthpercentile) Position in sorted kmer depth array used as proxy of a read's low kmer depth.
+tossbadreads=f (tbr) Throw away reads detected as containing errors.
+requirebothbad=f (rbb) Only toss bad pairs if both reads are bad.
+errordetectratio=125 (edr) Reads with a ratio of at least this much between their high and low depth kmers will be classified as error reads.
+highthresh=12 (ht) Threshold for high kmer. Kmers at this depth or above are considered non\-error.
+lowthresh=3 (lt) Threshold for low kmer. Kmers at this and below are always considered errors.
+.PP
+Error correction parameters:
+ecc=f Set to true to correct errors. NOTE: Tadpole is now preferred for ecc as it does a better job.
+ecclimit=3 Correct up to this many errors per read. If more are detected, the read will remain unchanged.
+errorcorrectratio=140 (ecr) Adjacent kmers with a depth ratio of at least this much between will be classified as an error.
+echighthresh=22 (echt) Threshold for high kmer. A kmer at this or above may be considered non\-error.
+eclowthresh=2 (eclt) Threshold for low kmer. Kmers at this and below are considered errors.
+eccmaxqual=127 Do not correct bases with quality above this value.
+aec=f (aggressiveErrorCorrection) Sets more aggressive values of ecr=100, ecclimit=7, echt=16, eclt=3.
+cec=f (conservativeErrorCorrection) Sets more conservative values of ecr=180, ecclimit=2, echt=30, eclt=1, sl=4, pl=4.
+meo=f (markErrorsOnly) Marks errors by reducing quality value of suspected errors; does not correct anything.
+mue=t (markUncorrectableErrors) Marks errors only on uncorrectable reads; requires 'ecc=t'.
+overlap=f (ecco) Error correct by read overlap.
+.PP
+Depth binning parameters:
+lowbindepth=10 (lbd) Cutoff for low depth bin.
+highbindepth=80 (hbd) Cutoff for high depth bin.
+outlow=<file> Pairs in which both reads have a median below lbd go into this file.
+outhigh=<file> Pairs in which both reads have a median above hbd go into this file.
+outmid=<file> All other pairs go into this file.
+.PP
+Histogram parameters:
+hist=<file> Specify a file to write the input kmer depth histogram.
+histout=<file> Specify a file to write the output kmer depth histogram.
+histcol=3 (histogramcolumns) Number of histogram columns, 2 or 3.
+pzc=f (printzerocoverage) Print lines in the histogram with zero coverage.
+histlen=1048576 Max kmer depth displayed in histogram. Also affects statistics displayed, but does not affect normalization.
+.PP
+Peak calling parameters:
+peaks=<file> Write the peaks to this file. Default is stdout.
+minHeight=2 (h) Ignore peaks shorter than this.
+minVolume=5 (v) Ignore peaks with less area than this.
+minWidth=3 (w) Ignore peaks narrower than this.
+minPeak=2 (minp) Ignore peaks with an X\-value below this.
+maxPeak=BIG (maxp) Ignore peaks with an X\-value above this.
+maxPeakCount=8 (maxpc) Print up to this many peaks (prioritizing height).
+.PP
+Java Parameters:
+\fB\-Xmx\fR This will set Java's memory usage, overriding autodetection.
+.TP
+\fB\-Xmx20g\fR will specify 20 gigs of RAM, and \fB\-Xmx200m\fR will specify 200 megs.
+The max is typically 85% of physical memory.
+.PP
+\fB\-eoom\fR This flag will cause the process to exit if an
+.TP
+out\-of\-memory exception occurs.
+Requires Java 8u92+.
+.PP
+\fB\-da\fR Disable assertions.
+.PP
+Please contact Brian Bushnell at bbushnell at lbl.gov if you encounter any problems.
+.SH AUTHOR
+This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.
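BBNorm's depth parameters (`target`, `mindepth`, `maxdepth`) are kmer depths, not read depths. The help text converts between the two with Dr=Dk*(R/(R-K+1)). The small `awk` calculation below checks that conversion. The function names are hypothetical. The second function inverts the same formula to choose a `target=` value for a desired read depth.

```sh
#!/bin/sh
# Hypothetical helpers for the kmer/read depth relation in bbnorm.sh's
# help text: Dr = Dk * R / (R - K + 1), with R = read length, K = kmer size.
read_depth() {   # args: kmer_depth read_length k
    awk -v dk="$1" -v r="$2" -v k="$3" 'BEGIN { printf "%.2f\n", dk * r / (r - k + 1) }'
}

kmer_depth() {   # args: read_depth read_length k  (inverse of read_depth)
    awk -v dr="$1" -v r="$2" -v k="$3" 'BEGIN { printf "%.2f\n", dr * (r - k + 1) / r }'
}

read_depth 100 150 31   # default target=100, 150 bp reads, k=31: 125.00
kmer_depth 50 150 31    # target= for ~50x read depth: 40.00
```

With the defaults (`target=100`, `k=31`) and 150 bp reads, this formula puts normalized reads near 125x read depth rather than 100x.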
=====================================
debian/bloomfilter.sh.1 → debian/mans/bloomfilter.sh.1
=====================================
=====================================
debian/mans/dedupe.sh.1
=====================================
@@ -0,0 +1,137 @@
+.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.8.
+.TH DEDUPE.SH "1" "April 2019" "dedupe.sh 38.43" "User Commands"
+.SH NAME
+dedupe.sh \- Simplifies assemblies by removing duplicate or contained
+.SH SYNOPSIS
+.B dedupe.sh
+\fI\,in=<file or stdin> out=<file or stdout>\/\fR
+.SH AUTHOR
+Written by Brian Bushnell and Jonathan Rood
+Last modified November 20, 2017
+.PP
+Description: Accepts one or more files containing sets of sequences (reads or scaffolds).
+Removes duplicate sequences, which may be specified to be exact matches, subsequences, or sequences within some percent identity.
+Can also find overlapping sequences and group them into clusters.
+Please read bbmap/docs/guides/DedupeGuide.txt for more information.
+.PP
+An example of running Dedupe for clustering short reads:
+dedupe.sh in=x.fq am=f ac=f fo c pc rnc=f mcs=4 mo=100 s=1 pto cc qin=33 csf=stats.txt pattern=cluster_%.fq dot=graph.dot
+.PP
+Input may be fasta or fastq, compressed or uncompressed.
+Output may be stdout or a file. With no output parameter, data will be written to stdout.
+If 'out=null', there will be no output, but statistics will still be printed.
+You can also use 'dedupe <infile> <outfile>' without the 'in=' and 'out='.
+.PP
+I/O parameters:
+in=<file,file> A single file or a comma\-delimited list of files.
+out=<file> Destination for all output contigs.
+pattern=<file> Clusters will be written to individual files, where the '%' symbol in the pattern is replaced by cluster number.
+outd=<file> Optional; removed duplicates will go here.
+csf=<file> (clusterstatsfile) Write a list of cluster names and sizes.
+dot=<file> (graph) Write a graph in dot format. Requires 'fo' and 'pc' flags.
+threads=auto (t) Set number of threads to use; default is number of logical processors.
+overwrite=t (ow) Set to false to force the program to abort rather than overwrite an existing file.
+showspeed=t (ss) Set to 'f' to suppress display of processing speed.
+minscaf=0 (ms) Ignore contigs/scaffolds shorter than this.
+interleaved=auto If true, forces fastq input to be paired and interleaved.
+ziplevel=2 Set to 1 (lowest) through 9 (max) to change compression level; lower compression is faster.
+.PP
+Output format parameters:
+storename=t (sn) Store scaffold names (set false to save memory).
+#addpairnum=f Add .1 and .2 to numeric id of read1 and read2.
+storequality=t (sq) Store quality values for fastq assemblies (set false to save memory).
+uniquenames=t (un) Ensure all output scaffolds have unique names. Uses more memory.
+numbergraphnodes=t (ngn) Label dot graph nodes with read numbers rather than read names.
+sort=f Sort output (otherwise it will be random). Options:
+.TP
+length:
+Sort by length
+.TP
+quality: Sort by quality
+name: Sort by name
+id: Sort by input order
+.PP
+ascending=f Sort in ascending order.
+ordered=f Output sequences in input order. Equivalent to sort=id ascending.
+renameclusters=f (rnc) Rename contigs to indicate which cluster they are in.
+printlengthinedges=f (ple) Print the length of contigs in edges.
+.PP
+Processing parameters:
+absorbrc=t (arc) Absorb reverse\-complements as well as normal orientation.
+absorbmatch=t (am) Absorb exact matches of contigs.
+absorbcontainment=t (ac) Absorb full containments of contigs.
+#absorboverlap=f (ao) Absorb (merge) non\-contained overlaps of contigs (TODO).
+findoverlap=f (fo) Find overlaps between contigs (containments and non\-containments). Necessary for clustering.
+uniqueonly=f (uo) If true, all copies of duplicate reads will be discarded, rather than keeping 1.
+rmn=f (requirematchingnames) If true, both names and sequence must match.
+usejni=f (jni) Do alignments in C code, which is faster, if an edit distance is allowed.
+.IP
+This will require compiling the C code; details are in \fI\,/jni/README.txt\/\fP.
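The exact-match absorption controlled by absorbmatch, absorbrc and uniqueonly can be illustrated with a short Python sketch. It is not Dedupe's algorithm, which indexes kmer seeds and also handles containments and edits. The canonical key (the lexicographically smaller orientation) and the function names are assumptions chosen for illustration.

```python
# Hypothetical sketch of exact-duplicate removal in the spirit of
# absorbmatch / absorbrc / uniqueonly. Not Dedupe's implementation.
from collections import Counter

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def dedupe_exact(seqs, absorbrc=True, uniqueonly=False):
    """Drop exact duplicates, optionally treating reverse complements as equal.

    With uniqueonly=True every copy of a duplicated sequence is discarded;
    otherwise the first copy in input order is kept.
    """
    def key(s):
        # Canonical key: the lexicographically smaller of the two orientations.
        return min(s, reverse_complement(s)) if absorbrc else s

    counts = Counter(key(s) for s in seqs)
    seen = set()
    kept = []
    for s in seqs:
        k = key(s)
        if uniqueonly and counts[k] > 1:
            continue
        if k not in seen:
            seen.add(k)
            kept.append(s)
    return kept


reads = ["ACGT", "ACGT", "AACC", "GGTT", "AAAG"]
print(dedupe_exact(reads))                   # ['ACGT', 'AACC', 'AAAG']
print(dedupe_exact(reads, uniqueonly=True))  # ['AAAG']
```

Here GGTT is absorbed because it is the reverse complement of AACC; with absorbrc=False it would be kept.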
+.PP
+Subset parameters:
+subsetcount=1 (sstc) Number of subsets used to process the data; higher uses less memory.
+subset=0 (sst) Only process reads whose ((ID%subsetcount)==subset).
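The subset rule ((ID%subsetcount)==subset) partitions the input. Running once per subset value covers every read exactly once, while each run handles only about 1/subsetcount of the data. The sketch below assumes read IDs are 0-based input positions, which the manpage does not state.

```python
# Sketch of the subset rule: a pass with a given `subset` value processes a
# read only if (ID % subsetcount) == subset. IDs as 0-based positions is an
# assumption.

def reads_in_subset(read_ids, subsetcount, subset):
    """Return the IDs that one pass with this subset value would process."""
    return [rid for rid in read_ids if rid % subsetcount == subset]


ids = range(10)
passes = [reads_in_subset(ids, 3, s) for s in range(3)]
print(passes)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```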
+.PP
+Clustering parameters:
+cluster=f (c) Group overlapping contigs into clusters.
+pto=f (preventtransitiveoverlaps) Do not look for new edges between nodes in the same cluster.
+minclustersize=1 (mcs) Do not output clusters smaller than this.
+pbr=f (pickbestrepresentative) Only output the single highest\-quality read per cluster.
+.PP
+Cluster postprocessing parameters:
+processclusters=f (pc) Run the cluster processing phase, which performs the selected operations in this category.
+.IP
+For example, pc AND cc must be enabled to perform cc.
+.PP
+fixmultijoins=t (fmj) Remove redundant overlaps between the same two contigs.
+removecycles=t (rc) Remove all cycles so clusters form trees.
+cc=t (canonicizeclusters) Flip contigs so clusters have a single orientation.
+fcc=f (fixcanoncontradictions) Truncate graph at nodes with canonization disputes.
+foc=f (fixoffsetcontradictions) Truncate graph at nodes with offset disputes.
+mst=f (maxspanningtree) Remove cyclic edges, leaving only the longest edges that form a tree.
+.PP
+Overlap detection parameters:
+exact=t (ex) Only allow exact symbol matches. When false, an 'N' will match any symbol.
+touppercase=t (tuc) Convert input bases to upper\-case; otherwise, lower\-case will not match.
+maxsubs=0 (s) Allow up to this many mismatches (substitutions only, no indels). May be set higher than maxedits.
+maxedits=0 (e) Allow up to this many edits (subs or indels). Higher is slower.
+minidentity=100 (mid) Absorb contained sequences with percent identity of at least this (includes indels).
+minlengthpercent=0 (mlp) Smaller contig must be at least this percent of larger contig's length to be absorbed.
+minoverlappercent=0 (mop) Overlap must be at least this percent of smaller contig's length to cluster and merge.
+minoverlap=200 (mo) Overlap must be at least this long to cluster and merge.
+depthratio=0 (dr) When non\-zero, overlaps will only be formed between reads with a depth ratio of at most this.
+.IP
+Should be above 1.
+Depth is determined by parsing the read names; this information can be added
+by running KmerNormalize (khist.sh, bbnorm.sh, or ecc.sh) with the flag 'rename'.
+.PP
+k=31 Seed length used for finding containments and overlaps. Anything shorter than k will not be found.
+numaffixmaps=1 (nam) Number of prefixes/suffixes to index per contig. Higher is more sensitive, if edits are allowed.
+hashns=f Set to true to search for matches using kmers containing Ns. Can lead to extreme slowdown in some cases.
+#ignoreaffix1=f (ia1) Ignore first affix (for testing).
+#storesuffix=f (ss) Store suffix as well as prefix. Automatically set to true when doing inexact matches.
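The overlap thresholds minoverlap (mo), minoverlappercent (mop) and depthratio (dr) combine as a simple filter. In the sketch below, treating the depth ratio as larger depth divided by smaller depth is an assumption. Parsing depths out of KmerNormalize-renamed read names is not shown, so the depths are passed in directly.

```python
# Hypothetical filter combining mo, mop and dr as described above.
# The depth-ratio formula (max/min) is an assumption.

def overlap_passes(overlap_len, len_a, len_b, depth_a=None, depth_b=None,
                   minoverlap=200, minoverlappercent=0.0, depthratio=0.0):
    """Return True if an overlap between contigs A and B meets the thresholds."""
    if overlap_len < minoverlap:
        return False
    # mop is a percentage of the smaller contig's length.
    if 100.0 * overlap_len / min(len_a, len_b) < minoverlappercent:
        return False
    if depthratio > 0:
        hi, lo = max(depth_a, depth_b), min(depth_a, depth_b)
        # Reject zero depths as well as ratios above the limit.
        if lo <= 0 or hi / lo > depthratio:
            return False
    return True


print(overlap_passes(250, 1000, 500))                        # True
print(overlap_passes(150, 1000, 500))                        # False
print(overlap_passes(250, 1000, 500, 10, 40, depthratio=2))  # False
```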
+.PP
+Other parameters:
+qtrim=f Set to qtrim=rl to trim leading and trailing Ns.
+trimq=6 Quality trim level.
+forcetrimleft=\-1 (ftl) If positive, trim bases to the left of this position (exclusive, 0\-based).
+forcetrimright=\-1 (ftr) If positive, trim bases to the right of this position (exclusive, 0\-based).
+.PP
+Note on proteins/amino acids:
+Dedupe supports amino acid space via the 'amino' flag. This also changes the default kmer length to 10.
+In amino acid mode, all flags related to canonicity and reverse\-complementation are disabled,
+and nam (numaffixmaps) is currently limited to 2 per tip.
+.PP
+Java Parameters:
+\fB\-Xmx\fR This will set Java's memory usage, overriding autodetection.
+.IP
+\fB\-Xmx20g\fR will specify 20 gigs of RAM, and \fB\-Xmx200m\fR will specify 200 megs.
+The max is typically 85% of physical memory.
+.PP
+\fB\-eoom\fR This flag will cause the process to exit if an out\-of\-memory exception occurs. Requires Java 8u92+.
+\fB\-da\fR Disable assertions.
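The "85% of physical memory" ceiling is simple arithmetic. This sketch only illustrates that calculation. The bbmap wrapper scripts' real autodetection may use different inputs or rules, and the helper name is hypothetical.

```python
# Illustrates the "typically 85% of physical memory" arithmetic only.
# xmx_flag is a hypothetical helper, not part of BBTools.

def xmx_flag(physical_bytes, fraction_percent=85):
    """Return a -Xmx flag in megabytes (Java's 'm' = 1024*1024 bytes)."""
    mib = physical_bytes // (1024 * 1024)
    return f"-Xmx{mib * fraction_percent // 100}m"


print(xmx_flag(16 * 1024**3))  # -Xmx13926m
```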
+.PP
+Please contact Brian Bushnell at bbushnell at lbl.gov if you encounter any problems.
+.SH AUTHOR
+This manpage was written by Andreas Tille for the Debian distribution and may be used by others.
=====================================
debian/mans/reformat.sh.1
=====================================
@@ -0,0 +1,206 @@
+.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.8.
+.TH REFORMAT.SH "1" "April 2019" "reformat.sh 38.43" "User Commands"
+.SH NAME
+reformat.sh \- Reformats reads between fasta/fastq/scarf/fasta+qual/sam, interleaved/paired, and ASCII-33/64
+.SH SYNOPSIS
+.B reformat.sh
+\fI\,in=<file> in2=<file2> out=<outfile> out2=<outfile2>\/\fR
+.SH AUTHOR
+Written by Brian Bushnell
+Last modified February 21, 2019
+.PP
+Description: Reformats reads to change ASCII quality encoding, interleaving, file format, or compression format.
+Optionally performs additional functions such as quality trimming, subsetting, and subsampling.
+Supports fastq, fasta, fasta+qual, scarf, oneline, sam, bam, gzip, bz2.
+Please read bbmap/docs/guides/ReformatGuide.txt for more information.
+.PP
+in2 and out2 are for paired reads and are optional.
+If input is paired and there is only one output file, it will be written interleaved.
+.PP
+Parameters and their defaults:
+.PP
+ow=f (overwrite) Overwrites files that already exist.
+app=f (append) Append to files that already exist.
+zl=4 (ziplevel) Set compression level, 1 (low) to 9 (max).
+int=f (interleaved) Determines whether INPUT file is considered interleaved.
+fastawrap=70 Length of lines in fasta output.
+fastareadlen=0 Set to a non\-zero number to break fasta files into reads of at most this length.
+fastaminlen=1 Ignore fasta reads shorter than this.
+qin=auto ASCII offset for input quality. May be 33 (Sanger), 64 (Illumina), or auto.
+qout=auto ASCII offset for output quality. May be 33 (Sanger), 64 (Illumina), or auto (same as input).
+qfake=30 Quality value used for fasta to fastq reformatting.
+qfin=<.qual file> Read qualities from this qual file, for the reads coming from 'in=<fasta file>'
+qfin2=<.qual file> Read qualities from this qual file, for the reads coming from 'in2=<fasta file>'
+qfout=<.qual file> Write qualities to this qual file, for the reads going to 'out=<fasta file>'
+qfout2=<.qual file> Write qualities to this qual file, for the reads going to 'out2=<fasta file>'
+outsingle=<file> (outs) If a read is longer than minlength and its mate is shorter, the longer one goes here.
+deleteinput=f Delete input upon successful completion.
+ref=<file> Optional reference fasta for sam processing.
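Changing the ASCII quality offset (qin/qout) shifts every quality character by the difference between the offsets. Converting fasta to fastq (qfake) assigns one constant quality to every base. The function names below are illustrative and are not part of BBTools.

```python
# Sketch of ASCII quality re-encoding (qin/qout) and fake qualities (qfake).

def reencode_quality(qual: str, qin: int = 64, qout: int = 33) -> str:
    """Shift each quality character from one ASCII offset to another."""
    return "".join(chr(ord(c) - qin + qout) for c in qual)


def fake_quality(seq: str, qfake: int = 30, qout: int = 33) -> str:
    """Give every base the same quality, as when converting fasta to fastq."""
    return chr(qfake + qout) * len(seq)


print(reencode_quality("hhB"))  # II#  (Q40, Q40, Q2 in ASCII-33)
print(fake_quality("ACGT"))     # ????
```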
+.PP
+Processing Parameters:
+.PP
+verifypaired=f (vpair) When true, checks reads to see if the names look paired. Prints an error message if not.
+verifyinterleaved=f (vint) sets 'vpair' to true and 'interleaved' to true.
+allowidenticalnames=f (ain) When verifying pair names, allows identical names, instead of requiring /1 and /2 or 1: and 2:
+tossbrokenreads=f (tbr) Discard reads that have different numbers of bases and qualities. By default this will be detected and cause a crash.
+ignorebadquality=f (ibq) Fix out\-of\-range quality values instead of crashing with a warning.
+addslash=f Append ' /1' and ' /2' to read names, if not already present. Please include the flag 'int=t' if the reads are interleaved.
+spaceslash=t Put a space before the slash in addslash mode.
+addcolon=f Append ' 1:' and ' 2:' to read names, if not already present. Please include the flag 'int=t' if the reads are interleaved.
+underscore=f Change whitespace in read names to underscores.
+rcomp=f (rc) Reverse\-complement reads.
+rcompmate=f (rcm) Reverse\-complement read 2 only.
+changequality=t (cq) N bases always get a quality of 0 and ACGT bases get a min quality of 2.
+quantize=f Quantize qualities to a subset of values like NextSeq. Can also be used with comma\-delimited list, like quantize=0,8,13,22,27,32,37
+tuc=f (touppercase) Change lowercase letters in reads to uppercase.
+uniquenames=f Make duplicate names unique by appending _<number>.
+remap= A set of pairs: remap=CTGN will transform C>T and G>N.
+.IP
+Use remap1 and remap2 to specify read 1 or 2.
+.PP
+iupacToN=f (itn) Convert non\-ACGTN symbols to N.
+monitor=f Kill this process if it crashes. monitor=600,0.01 would kill after 600 seconds under 1% usage.
+crashjunk=t Crash when encountering reads with invalid bases.
+tossjunk=f Discard reads with invalid characters as bases.
+fixjunk=f Convert invalid bases to N.
+fixheaders=f Convert nonstandard header characters to standard ASCII.
+recalibrate=f (recal) Recalibrate quality scores. Must first generate matrices with CalcTrueQuality.
+maxcalledquality=41 Quality scores are capped at this upper bound.
+mincalledquality=2 Quality scores of ACGT bases are capped at this lower bound.
+trimreaddescription=f (trd) Trim the names of reads after the first whitespace.
+trimrname=f For sam/bam files, trim rname/rnext fields after the first space.
+fixheaders=f Replace characters in headers such as space, *, and | to make them valid file names.
+warnifnosequence=t For fasta, issue a warning if a sequenceless header is encountered.
+warnfirsttimeonly=t Issue a warning for only the first sequenceless header.
+utot=f Convert U to T (for RNA \-> DNA translation).
+padleft=0 Pad the left end of sequences with this many symbols.
+padright=0 Pad the right end of sequences with this many symbols.
+pad=0 Set padleft and padright to the same value.
+padsymbol=N Symbol to use for padding.
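Two of the processing options above, rcomp and changequality, can be sketched in a few lines. Reversing the qualities along with the bases keeps each quality attached to its base. Both the function names and the treatment of non-N symbols in change_quality are assumptions.

```python
# Hypothetical sketches of rcomp (reverse-complement) and changequality.

_COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def rcomp(seq, quals):
    """Reverse-complement the bases; reverse qualities so they stay aligned."""
    return seq.translate(_COMP)[::-1], quals[::-1]


def change_quality(seq, quals, min_called=2):
    """N bases get quality 0; all other bases get at least min_called."""
    return [0 if b in "Nn" else max(q, min_called) for b, q in zip(seq, quals)]


print(rcomp("ACGTN", [30, 31, 32, 33, 2]))     # ('NACGT', [2, 33, 32, 31, 30])
print(change_quality("ACGN", [1, 30, 0, 25]))  # [2, 30, 2, 0]
```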
+.PP
+Histogram output parameters:
+.PP
+bhist=<file> Base composition histogram by position.
+qhist=<file> Quality histogram by position.
+qchist=<file> Count of bases with each quality value.
+aqhist=<file> Histogram of average read quality.
+bqhist=<file> Quality histogram designed for box plots.
+lhist=<file> Read length histogram.
+gchist=<file> Read GC content histogram.
+gcbins=100 Number of gchist bins. Set to 'auto' to use read length.
+gcplot=f Add a graphical representation to the gchist.
+maxhistlen=6000 Set an upper bound for histogram lengths; higher uses more memory.
+.IP
+The default is 6000 for some histograms and 80000 for others.
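A GC-content histogram assigns each read's GC fraction to one of gcbins bins. The sketch below computes GC over A/C/G/T only, ignoring Ns, which is an assumption. It places a GC fraction of exactly 1.0 in the last bin.

```python
# Sketch of GC fraction and gchist binning; N handling is an assumption.

def gc_fraction(seq):
    """GC fraction over A/C/G/T bases only (0.0 if there are none)."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    return (s.count("G") + s.count("C")) / acgt if acgt else 0.0


def gc_bin(gc, gcbins=100):
    """Map a GC fraction in [0, 1] to a bin index in [0, gcbins - 1]."""
    return min(int(gc * gcbins), gcbins - 1)


gc = gc_fraction("GGCCAATT")
print(gc, gc_bin(gc))  # 0.5 50
```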
+.PP
+Histograms for sam files only (requires sam format 1.4 or higher):
+.PP
+ehist=<file> Errors\-per\-read histogram.
+qahist=<file> Quality accuracy histogram of error rates versus quality score.
+indelhist=<file> Indel length histogram.
+mhist=<file> Histogram of match, sub, del, and ins rates by read location.
+ihist=<file> Insert size histograms. Requires paired reads in a sam file.
+idhist=<file> Histogram of read count versus percent identity.
+idbins=100 Number of idhist bins. Set to 'auto' to use read length.
+.PP
+Sampling parameters:
+.PP
+reads=\-1 Set to a positive number to only process this many INPUT reads (or pairs), then quit.
+skipreads=\-1 Skip (discard) this many INPUT reads before processing the rest.
+samplerate=1 Randomly output only this fraction of reads; 1 means sampling is disabled.
+sampleseed=\-1 Set to a positive number to use that prng seed for sampling (allowing deterministic sampling).
+samplereadstarget=0 (srt) Exact number of OUTPUT reads (or pairs) desired.
+samplebasestarget=0 (sbt) Exact number of OUTPUT bases desired.
+.IP
+Important: srt/sbt flags should not be used with stdin, samplerate, qtrim, minlength, or minavgquality.
+.PP
+upsample=f Allow srt/sbt to upsample (duplicate reads) when the target is greater than input.
+prioritizelength=f If true, calculate a length threshold to reach the target, and retain all reads of at least that length (must set srt or sbt).
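samplerate keeps each read independently with the given probability, and a positive sampleseed makes the choice repeatable. This sketch uses Python's random module, so it will not reproduce the reads that reformat.sh selects for the same seed.

```python
# Sketch of seeded random subsampling (samplerate / sampleseed).
import random


def subsample(reads, samplerate=1.0, sampleseed=-1):
    """Keep each read with probability samplerate.

    A positive seed gives a reproducible selection; otherwise the RNG is
    seeded from system entropy.
    """
    rng = random.Random(sampleseed if sampleseed > 0 else None)
    return [r for r in reads if rng.random() < samplerate]


reads = [f"read{i}" for i in range(1000)]
a = subsample(reads, 0.1, sampleseed=7)
b = subsample(reads, 0.1, sampleseed=7)
print(a == b)  # True; len(a) is close to, but not exactly, 100
```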
+.PP
+Trimming and filtering parameters:
+.PP
+qtrim=f Trim read ends to remove bases with quality below trimq.
+.IP
+Values: t (trim both ends), f (neither end), r (right end only), l (left end only), w (sliding window).
+.PP
+trimq=6 Regions with average quality BELOW this will be trimmed. Can be a floating\-point number like 7.3.
+minlength=0 (ml) Reads shorter than this after trimming will be discarded. Pairs will be discarded only if both are shorter.
+mlf=0 (mlf) Reads shorter than this fraction of original length after trimming will be discarded.
+maxlength=0 If nonzero, reads longer than this after trimming will be discarded.
+breaklength=0 If nonzero, reads longer than this will be broken into multiple reads of this length. Does not work for paired reads.
+requirebothbad=t (rbb) Only discard pairs if both reads are shorter than minlen.
+invertfilters=f (invert) Output failing reads instead of passing reads.
+minavgquality=0 (maq) Reads with average quality (after trimming) below this will be discarded.
+maqb=0 If positive, calculate maq from this many initial bases.
+chastityfilter=f (cf) Reads with names containing ' 1:Y:' or ' 2:Y:' will be discarded.
+barcodefilter=f Remove reads with unexpected barcodes if barcodes is set, or barcodes containing 'N' otherwise.
+.IP
+A barcode must be the last part of the read header.
+.PP
+barcodes= Comma\-delimited list of barcodes or files of barcodes.
+maxns=\-1 If 0 or greater, reads with more Ns than this (after trimming) will be discarded.
+minconsecutivebases=0 (mcb) Discard reads without at least this many consecutive called bases.
+forcetrimleft=0 (ftl) If nonzero, trim left bases of the read to this position (exclusive, 0\-based).
+forcetrimright=0 (ftr) If nonzero, trim right bases of the read after this position (exclusive, 0\-based).
+forcetrimright2=0 (ftr2) If positive, trim this many bases on the right end.
+forcetrimmod=5 (ftm) If positive, trim length to be equal to zero modulo this number.
+mingc=0 Discard reads with GC content below this.
+maxgc=1 Discard reads with GC content above this.
+gcpairs=t Use average GC of paired reads.
+.IP
+Also affects gchist.
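Two trimming rules are sketched below. force_trim_mod right-trims a read to a length that is a multiple of ftm; trimming from the right end is an assumption. naive_trim_right drops trailing bases below trimq one base at a time. That is a simplification: the manpage describes trimming regions by average quality, and reformat.sh's exact algorithm is not reproduced here.

```python
# Hypothetical sketches of forcetrimmod and a simplified right-end quality trim.

def force_trim_mod(seq, ftm):
    """Right-trim so the length is zero modulo ftm (no-op if ftm <= 0)."""
    if ftm > 0:
        return seq[: len(seq) - len(seq) % ftm]
    return seq


def naive_trim_right(seq, quals, trimq=6):
    """Drop trailing bases whose individual quality is below trimq."""
    end = len(seq)
    while end > 0 and quals[end - 1] < trimq:
        end -= 1
    return seq[:end], quals[:end]


print(force_trim_mod("ACGTACGTACGT", 5))              # ACGTACGTAC
print(naive_trim_right("ACGTA", [30, 30, 20, 3, 2]))  # ('ACG', [30, 30, 20])
```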
+.PP
+Sam and bam processing options:
+.PP
+mappedonly=f Toss unmapped reads.
+unmappedonly=f Toss mapped reads.
+pairedonly=f Toss reads that are not mapped as proper pairs.
+unpairedonly=f Toss reads that are mapped as proper pairs.
+primaryonly=f Toss secondary alignments. Set this to true for sam to fastq conversion.
+minmapq=\-1 If non\-negative, toss reads with mapq under this.
+maxmapq=\-1 If non\-negative, toss reads with mapq over this.
+requiredbits=0 (rbits) Toss sam lines with any of these flag bits unset. Similar to samtools \fB\-f\fR.
+filterbits=0 (fbits) Toss sam lines with any of these flag bits set. Similar to samtools \fB\-F\fR.
+stoptag=f Set to true to write a tag indicating read stop location, prefixed by YS:i:
+sam= Set to 'sam=1.3' to convert '=' and 'X' cigar symbols (from sam 1.4+ format) to 'M'.
+.IP
+Set to 'sam=1.4' to convert 'M' to '=' and 'X' (sam=1.4 requires MD tags to be present, or ref to be specified).
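requiredbits and filterbits test the SAM FLAG field the same way samtools \-f and \-F do. A record is kept only if all required bits are set and none of the filter bits are. The flag values below follow the SAM specification (0x2 proper pair, 0x4 unmapped, 0x100 secondary); the helper name is hypothetical.

```python
# Sketch of FLAG filtering equivalent to requiredbits (-f) and filterbits (-F).

def keep_by_flags(flag, requiredbits=0, filterbits=0):
    """Keep a record only if all requiredbits are set and no filterbits are set."""
    return (flag & requiredbits) == requiredbits and (flag & filterbits) == 0


UNMAPPED, SECONDARY = 0x4, 0x100
print(keep_by_flags(99, requiredbits=0x2))       # True  (proper pair)
print(keep_by_flags(4, filterbits=UNMAPPED))     # False (unmapped)
print(keep_by_flags(355, filterbits=SECONDARY))  # False (secondary alignment)
```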
+.PP
+Sam and bam alignment filtering options:
+These require = and X symbols in cigar strings, or MD tags, or a reference fasta.
+\fB\-1\fR means disabled; to filter reads with any of a symbol type, set to 0.
+.PP
+subfilter=\-1 Discard reads with more than this many substitutions.
+insfilter=\-1 Discard reads with more than this many insertions.
+delfilter=\-1 Discard reads with more than this many deletions.
+indelfilter=\-1 Discard reads with more than this many indels.
+editfilter=\-1 Discard reads with more than this many edits.
+inslenfilter=\-1 Discard reads with an insertion longer than this.
+dellenfilter=\-1 Discard reads with a deletion longer than this.
+idfilter=\-1.0 Discard reads with identity below this.
+clipfilter=\-1 Discard reads with more than this many soft\-clipped bases.
+.PP
+Kmer counting and cardinality estimation:
+k=0 If positive, count the total number of kmers.
+cardinality=f (loglog) Count unique kmers using the LogLog algorithm.
+loglogbuckets=1999 Use this many buckets for cardinality estimation.
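k=N counts every kmer, and cardinality=t estimates how many are distinct. The exact version below stores every kmer in a set, which LogLog avoids by estimating the distinct count in fixed memory. Kmers are counted as-is, without merging reverse complements or skipping Ns, which reformat.sh may handle differently.

```python
# Exact total and distinct kmer counts; LogLog approximates the distinct count.

def kmer_counts(seqs, k):
    """Return (total kmers, distinct kmers); a read of length L yields L-k+1."""
    total, distinct = 0, set()
    for s in seqs:
        for i in range(len(s) - k + 1):
            total += 1
            distinct.add(s[i:i + k])
    return total, len(distinct)


print(kmer_counts(["ACGTACGT", "ACG"], k=4))  # (5, 4)
```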
+.PP
+Shortcuts:
+The # symbol will be substituted for 1 and 2. The % symbol in out will be substituted for input name minus extensions.
+For example:
+reformat.sh in=read#.fq out=%.fa
+\&...is equivalent to:
+reformat.sh in1=read1.fq in2=read2.fq out1=read1.fa out2=read2.fa
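The shortcut expansion in the example above can be sketched as follows. Each input name is stripped of all extensions by cutting at its first dot. Dropping any directory from the input path before substituting '%' is an assumption, and expand_shortcuts is a hypothetical helper.

```python
# Sketch of '#' and '%' expansion; directory handling is an assumption.
import os


def expand_shortcuts(in_pattern, out_pattern):
    """Expand '#' to 1 and 2, then '%' to each input name minus extensions."""
    pairs = []
    for n in ("1", "2"):
        infile = in_pattern.replace("#", n)
        stem = os.path.basename(infile).split(".")[0]
        pairs.append((infile, out_pattern.replace("#", n).replace("%", stem)))
    return pairs


print(expand_shortcuts("read#.fq", "%.fa"))
# [('read1.fq', 'read1.fa'), ('read2.fq', 'read2.fa')]
```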
+.PP
+Java Parameters:
+\fB\-Xmx\fR This will set Java's memory usage, overriding autodetection.
+.IP
+\fB\-Xmx20g\fR will specify 20 gigs of RAM, and \fB\-Xmx200m\fR will specify 200 megs.
+The max is typically 85% of physical memory.
+.PP
+\fB\-eoom\fR This flag will cause the process to exit if an out\-of\-memory exception occurs. Requires Java 8u92+.
+\fB\-da\fR Disable assertions.
+.PP
+Please contact Brian Bushnell at bbushnell at lbl.gov if you encounter any problems.
+.SH AUTHOR
+This manpage was written by Andreas Tille for the Debian distribution and may be used by others.
View it on GitLab: https://salsa.debian.org/med-team/bbmap/commit/3758d4c612caa1e1f97bf4d89e5ef85f1f08d4e2
More information about the debian-med-commit
mailing list