01d39e41 by Andreas Tille at 2019-04-05T12:50:47Z
Add manpage for bbduk.sh
- - - - -
1 changed file:
- + debian/mans/bbduk.sh.1
@@ -0,0 +1,637 @@
+.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.8.
+.TH BBDUK.SH "1" "April 2019" "bbduk.sh 38.43" "User Commands"
+bbduk.sh \- Filters, trims, or masks reads with kmer matches to an artifact/contaminant file
+.B bbduk.sh
+\fI\,in=<input file> out=<output file> ref=<contaminant files>\/\fR
+Compares reads to the kmers in a reference dataset, optionally
+allowing an edit distance. Splits the reads into two outputs \- those that
+match the reference, and those that don't. Can also trim (remove) the matching
+parts of the reads rather than binning the reads.
+Please read bbmap/docs/guides/BBDukGuide.txt for more information.
+Input may be stdin or a fasta or fastq file, compressed or uncompressed.
+If you pipe via stdin/stdout, please include the file type; e.g. for gzipped
+fasta input, set in=stdin.fa.gz
+.SS Input parameters
+Main input. in=stdin.fq will pipe from stdin.
+Input for 2nd read of pairs in a different file.
+Comma\-delimited list of reference files.
+In addition to filenames, you may also use the keywords:
+adapters, artifacts, phix, lambda, pjet, mtst, kapa
+Comma\-delimited list of literal reference sequences.
+(tuc) Change all bases upper\-case.
+(int) t/f overrides interleaved autodetection.
+Input quality offset: 33 (Sanger), 64, or auto.
+If positive, quit after processing X reads or pairs.
+(cu) Process non\-AGCT IUPAC reference bases by making all
+possible unambiguous copies.
+Intended for short motifs
+or adapter barcodes, as time/memory use is exponential.
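The exponential cost mentioned above follows from how ambiguity codes expand: each ambiguous base multiplies the number of concrete copies. A minimal sketch of that expansion, assuming the standard IUPAC nucleotide table; `expand_iupac` is a hypothetical helper for illustration and is not taken from BBDuk's source:

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_iupac(seq):
    """Return every unambiguous sequence that an IUPAC sequence can represent."""
    choices = [IUPAC[base] for base in seq.upper()]
    return ["".join(combo) for combo in product(*choices)]

print(expand_iupac("AR"))          # ['AA', 'AG']
print(len(expand_iupac("NNNN")))   # 256
```

Four consecutive N bases already yield 256 copies, which is why the option is described as suitable only for short motifs or barcodes.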
+Set lower to only process a fraction of input reads.
+Optional reference fasta for processing sam files.
+.SS Output parameters
+(outnonmatch) Write reads here that do not contain
+kmers matching the database.
+\&'out=stdout.fq' will pipe
+to standard out.
+(outnonmatch2) Use this to write 2nd read of pairs to a
+different file.
+(outmatch) Write reads here that fail filters. In default
+kfilter mode, this means any read with a matching kmer.
+In any mode, it also includes reads that fail filters such
+as minlength, mingc, maxgc, entropy, etc. In other words,
+it includes all reads that do not go to 'out'.
+(outmatch2) Use this to write 2nd read of pairs to a
+different file.
+(outsingle) Use this to write singleton reads whose mate
+was trimmed shorter than minlen.
+Write statistics about which contaminants were detected.
+Write statistics on a per\-reference\-file basis.
+Write RPKM for each reference sequence (for RNA\-seq).
+Dump kmer tables to a file, in fasta format.
+Write statistics in duk's format. *DEPRECATED*
+Only write statistics about ref sequences with nonzero hits.
+(ow) Grant permission to overwrite files.
+(ss) 'f' suppresses display of processing speed.
+(zl) Compression level; 1 (min) through 9 (max).
+Length of lines in fasta output.
+Output quality offset: 33 (Sanger), 64, or auto.
+(cols) Number of columns for stats output, 3 or 5.
+5 includes base counts.
+Rename reads to indicate which sequences they matched.
+Use names of reference files rather than scaffold IDs.
+Truncate read and ref names at the first whitespace.
+Set to true to output reads in same order as input.
+If positive, quit after writing approximately this many
+bases to out (outu/outnonmatch).
+If positive, quit after writing approximately this many
+bases to outm (outmatch).
+Print to screen in json format.
+.SS Histogram output parameters
+Base composition histogram by position.
+Quality histogram by position.
+Count of bases with each quality value.
+Histogram of average read quality.
+Quality histogram designed for box plots.
+Read length histogram.
+Polymer length histogram.
+Read GC content histogram.
+Insert size histogram, for paired reads in mapped sam.
+Number gchist bins. Set to 'auto' to use read length.
+Set an upper bound for histogram lengths; higher uses
+more memory.
+The default is 6000 for some histograms
+and 80000 for others.
+.SS Histograms for mapped sam/bam files only
+Calculate histograms from reads before processing.
+Errors\-per\-read histogram.
+Quality accuracy histogram of error rates versus quality score.
+Indel length histogram.
+Histogram of match, sub, del, and ins rates by position.
+Histogram of read count versus percent identity.
+Number idhist bins. Set to 'auto' to use read length.
+Ignore substitution errors listed in this file when
+calculating error rates.
+Can be generated with
+Ignore substitution errors listed in this VCF file
+when calculating error rates.
+Also ignore indels listed in the VCF.
+.SS Processing parameters
+Kmer length used for finding contaminants. Contaminants
+shorter than k will not be found.
+k must be at least 1.
+Look for reverse\-complements of kmers in addition to
+forward kmers.
+(mm) Treat the middle base of a kmer as a wildcard, to
+increase sensitivity in the presence of errors.
+(mkh) Reads need at least this many matching kmers
+to be considered as matching the reference.
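The kmer length, reverse-complement, and minimum-hit descriptions above can be pictured with a toy matcher. This is only a conceptual sketch using a Python set; the function names are hypothetical, and BBDuk's actual data structures and matching logic are not shown in this manpage:

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse-complement a DNA string made of A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]

def kmers(seq, k):
    """All overlapping substrings of length k (empty if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_index(refs, k, rcomp=True):
    """Collect reference kmers, optionally with their reverse complements."""
    index = set()
    for ref in refs:
        for kmer in kmers(ref, k):
            index.add(kmer)
            if rcomp:
                index.add(reverse_complement(kmer))
    return index

def is_match(read, index, k, min_kmer_hits=1):
    """A read matches if at least min_kmer_hits of its kmers are in the index."""
    hits = sum(1 for kmer in kmers(read, k) if kmer in index)
    return hits >= min_kmer_hits

index = build_index(["ACGTTGCA"], k=4)
print(is_match("GGCAACGG", index, k=4))   # True, via reverse-complement kmers
print(build_index(["ACG"], k=4))          # set(): a reference shorter than k adds nothing
```

The last line illustrates why contaminants shorter than k cannot be found: they contribute no kmers to the index.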
+(mkf) A read needs at least this fraction of its total
+kmers to hit a ref, in order to be considered a match.
+If this and minkmerhits are set, the greater is used.
+(mcf) A read needs at least this fraction of its total
+bases to be covered by ref kmers to be considered a match.
+If specified, mcf overrides mkh and mkf.
+(hdist) Maximum Hamming distance for ref kmers (subs only).
+Memory use is proportional to (3*K)^hdist.
+Hamming distance for query kmers; impacts speed, not memory.
+(edist) Maximum edit distance from ref kmers (subs
+and indels).
+Memory use is proportional to (8*K)^edist.
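The (3*K)^hdist figure can be made plausible by counting: a kmer of length K has exactly 3*K neighbors at Hamming distance 1, since each position can be replaced by three other bases. The sketch below enumerates them and evaluates the stated growth factors; both helpers are hypothetical illustrations, and the factor of 8 for edit distance is taken from the text above rather than derived here:

```python
def hamming_neighbors(kmer, alphabet="ACGT"):
    """All sequences that differ from kmer by exactly one substitution."""
    neighbors = set()
    for i, base in enumerate(kmer):
        for sub in alphabet:
            if sub != base:
                neighbors.add(kmer[:i] + sub + kmer[i + 1:])
    return neighbors

def growth_factor(k, dist, per_base):
    """Evaluate (per_base * k) ** dist, the proportionality quoted in the text."""
    return (per_base * k) ** dist

print(len(hamming_neighbors("ACGTA")))      # 15, i.e. 3 * 5
print(growth_factor(31, 1, per_base=3))     # 93
print(growth_factor(31, 2, per_base=3))     # 8649
print(growth_factor(31, 1, per_base=8))     # 248
```

Going from a distance of 1 to 2 at K=31 raises the factor from 93 to 8649, so larger distances are realistic only for small references.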
+(hdist2) Sets hdist for short kmers, when using mink.
+Sets qhdist for short kmers, when using mink.
+(edist2) Sets edist for short kmers, when using mink.
+(fn) Forbids matching of read kmers containing N.
+By default, these will match a reference 'A' if
+hdist>0 or edist>0, to increase sensitivity.
+(rieb) Paired reads get sent to 'outmatch' if either is
+match (or either is trimmed shorter than minlen).
+Set to false to require both.
+Instead of discarding failed reads, trim them to 1bp.
+This makes the statistics a bit odd.
+(fbm) If multiple matches, associate read with sequence
+sharing most kmers.
+Reduces speed.
+Don't do kmer\-based operations on read 1.
+Don't do kmer\-based operations on read 2.
+For overlapping paired reads only. Performs error correction with BBMerge prior to kmer operations.
+(recal) Recalibrate quality scores. Requires calibration
+matrices generated by CalcTrueQuality.
+If recalibration is desired, and matrices have not already
+been generated, BBDuk will create them from the sam file.
+Run in amino acid mode. Some features have not been
+tested, but kmer\-matching works fine.
+Maximum k is 12.
+.SS Speed and Memory parameters
+(t) Set number of threads to use; default is number of
+logical processors.
+Preallocate memory in table. Allows faster table loading
+and more efficient memory usage, for a large reference.
+Kill this process if it crashes. monitor=600,0.01 would
+kill after 600 seconds under 1% usage.
+(mns) Force minimal skip interval when indexing reference kmers.
+1 means use all, 2 means use every other kmer, etc.
+(mxs) Restrict maximal skip interval when indexing
+reference kmers. Normally all are used for scaffolds<100kb,
+but with longer scaffolds, up to maxrskip\-1 are skipped.
+Set both minrskip and maxrskip to the same value.
+If not set, rskip will vary based on sequence length.
+Skip query kmers to increase speed. 1 means use all.
+Ignore this fraction of kmer space (0\-15 out of 16) in both
+reads and reference.
+Increases speed and reduces memory.
+Note: Do not use more than one of 'speed', 'qskip', and 'rskip'.
+.SS Trimming/Filtering/Masking parameters
+Note \- if ktrim, kmask, and ksplit are unset, the default behavior is kfilter.
+All kmer processing modes are mutually exclusive.
+Reads only get sent to 'outm' purely based on kmer matches in kfilter mode.
+Trim reads to remove bases matching reference kmers.
+f (don't trim),
+r (trim to the right),
+l (trim to the left)
+Replace bases matching ref kmers with another symbol.
+Allows any non\-whitespace character, and processes short
+kmers on both ends if mink is set. 'kmask=lc' will
+convert masked bases to lowercase.
+maskfullycovered=f (mfc) Only mask bases that are fully covered by kmers.
+ksplit=f For single\-ended reads only. Reads will be split into
+pairs around the kmer.
+If the kmer is at the end of the
+read, it will be trimmed instead.
+Singletons will go to
+out, and pairs will go to outm.
+Do not use ksplit with
+other operations such as quality\-trimming or filtering.
+mink=0 Look for shorter kmers at read tips down to this length,
+when k\-trimming or masking.
+0 means disabled. Enabling
+this will disable maskmiddle.
+qtrim=f Trim read ends to remove bases with quality below trimq.
+Performed AFTER looking for kmers.
+rl (trim both ends),
+f (neither end),
+r (right end only),
+l (left end only),
+w (sliding window).
+Regions with average quality BELOW this will be trimmed,
+if qtrim is set to something other than f.
+Can be a
+floating\-point number like 7.3.
+Trim soft\-clipped bases from sam files.
+(ml) Reads shorter than this after trimming will be discarded.
+Pairs will be discarded if both are shorter.
+(minlengthfraction) Reads shorter than this fraction of
+original length after trimming will be discarded.
+Reads longer than this after trimming will be discarded.
+Pairs will be discarded only if both are longer.
+(maq) Reads with average quality (after trimming) below
+this will be discarded.
+If positive, calculate maq from this many initial bases.
+(mbq) Reads with any base below this quality (after
+trimming) will be discarded.
+If non\-negative, reads with more Ns than this
+(after trimming) will be discarded.
+(minconsecutivebases) Discard reads without at least
+this many consecutive called bases.
+(outputtrimmedtomatch) Output reads trimmed to shorter
+than minlength to outm rather than discarding.
+(trimpad) Trim this much extra around matching kmers.
+(trimbyoverlap) Trim adapters based on where paired
+reads overlap.
+Adjust sensitivity for trimbyoverlap mode.
+Require this many bases of overlap for detection.
+Require insert size of at least this for overlap.
+Should be reduced to 16 for small RNA sequencing.
+(trimpairsevenly) When kmer right\-trimming, trim both
+reads to the minimum length of either.
+(ftl) If positive, trim bases to the left of this position
+(exclusive, 0\-based).
+(ftr) If positive, trim bases to the right of this position
+(exclusive, 0\-based).
+(ftr2) If positive, trim this many bases on the right end.
+(ftm) If positive, right\-trim length to be equal to zero,
+modulo this number.
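The positional trimming options above amount to simple slicing. The sketch below assumes that the base at the named 0-based position is itself kept (one reading of "exclusive") and does not model how BBDuk orders these operations when several are set; the function names are hypothetical:

```python
def force_trim(seq, ftl=0, ftr=0):
    """Drop bases left of position ftl and right of position ftr (0-based).

    Assumption: the base at the given position is kept in both cases.
    A value of 0 or less leaves that side untouched.
    """
    start = ftl if ftl > 0 else 0
    end = ftr + 1 if ftr > 0 else len(seq)
    return seq[start:end]

def force_trim_modulo(seq, ftm=0):
    """Right-trim so that the remaining length is a multiple of ftm."""
    if ftm <= 0:
        return seq
    return seq[:len(seq) - len(seq) % ftm]

read = "A" * 151
print(len(force_trim(read, ftl=10, ftr=139)))   # 130
print(len(force_trim_modulo(read, ftm=5)))      # 150
```

The modulo case is the typical use for reads that carry one extra base, such as 151 bp reads from a nominal 150 bp run.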
+If positive, only look for kmer matches in the
+leftmost X bases.
+If positive, only look for kmer matches in the
+rightmost X bases.
+Discard reads with GC content below this.
+Discard reads with GC content above this.
+Use average GC of paired reads.
+Also affects gchist.
+Discard reads with invalid characters as bases.
+Trim Swift sequences: Trailing C/T/N R1, leading G/A/N R2.
+.SS Header\-parsing parameters \- these require Illumina headers
+(cf) Discard reads with id containing ' 1:Y:' or ' 2:Y:'.
+Remove reads with unexpected barcodes if barcodes is set,
+or barcodes containing 'N' otherwise.
+A barcode must be
+the last part of the read header.
+t: Remove reads with bad barcodes.
+f: Ignore barcodes.
+crash: Crash upon encountering bad barcodes.
+Comma\-delimited list of barcodes or files of barcodes.
+If positive, discard reads with a lesser X coordinate.
+If positive, discard reads with a lesser Y coordinate.
+If positive, discard reads with a greater X coordinate.
+If positive, discard reads with a greater Y coordinate.
+.SS Polymer trimming
+If greater than 0, trim poly\-A or poly\-T tails of
+at least this length on either end of reads.
+If greater than 0, trim poly\-G prefixes of at least this
+length on the left end of reads.
+Does not trim poly\-C.
+If greater than 0, trim poly\-G tails of at least this
+length on the right end of reads.
+Does not trim poly\-C.
+This sets both left and right at once.
+If greater than 0, remove reads with a poly\-G prefix of
+at least this length (on the left).
+Note: there are also equivalent poly\-C flags.
+.SS Polymer tracking
+'pratio=G,C' will print the ratio of G to C polymers.
+Length of homopolymers to count.
+.SS Entropy/Complexity parameters
+Set between 0 and 1 to filter reads with entropy below
+that value.
+Higher is more stringent.
+Calculate entropy using a sliding window of this length.
+Calculate entropy using kmers of this length.
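To give a feel for the window and kmer-length settings, here is one way a 0 to 1 entropy score could be computed for a single window: Shannon entropy over the kmers in the window, divided by the maximum achievable for that window. The normalization is an assumption made for this sketch and may differ from BBDuk's own calculation; `window_entropy` is a hypothetical name:

```python
from collections import Counter
from math import log2

def window_entropy(window, k):
    """Normalized Shannon entropy (0 to 1) of the kmers in one window.

    Assumption: normalize by log2 of the largest possible number of
    distinct kmers, which is min(kmers in window, 4**k).
    """
    n = len(window) - k + 1
    if n <= 1:
        return 0.0
    counts = Counter(window[i:i + k] for i in range(n))
    entropy = sum((c / n) * log2(n / c) for c in counts.values())
    return entropy / log2(min(n, 4 ** k))

print(window_entropy("A" * 50, k=5))   # 0.0
print(window_entropy("ACGT", k=1))     # 1.0
```

A homopolymer scores 0 and a window in which every kmer is distinct scores 1, so raising the threshold removes progressively less repetitive reads.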
+Discard reads with a minimum base frequency below this.
+f: Discard low\-entropy sequences.
+t: Mask low\-entropy parts of sequences with N.
+lc: Change low\-entropy parts of sequences to lowercase.
+Mark each base with its entropy value. This is on a scale
+of 0\-41 and is reported as quality scores, so the output
+should be fastq or fasta+qual.
+.SS Cardinality estimation
+(loglog) Count unique kmers using the LogLog algorithm.
+(loglogout) Count unique kmers in output reads.
+Use this kmer length for counting.
+Use this many buckets for counting.
+.SS Java Parameters
+This will set Java's memory usage, overriding autodetection.
+\fB\-Xmx20g\fR will
+specify 20 gigs of RAM, and \fB\-Xmx200m\fR will specify 200 megs.
+The max is typically 85% of physical memory.
+This flag will cause the process to exit if an
+out\-of\-memory exception occurs.
+Requires Java 8u92+.
+Disable assertions.
+Written by Brian Bushnell, Last modified March 21, 2019
+Please contact Brian Bushnell at bbushnell at lbl.gov if you encounter any problems.
+This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.
