[med-svn] [Git][med-team/bbmap][master] Ad manpage for bbduk.sh

Andreas Tille gitlab at salsa.debian.org
Fri Apr 5 13:51:09 BST 2019



Andreas Tille pushed to branch master at Debian Med / bbmap


Commits:
01d39e41 by Andreas Tille at 2019-04-05T12:50:47Z
Ad manpage for bbduk.sh

- - - - -


1 changed file:

- + debian/mans/bbduk.sh.1


Changes:

=====================================
debian/mans/bbduk.sh.1
=====================================
@@ -0,0 +1,637 @@
+.\" DO NOT MODIFY THIS FILE!  It was generated by help2man 1.47.8.
+.TH BBDUK.SH "1" "April 2019" "bbduk.sh 38.43" "User Commands"
+.SH NAME
+bbduk.sh \- Filters, trims, or masks reads with kmer matches to an artifact/contaminant file
+.SH SYNOPSIS
+.B bbduk.sh
+\fI\,in=<input file> out=<output file> ref=<contaminant files>\/\fR
+.SH DESCRIPTION
+Compares reads to the kmers in a reference dataset, optionally
+allowing an edit distance. Splits the reads into two outputs \- those that
+match the reference, and those that don't. Can also trim (remove) the matching
+parts of the reads rather than binning the reads.
+Please read bbmap/docs/guides/BBDukGuide.txt for more information.
+.PP
+Input may be stdin or a fasta or fastq file, compressed or uncompressed.
+If you pipe via stdin/stdout, please include the file type; e.g. for gzipped
+fasta input, set in=stdin.fa.gz
+.SH OPTIONS
+.SS Input parameters
+.TP
+in=<file>
+Main input. in=stdin.fq will pipe from stdin.
+.TP
+in2=<file>
+Input for 2nd read of pairs in a different file.
+.TP
+ref=<file,file>
+Comma\-delimited list of reference files.
+.IP
+In addition to filenames, you may also use the keywords:
+adapters, artifacts, phix, lambda, pjet, mtst, kapa
+.TP
+literal=<seq,seq>
+Comma\-delimited list of literal reference sequences.
+.TP
+touppercase=f
+(tuc) Change all bases to upper\-case.
+.TP
+interleaved=auto
+(int) t/f overrides interleaved autodetection.
+.TP
+qin=auto
+Input quality offset: 33 (Sanger), 64, or auto.
+.TP
+reads=\-1
+If positive, quit after processing X reads or pairs.
+.TP
+copyundefined=f
+(cu) Process non\-AGCT IUPAC reference bases by making all
+possible unambiguous copies.
+Intended for short motifs
+or adapter barcodes, as time/memory use is exponential.
+.TP
+samplerate=1
+Set lower to only process a fraction of input reads.
+.TP
+samref=<file>
+Optional reference fasta for processing sam files.
+.SS Output parameters
+.TP
+out=<file>
+(outnonmatch) Write reads here that do not contain
+kmers matching the database.
+\&'out=stdout.fq' will pipe
+to standard out.
+.TP
+out2=<file>
+(outnonmatch2) Use this to write 2nd read of pairs to a
+different file.
+.TP
+outm=<file>
+(outmatch) Write reads here that fail filters.  In default
+kfilter mode, this means any read with a matching kmer.
+In any mode, it also includes reads that fail filters such
+as minlength, mingc, maxgc, entropy, etc.  In other words,
+it includes all reads that do not go to 'out'.
+.TP
+outm2=<file>
+(outmatch2) Use this to write 2nd read of pairs to a
+different file.
+.TP
+outs=<file>
+(outsingle) Use this to write singleton reads whose mate
+was trimmed shorter than minlen.
+.TP
+stats=<file>
+Write statistics about which contaminants were detected.
+.TP
+refstats=<file>
+Write statistics on a per\-reference\-file basis.
+.TP
+rpkm=<file>
+Write RPKM for each reference sequence (for RNA\-seq).
+.TP
+dump=<file>
+Dump kmer tables to a file, in fasta format.
+.TP
+duk=<file>
+Write statistics in duk's format. *DEPRECATED*
+.TP
+nzo=t
+Only write statistics about ref sequences with nonzero hits.
+.TP
+overwrite=t
+(ow) Grant permission to overwrite files.
+.TP
+showspeed=t
+(ss) 'f' suppresses display of processing speed.
+.TP
+ziplevel=2
+(zl) Compression level; 1 (min) through 9 (max).
+.TP
+fastawrap=70
+Length of lines in fasta output.
+.TP
+qout=auto
+Output quality offset: 33 (Sanger), 64, or auto.
+.TP
+statscolumns=3
+(cols) Number of columns for stats output, 3 or 5.
+5 includes base counts.
+.TP
+rename=f
+Rename reads to indicate which sequences they matched.
+.TP
+refnames=f
+Use names of reference files rather than scaffold IDs.
+.TP
+trd=f
+Truncate read and ref names at the first whitespace.
+.TP
+ordered=f
+Set to true to output reads in same order as input.
+.TP
+maxbasesout=\-1
+If positive, quit after writing approximately this many
+bases to out (outu/outnonmatch).
+.TP
+maxbasesoutm=\-1
+If positive, quit after writing approximately this many
+bases to outm (outmatch).
+.TP
+json=f
+Print to screen in json format.
+.SS Histogram output parameters
+.TP
+bhist=<file>
+Base composition histogram by position.
+.TP
+qhist=<file>
+Quality histogram by position.
+.TP
+qchist=<file>
+Count of bases with each quality value.
+.TP
+aqhist=<file>
+Histogram of average read quality.
+.TP
+bqhist=<file>
+Quality histogram designed for box plots.
+.TP
+lhist=<file>
+Read length histogram.
+.TP
+phist=<file>
+Polymer length histogram.
+.TP
+gchist=<file>
+Read GC content histogram.
+.TP
+ihist=<file>
+Insert size histogram, for paired reads in mapped sam.
+.TP
+gcbins=100
+Number of gchist bins.  Set to 'auto' to use read length.
+.TP
+maxhistlen=6000
+Set an upper bound for histogram lengths; higher uses
+more memory.
+The default is 6000 for some histograms
+and 80000 for others.
+.SS Histograms for mapped sam/bam files only
+.TP
+histbefore=t
+Calculate histograms from reads before processing.
+.TP
+ehist=<file>
+Errors\-per\-read histogram.
+.TP
+qahist=<file>
+Quality accuracy histogram of error rates versus quality
+score.
+.TP
+indelhist=<file>
+Indel length histogram.
+.TP
+mhist=<file>
+Histogram of match, sub, del, and ins rates by position.
+.TP
+idhist=<file>
+Histogram of read count versus percent identity.
+.TP
+idbins=100
+Number of idhist bins.  Set to 'auto' to use read length.
+.TP
+varfile=<file>
+Ignore substitution errors listed in this file when
+calculating error rates.
+Can be generated with
+CallVariants.
+.TP
+vcf=<file>
+Ignore substitution errors listed in this VCF file
+when calculating error rates.
+.TP
+ignorevcfindels=t
+Also ignore indels listed in the VCF.
+.SS Processing parameters
+.TP
+k=27
+Kmer length used for finding contaminants.  Contaminants
+shorter than k will not be found.
+k must be at least 1.
+.TP
+rcomp=t
+Look for reverse\-complements of kmers in addition to
+forward kmers.
+.TP
+maskmiddle=t
+(mm) Treat the middle base of a kmer as a wildcard, to
+increase sensitivity in the presence of errors.
+.TP
+minkmerhits=1
+(mkh) Reads need at least this many matching kmers
+to be considered as matching the reference.
+.TP
+minkmerfraction=0.0
+(mkf) A read needs at least this fraction of its total
+kmers to hit a ref, in order to be considered a match.
+If this and minkmerhits are set, the greater is used.
+.TP
+mincovfraction=0.0
+(mcf) A read needs at least this fraction of its total
+bases to be covered by ref kmers to be considered a match.
+If specified, mcf overrides mkh and mkf.
+.TP
+hammingdistance=0
+(hdist) Maximum Hamming distance for ref kmers (subs only).
+Memory use is proportional to (3*K)^hdist.
+.TP
+qhdist=0
+Hamming distance for query kmers; impacts speed, not memory.
+.TP
+editdistance=0
+(edist) Maximum edit distance from ref kmers (subs
+and indels).
+.IP
+Memory use is proportional to (8*K)^edist.
+.TP
+hammingdistance2=0
+(hdist2) Sets hdist for short kmers, when using mink.
+.TP
+qhdist2=0
+Sets qhdist for short kmers, when using mink.
+.TP
+editdistance2=0
+(edist2) Sets edist for short kmers, when using mink.
+.TP
+forbidn=f
+(fn) Forbids matching of read kmers containing N.
+.IP
+By default, these will match a reference 'A' if
+hdist>0 or edist>0, to increase sensitivity.
+.TP
+removeifeitherbad=t
+(rieb) Paired reads get sent to 'outmatch' if either is a
+match (or either is trimmed shorter than minlen).
+Set to false to require both.
+.TP
+trimfailures=f
+Instead of discarding failed reads, trim them to 1bp.
+.IP
+This makes the statistics a bit odd.
+.TP
+findbestmatch=f
+(fbm) If multiple matches, associate read with sequence
+sharing most kmers.
+Reduces speed.
+.TP
+skipr1=f
+Don't do kmer\-based operations on read 1.
+.TP
+skipr2=f
+Don't do kmer\-based operations on read 2.
+.TP
+ecco=f
+For overlapping paired reads only.  Performs error correction with BBMerge prior to kmer operations.
+.TP
+recalibrate=f
+(recal) Recalibrate quality scores.  Requires calibration
+matrices generated by CalcTrueQuality.
+.TP
+sam=<file,file>
+If recalibration is desired, and matrices have not already
+been generated, BBDuk will create them from the sam file.
+.TP
+amino=f
+Run in amino acid mode.  Some features have not been
+tested, but kmer\-matching works fine.
+Maximum k is 12.
+.SS Speed and Memory parameters
+.TP
+threads=auto
+(t) Set number of threads to use; default is number of
+logical processors.
+.TP
+prealloc=f
+Preallocate memory in table.  Allows faster table loading
+and more efficient memory usage, for a large reference.
+.TP
+monitor=f
+Kill this process if it crashes.  monitor=600,0.01 would
+kill after 600 seconds under 1% usage.
+.TP
+minrskip=1
+(mns) Force minimal skip interval when indexing reference
+kmers.
+.IP
+1 means use all, 2 means use every other kmer, etc.
+.TP
+maxrskip=1
+(mxs) Restrict maximal skip interval when indexing
+reference kmers. Normally all are used for scaffolds<100kb,
+but with longer scaffolds, up to maxrskip\-1 are skipped.
+.TP
+rskip=
+Set both minrskip and maxrskip to the same value.
+.IP
+If not set, rskip will vary based on sequence length.
+.TP
+qskip=1
+Skip query kmers to increase speed.  1 means use all.
+.TP
+speed=0
+Ignore this fraction of kmer space (0\-15 out of 16) in both
+reads and reference.
+Increases speed and reduces memory.
+.IP
+Note: Do not use more than one of 'speed', 'qskip', and 'rskip'.
+.SS Trimming/Filtering/Masking parameters
+Note \- if ktrim, kmask, and ksplit are unset, the default behavior is kfilter.
+All kmer processing modes are mutually exclusive.
+Reads only get sent to 'outm' purely based on kmer matches in kfilter mode.
+.TP
+ktrim=f
+Trim reads to remove bases matching reference kmers.
+.IP
+Values:
+.IP
+f (don't trim),
+.IP
+r (trim to the right),
+.IP
+l (trim to the left)
+.TP
+kmask=
+Replace bases matching ref kmers with another symbol.
+Allows any non\-whitespace character, and processes short
+kmers on both ends if mink is set.  'kmask=lc' will
+convert masked bases to lowercase.
+.TP
+maskfullycovered=f
+(mfc) Only mask bases that are fully covered by kmers.
+.TP
+ksplit=f
+For single\-ended reads only.  Reads will be split into
+pairs around the kmer.  If the kmer is at the end of the
+read, it will be trimmed instead.  Singletons will go to
+out, and pairs will go to outm.
+Do not use ksplit with
+other operations such as quality\-trimming or filtering.
+.TP
+mink=0
+Look for shorter kmers at read tips down to this length,
+when k\-trimming or masking.
+0 means disabled.  Enabling
+this will disable maskmiddle.
+.TP
+qtrim=f
+Trim read ends to remove bases with quality below trimq.
+Performed AFTER looking for kmers.
+.IP
+Values:
+.IP
+rl (trim both ends),
+.IP
+f (neither end),
+.IP
+r (right end only),
+.IP
+l (left end only),
+.IP
+w (sliding window).
+.TP
+trimq=6
+Regions with average quality BELOW this will be trimmed,
+if qtrim is set to something other than f.
+Can be a
+floating\-point number like 7.3.
+.TP
+trimclip=f
+Trim soft\-clipped bases from sam files.
+.TP
+minlength=10
+(ml) Reads shorter than this after trimming will be
+discarded.
+Pairs will be discarded if both are shorter.
+.TP
+mlf=0
+(minlengthfraction) Reads shorter than this fraction of
+original length after trimming will be discarded.
+.TP
+maxlength=
+Reads longer than this after trimming will be discarded.
+Pairs will be discarded only if both are longer.
+.TP
+minavgquality=0
+(maq) Reads with average quality (after trimming) below
+this will be discarded.
+.TP
+maqb=0
+If positive, calculate maq from this many initial bases.
+.TP
+minbasequality=0
+(mbq) Reads with any base below this quality (after
+trimming) will be discarded.
+.TP
+maxns=\-1
+If non\-negative, reads with more Ns than this
+(after trimming) will be discarded.
+.TP
+mcb=0
+(minconsecutivebases) Discard reads without at least
+this many consecutive called bases.
+.TP
+ottm=f
+(outputtrimmedtomatch) Output reads trimmed to shorter
+than minlength to outm rather than discarding.
+.TP
+tp=0
+(trimpad) Trim this much extra around matching kmers.
+.TP
+tbo=f
+(trimbyoverlap) Trim adapters based on where paired
+reads overlap.
+.TP
+strictoverlap=t
+Adjust sensitivity for trimbyoverlap mode.
+.TP
+minoverlap=14
+Require this many bases of overlap for detection.
+.TP
+mininsert=40
+Require insert size of at least this for overlap.
+Should be reduced to 16 for small RNA sequencing.
+.TP
+tpe=f
+(trimpairsevenly) When kmer right\-trimming, trim both
+reads to the minimum length of either.
+.TP
+forcetrimleft=0
+(ftl) If positive, trim bases to the left of this position
+(exclusive, 0\-based).
+.TP
+forcetrimright=0
+(ftr) If positive, trim bases to the right of this position
+(exclusive, 0\-based).
+.TP
+forcetrimright2=0
+(ftr2) If positive, trim this many bases on the right end.
+.TP
+forcetrimmod=0
+(ftm) If positive, right\-trim length to be equal to zero,
+modulo this number.
+.TP
+restrictleft=0
+If positive, only look for kmer matches in the
+leftmost X bases.
+.TP
+restrictright=0
+If positive, only look for kmer matches in the
+rightmost X bases.
+.TP
+mingc=0
+Discard reads with GC content below this.
+.TP
+maxgc=1
+Discard reads with GC content above this.
+.TP
+gcpairs=t
+Use average GC of paired reads.
+Also affects gchist.
+.TP
+tossjunk=f
+Discard reads with invalid characters as bases.
+.TP
+swift=f
+Trim Swift sequences: Trailing C/T/N R1, leading G/A/N R2.
+.SS Header\-parsing parameters \- these require Illumina headers
+.TP
+chastityfilter=f
+(cf) Discard reads with id containing ' 1:Y:' or ' 2:Y:'.
+.TP
+barcodefilter=f
+Remove reads with unexpected barcodes if the barcodes
+parameter is set, or reads whose barcodes contain 'N'
+otherwise.
+A barcode must be
+the last part of the read header.
+.IP
+Values:
+.IP
+t: Remove reads with bad barcodes.
+.IP
+f: Ignore barcodes.
+.IP
+crash: Crash upon encountering bad barcodes.
+.TP
+barcodes=
+Comma\-delimited list of barcodes or files of barcodes.
+.TP
+xmin=\-1
+If positive, discard reads with a lesser X coordinate.
+.TP
+ymin=\-1
+If positive, discard reads with a lesser Y coordinate.
+.TP
+xmax=\-1
+If positive, discard reads with a greater X coordinate.
+.TP
+ymax=\-1
+If positive, discard reads with a greater Y coordinate.
+.SS Polymer trimming
+.TP
+trimpolya=0
+If greater than 0, trim poly\-A or poly\-T tails of
+at least this length on either end of reads.
+.TP
+trimpolygleft=0
+If greater than 0, trim poly\-G prefixes of at least this
+length on the left end of reads.
+Does not trim poly\-C.
+.TP
+trimpolygright=0
+If greater than 0, trim poly\-G tails of at least this
+length on the right end of reads.
+Does not trim poly\-C.
+.TP
+trimpolyg=0
+This sets both left and right at once.
+.TP
+filterpolyg=0
+If greater than 0, remove reads with a poly\-G prefix of
+at least this length (on the left).
+.IP
+Note: there are also equivalent poly\-C flags.
+.SS Polymer tracking
+.TP
+pratio=base,base
+'pratio=G,C' will print the ratio of G to C polymers.
+.TP
+plen=20
+Length of homopolymers to count.
+.SS Entropy/Complexity parameters
+.TP
+entropy=\-1
+Set between 0 and 1 to filter reads with entropy below
+that value.
+Higher is more stringent.
+.TP
+entropywindow=50
+Calculate entropy using a sliding window of this length.
+.TP
+entropyk=5
+Calculate entropy using kmers of this length.
+.TP
+minbasefrequency=0
+Discard reads with a minimum base frequency below this.
+.TP
+entropymask=f
+Values:
+.IP
+f: Discard low\-entropy sequences.
+.IP
+t: Mask low\-entropy parts of sequences with N.
+.IP
+lc: Change low\-entropy parts of sequences to lowercase.
+.TP
+entropymark=f
+Mark each base with its entropy value.  This is on a scale
+of 0\-41 and is reported as quality scores, so the output
+should be fastq or fasta+qual.
+.SS Cardinality estimation
+.TP
+cardinality=f
+(loglog) Count unique kmers using the LogLog algorithm.
+.TP
+cardinalityout=f
+(loglogout) Count unique kmers in output reads.
+.TP
+loglogk=31
+Use this kmer length for counting.
+.TP
+loglogbuckets=1999
+Use this many buckets for counting.
+.SS Java Parameters
+.TP
+\fB\-Xmx\fR
+This will set Java's memory usage, overriding autodetection.
+.IP
+\fB\-Xmx20g\fR will
+specify 20 gigs of RAM, and \fB\-Xmx200m\fR will specify 200 megs.
+The max is typically 85% of physical memory.
+.TP
+\fB\-eoom\fR
+This flag will cause the process to exit if an
+out\-of\-memory exception occurs.
+Requires Java 8u92+.
+.TP
+\fB\-da\fR
+Disable assertions.
+.SH AUTHOR
+Written by Brian Bushnell, Last modified March 21, 2019
+.P
+Please contact Brian Bushnell at bbushnell at lbl.gov if you encounter any problems.
+.P
+This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.
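
Note on usage: the SYNOPSIS in the manpage above shows that bbduk.sh takes
its parameters as key=value pairs (in=, out=, ref=, ...). A minimal sketch of
assembling such a command line in Python follows. The helper name bbduk_args,
the file names and the particular values for k, mink and hammingdistance are
assumptions chosen for illustration; only the parameter names come from the
manpage. The sketch builds the argument list and does not run the program.

```python
def bbduk_args(in_file, out_file, ref, **params):
    """Build an argument list for bbduk.sh from key=value parameters.

    Hypothetical helper: only the parameter names are taken from the manpage.
    """
    args = ["bbduk.sh", f"in={in_file}", f"out={out_file}", f"ref={ref}"]
    for key, value in params.items():
        # The manpage writes boolean defaults as t/f, so map Python bools to that.
        if isinstance(value, bool):
            value = "t" if value else "f"
        args.append(f"{key}={value}")
    return args


# Illustrative values: trim matches to the right (ktrim=r) against the
# built-in 'adapters' keyword, allowing shorter kmers at read tips (mink).
cmd = bbduk_args("reads.fq.gz", "clean.fq.gz", "adapters",
                 ktrim="r", k=23, mink=11, hammingdistance=1)
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run() unchanged, which avoids
shell quoting issues with file names.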



View it on GitLab: https://salsa.debian.org/med-team/bbmap/commit/01d39e4192a6c9a91d85356e7ccac7e2fa42bf3a
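
Note on memory use: the manpage states that memory use is proportional to
(3*K)^hdist for hammingdistance and to (8*K)^edist for editdistance. The
sketch below only evaluates those two expressions to show how quickly they
grow at the default k=27. The function names are hypothetical, and the
results are relative growth factors, not byte counts.

```python
def hdist_factor(k, hdist):
    """Relative growth stated in the manpage for hammingdistance: (3*K)^hdist."""
    return (3 * k) ** hdist


def edist_factor(k, edist):
    """Relative growth stated in the manpage for editdistance: (8*K)^edist."""
    return (8 * k) ** edist


k = 27  # default kmer length listed in the manpage
for d in range(3):
    print(f"k={k} distance={d}: hdist x{hdist_factor(k, d)}, "
          f"edist x{edist_factor(k, d)}")
```

At k=27 a Hamming distance of 1 already gives a factor of 81 and an edit
distance of 1 a factor of 216, so small distance values are the practical
choice for large references.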
