[med-svn] [Git][med-team/bbmap][master] Add manpage for reformat.sh

Andreas Tille gitlab at salsa.debian.org
Sat Apr 6 07:13:13 BST 2019



Andreas Tille pushed to branch master at Debian Med / bbmap


Commits:
46a3f1ba by Andreas Tille at 2019-04-06T06:07:29Z
Add manpage for reformat.sh

- - - - -


1 changed file:

- + debian/mans/reformat.sh.1


Changes:

=====================================
debian/mans/reformat.sh.1
=====================================
@@ -0,0 +1,435 @@
+.\" DO NOT MODIFY THIS FILE!  It was generated by help2man 1.47.8.
+.TH REFORMAT.SH "1" "April 2019" "reformat.sh 38.43" "User Commands"
+.SH NAME
+reformat.sh \- Reformats reads between fasta/fastq/scarf/fasta+qual/sam, interleaved/paired, and ASCII-33/64
+.SH SYNOPSIS
+.B reformat.sh
+\fI\,in=<file> in2=<file2> out=<outfile> out2=<outfile2>\/\fR
+.SH DESCRIPTION
+Reformats reads to change ASCII quality encoding, interleaving, file format, or compression format.
+Optionally performs additional functions such as quality trimming, subsetting, and subsampling.
+Supports fastq, fasta, fasta+qual, scarf, oneline, sam, bam, gzip, bz2.
+Please read bbmap/docs/guides/ReformatGuide.txt for more information.
+.PP
+in2 and out2 are for paired reads and are optional.
+If input is paired and there is only one output file, it will be written interleaved.
+.SH OPTIONS
+.SS Parameters and their defaults:
+.TP
+ow=f
+(overwrite) Overwrites files that already exist.
+.TP
+app=f
+(append) Append to files that already exist.
+.TP
+zl=4
+(ziplevel) Set compression level, 1 (low) to 9 (max).
+.TP
+int=f
+(interleaved) Determines whether INPUT file is considered interleaved.
+.TP
+fastawrap=70
+Length of lines in fasta output.
+.TP
+fastareadlen=0
+Set to a non\-zero number to break fasta files into reads of at most this length.
+.TP
+fastaminlen=1
+Ignore fasta reads shorter than this.
+.TP
+qin=auto
+ASCII offset for input quality.  May be 33 (Sanger), 64 (Illumina), or auto.
+.TP
+qout=auto
+ASCII offset for output quality.  May be 33 (Sanger), 64 (Illumina), or auto (same as input).
+.TP
+qfake=30
+Quality value used for fasta to fastq reformatting.
+.TP
+qfin=<.qual file>
+Read qualities from this qual file, for the reads coming from 'in=<fasta file>'
+.TP
+qfin2=<.qual file>
+Read qualities from this qual file, for the reads coming from 'in2=<fasta file>'
+.TP
+qfout=<.qual file>
+Write qualities to this qual file, for the reads going to 'out=<fasta file>'
+.TP
+qfout2=<.qual file>
+Write qualities to this qual file, for the reads going to 'out2=<fasta file>'
+.TP
+outsingle=<file>
+(outs) If a read is longer than minlength and its mate is shorter, the longer one goes here.
+.TP
+deleteinput=f
+Delete input upon successful completion.
+.TP
+ref=<file>
+Optional reference fasta for sam processing.
+.SS Processing Parameters
+.TP
+verifypaired=f
+(vpair) When true, checks reads to see if the names look paired.  Prints an error message if not.
+.TP
+verifyinterleaved=f
+(vint) sets 'vpair' to true and 'interleaved' to true.
+.TP
+allowidenticalnames=f
+(ain) When verifying pair names, allows identical names, instead of requiring \fI\,/1\/\fP and \fI\,/2\/\fP or 1: and 2:
+.TP
+tossbrokenreads=f
+(tbr) Discard reads that have different numbers of bases and qualities.  By default this will be detected and cause a crash.
+.TP
+ignorebadquality=f
+(ibq) Fix out\-of\-range quality values instead of crashing with a warning.
+.TP
+addslash=f
+Append ' /1' and ' /2' to read names, if not already present.  Please include the flag 'int=t' if the reads are interleaved.
+.TP
+spaceslash=t
+Put a space before the slash in addslash mode.
+.TP
+addcolon=f
+Append ' 1:' and ' 2:' to read names, if not already present.  Please include the flag 'int=t' if the reads are interleaved.
+.TP
+underscore=f
+Change whitespace in read names to underscores.
+.TP
+rcomp=f
+(rc) Reverse\-complement reads.
+.TP
+rcompmate=f
+(rcm) Reverse\-complement read 2 only.
+.TP
+changequality=t
+(cq) N bases always get a quality of 0 and ACGT bases get a min quality of 2.
+.TP
+quantize=f
+Quantize qualities to a subset of values like NextSeq.  Can also be used with comma\-delimited list, like quantize=0,8,13,22,27,32,37
+.TP
+tuc=f
+(touppercase) Change lowercase letters in reads to uppercase.
+.TP
+uniquenames=f
+Make duplicate names unique by appending _<number>.
+.TP
+remap=
+A set of pairs: remap=CTGN will transform C>T and G>N.
+.IP
+Use remap1 and remap2 to specify read 1 or 2.
+.TP
+iupacToN=f
+(itn) Convert non\-ACGTN symbols to N.
+.TP
+monitor=f
+Kill this process if it crashes.  monitor=600,0.01 would kill after 600 seconds under 1% usage.
+.TP
+crashjunk=t
+Crash when encountering reads with invalid bases.
+.TP
+tossjunk=f
+Discard reads with invalid characters as bases.
+.TP
+fixjunk=f
+Convert invalid bases to N.
+.TP
+fixheaders=f
+Convert nonstandard header characters to standard ASCII.
+.TP
+recalibrate=f
+(recal) Recalibrate quality scores.  Must first generate matrices with CalcTrueQuality.
+.TP
+maxcalledquality=41
+Quality scores capped at this upper bound.
+.TP
+mincalledquality=2
+Quality scores of ACGT bases will be capped at lower bound.
+.TP
+trimreaddescription=f
+(trd) Trim the names of reads after the first whitespace.
+.TP
+trimrname=f
+For sam/bam files, trim rname/rnext fields after the first space.
+.TP
+fixheaders=f
+Replace characters in headers such as space, *, and | to make them valid file names.
+.TP
+warnifnosequence=t
+For fasta, issue a warning if a sequenceless header is encountered.
+.TP
+warnfirsttimeonly=t
+Issue a warning for only the first sequenceless header.
+.TP
+utot=f
+Convert U to T (for RNA \-> DNA translation).
+.TP
+padleft=0
+Pad the left end of sequences with this many symbols.
+.TP
+padright=0
+Pad the right end of sequences with this many symbols.
+.TP
+pad=0
+Set padleft and padright to the same value.
+.TP
+padsymbol=N
+Symbol to use for padding.
+.SS Histogram output parameters
+.TP
+bhist=<file>
+Base composition histogram by position.
+.TP
+qhist=<file>
+Quality histogram by position.
+.TP
+qchist=<file>
+Count of bases with each quality value.
+.TP
+aqhist=<file>
+Histogram of average read quality.
+.TP
+bqhist=<file>
+Quality histogram designed for box plots.
+.TP
+lhist=<file>
+Read length histogram.
+.TP
+gchist=<file>
+Read GC content histogram.
+.TP
+gcbins=100
+Number of gchist bins.  Set to 'auto' to use read length.
+.TP
+gcplot=f
+Add a graphical representation to the gchist.
+.TP
+maxhistlen=6000
+Set an upper bound for histogram lengths; higher uses more memory.
+.IP
+The default is 6000 for some histograms and 80000 for others.
+.SS Histograms for sam files only (requires sam format 1.4 or higher):
+.TP
+ehist=<file>
+Errors\-per\-read histogram.
+.TP
+qahist=<file>
+Quality accuracy histogram of error rates versus quality score.
+.TP
+indelhist=<file>
+Indel length histogram.
+.TP
+mhist=<file>
+Histogram of match, sub, del, and ins rates by read location.
+.TP
+ihist=<file>
+Insert size histograms.  Requires paired reads in a sam file.
+.TP
+idhist=<file>
+Histogram of read count versus percent identity.
+.TP
+idbins=100
+Number of idhist bins.  Set to 'auto' to use read length.
+.SS Sampling parameters
+.TP
+reads=\-1
+Set to a positive number to only process this many INPUT reads (or pairs), then quit.
+.TP
+skipreads=\-1
+Skip (discard) this many INPUT reads before processing the rest.
+.TP
+samplerate=1
+Randomly output only this fraction of reads; 1 means sampling is disabled.
+.TP
+sampleseed=\-1
+Set to a positive number to use that prng seed for sampling (allowing deterministic sampling).
+.TP
+samplereadstarget=0
+(srt) Exact number of OUTPUT reads (or pairs) desired.
+.TP
+samplebasestarget=0
+(sbt) Exact number of OUTPUT bases desired.
+.IP
+Important: srt/sbt flags should not be used with stdin, samplerate, qtrim, minlength, or minavgquality.
+.TP
+upsample=f
+Allow srt/sbt to upsample (duplicate reads) when the target is greater than input.
+.TP
+prioritizelength=f
+If true, calculate a length threshold to reach the target, and retain all reads of at least that length (must set srt or sbt).
+.SS Trimming and filtering parameters
+.TP
+qtrim=f
+Trim read ends to remove bases with quality below trimq.
+.IP
+Values: t (trim both ends), f (neither end), r (right end only), l (left end only), w (sliding window).
+.TP
+trimq=6
+Regions with average quality BELOW this will be trimmed.  Can be a floating\-point number like 7.3.
+.TP
+minlength=0
+(ml) Reads shorter than this after trimming will be discarded.  Pairs will be discarded only if both are shorter.
+.TP
+mlf=0
+(mlf) Reads shorter than this fraction of original length after trimming will be discarded.
+.TP
+maxlength=0
+If nonzero, reads longer than this after trimming will be discarded.
+.TP
+breaklength=0
+If nonzero, reads longer than this will be broken into multiple reads of this length.  Does not work for paired reads.
+.TP
+requirebothbad=t
+(rbb) Only discard pairs if both reads are shorter than minlen.
+.TP
+invertfilters=f
+(invert) Output failing reads instead of passing reads.
+.TP
+minavgquality=0
+(maq) Reads with average quality (after trimming) below this will be discarded.
+.TP
+maqb=0
+If positive, calculate maq from this many initial bases.
+.TP
+chastityfilter=f
+(cf) Reads with names containing ' 1:Y:' or ' 2:Y:' will be discarded.
+.TP
+barcodefilter=f
+Remove reads with unexpected barcodes if barcodes is set, or barcodes containing 'N' otherwise.
+.IP
+A barcode must be the last part of the read header.
+.TP
+barcodes=
+Comma\-delimited list of barcodes or files of barcodes.
+.TP
+maxns=\-1
+If 0 or greater, reads with more Ns than this (after trimming) will be discarded.
+.TP
+minconsecutivebases=0
+(mcb) Discard reads without at least this many consecutive called bases.
+.TP
+forcetrimleft=0
+(ftl) If nonzero, trim left bases of the read to this position (exclusive, 0\-based).
+.TP
+forcetrimright=0
+(ftr) If nonzero, trim right bases of the read after this position (exclusive, 0\-based).
+.TP
+forcetrimright2=0
+(ftr2) If positive, trim this many bases on the right end.
+.TP
+forcetrimmod=5
+(ftm) If positive, trim length to be equal to zero modulo this number.
+.TP
+mingc=0
+Discard reads with GC content below this.
+.TP
+maxgc=1
+Discard reads with GC content above this.
+.TP
+gcpairs=t
+Use average GC of paired reads.
+.IP
+Also affects gchist.
+.SS Sam and bam processing options:
+.TP
+mappedonly=f
+Toss unmapped reads.
+.TP
+unmappedonly=f
+Toss mapped reads.
+.TP
+pairedonly=f
+Toss reads that are not mapped as proper pairs.
+.TP
+unpairedonly=f
+Toss reads that are mapped as proper pairs.
+.TP
+primaryonly=f
+Toss secondary alignments.  Set this to true for sam to fastq conversion.
+.TP
+minmapq=\-1
+If non\-negative, toss reads with mapq under this.
+.TP
+maxmapq=\-1
+If non\-negative, toss reads with mapq over this.
+.TP
+requiredbits=0
+(rbits) Toss sam lines with any of these flag bits unset.  Similar to samtools \fB\-f\fR.
+.TP
+filterbits=0
+(fbits) Toss sam lines with any of these flag bits set.  Similar to samtools \fB\-F\fR.
+.TP
+stoptag=f
+Set to true to write a tag indicating read stop location, prefixed by YS:i:
+.TP
+sam=
+Set to 'sam=1.3' to convert '=' and 'X' cigar symbols (from sam 1.4+ format) to 'M'.
+.IP
+Set to 'sam=1.4' to convert 'M' to '=' and 'X' (sam=1.4 requires MD tags to be present, or ref to be specified).
+.SS Sam and bam alignment filtering options:
+These require = and X symbols in cigar strings, or MD tags, or a reference fasta.
+\fB\-1\fR means disabled; to filter reads with any of a symbol type, set to 0.
+.TP
+subfilter=\-1
+Discard reads with more than this many substitutions.
+.TP
+insfilter=\-1
+Discard reads with more than this many insertions.
+.TP
+delfilter=\-1
+Discard reads with more than this many deletions.
+.TP
+indelfilter=\-1
+Discard reads with more than this many indels.
+.TP
+editfilter=\-1
+Discard reads with more than this many edits.
+.TP
+inslenfilter=\-1
+Discard reads with an insertion longer than this.
+.TP
+dellenfilter=\-1
+Discard reads with a deletion longer than this.
+.TP
+idfilter=\-1.0
+Discard reads with identity below this.
+.TP
+clipfilter=\-1
+Discard reads with more than this many soft\-clipped bases.
+.SS Kmer counting and cardinality estimation:
+.TP
+k=0
+If positive, count the total number of kmers.
+.TP
+cardinality=f
+(loglog) Count unique kmers using the LogLog algorithm.
+.TP
+loglogbuckets=1999
+Use this many buckets for cardinality estimation.
+.SS Shortcuts
+The # symbol will be replaced by 1 and 2.  The % symbol in out will be replaced by the input name minus extensions.
+For example:
+.IP
+reformat.sh in=read#.fq out=%.fa
+.PP
+\&...is equivalent to:
+.IP
+reformat.sh in1=read1.fq in2=read2.fq out1=read1.fa out2=read2.fa
+.SS Java Parameters
+.TP
+\fB\-Xmx\fR
+This will set Java's memory usage, overriding autodetection.
+.IP
+\fB\-Xmx20g\fR will specify 20 gigs of RAM, and \fB\-Xmx200m\fR will specify 200 megs.
+The max is typically 85% of physical memory.
+.TP
+\fB\-eoom\fR
+This flag will cause the process to exit if an out\-of\-memory exception occurs.  Requires Java 8u92+.
+.TP
+\fB\-da\fR
+Disable assertions.
+.SH AUTHOR
+Written by Brian Bushnell (Last modified February 21, 2019)
+.P
+Please contact Brian Bushnell at bbushnell at lbl.gov if you encounter any problems.
+.P
+This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.
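
For readers of this commit, the Shortcuts section and the qin/qout options above can be illustrated with a small Python sketch. This is not how reformat.sh itself is implemented; the function names expand_shortcuts and convert_quality are hypothetical, and the sketch assumes that "minus extensions" means everything before the first dot and that qualities are plain ASCII with a fixed offset of 33 or 64.

```python
def expand_shortcuts(in_pattern, out_pattern):
    """Illustrative expansion of the '#' and '%' shortcuts (hypothetical helper)."""
    # Assumption: "input name minus extensions" is everything before the first dot.
    base = in_pattern.split(".", 1)[0]
    # '%' in the output name stands for the input base name.
    out_pattern = out_pattern.replace("%", base)
    expanded = {}
    for n in ("1", "2"):
        # '#' stands for the read number, 1 or 2.
        expanded["in" + n] = in_pattern.replace("#", n)
        expanded["out" + n] = out_pattern.replace("#", n)
    return expanded


def convert_quality(qual, qin=64, qout=33):
    """Shift an ASCII quality string from offset qin to offset qout (hypothetical helper)."""
    return "".join(chr(ord(c) - qin + qout) for c in qual)


if __name__ == "__main__":
    # Mirrors the manpage example: in=read#.fq out=%.fa
    print(expand_shortcuts("read#.fq", "%.fa"))
    # Phred+64 (Illumina) string shifted to Phred+33 (Sanger)
    print(convert_quality("hhBB", qin=64, qout=33))
```

The first call reproduces the in1/in2/out1/out2 names given in the manpage example; the second shows that changing the ASCII offset only shifts each character by the difference between the two offsets.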



View it on GitLab: https://salsa.debian.org/med-team/bbmap/commit/46a3f1ba1becd63d75cb4e79a7ede6505d1d4614
