[med-svn] [Git][med-team/bbmap][master] Add manpage for dedupe.sh

Andreas Tille gitlab at salsa.debian.org
Fri Apr 5 23:19:30 BST 2019



Andreas Tille pushed to branch master at Debian Med / bbmap


Commits:
74c7513c by Andreas Tille at 2019-04-05T22:19:13Z
Add manpage for dedupe.sh

- - - - -


1 changed file:

- + debian/mans/dedupe.sh.1


Changes:

=====================================
debian/mans/dedupe.sh.1
=====================================
@@ -0,0 +1,249 @@
+.\" DO NOT MODIFY THIS FILE!  It was generated by help2man 1.47.8.
+.TH DEDUPE.SH "1" "April 2019" "dedupe.sh 38.43" "User Commands"
+.SH NAME
+dedupe.sh \- Simplifies assemblies by removing duplicate or contained subsequences
+.SH SYNOPSIS
+.B dedupe.sh
+\fI\,in=<file or stdin> out=<file or stdout>\/\fR
+.SH DESCRIPTION
+Accepts one or more files containing sets of sequences (reads or scaffolds).
+Removes duplicate sequences, which may be specified to be exact matches, subsequences, or sequences within some percent identity.
+Can also find overlapping sequences and group them into clusters.
+Please read bbmap/docs/guides/DedupeGuide.txt for more information.
+.SH EXAMPLES
+An example of running Dedupe for clustering short reads:
+.IP
+dedupe.sh in=x.fq am=f ac=f fo c pc rnc=f mcs=4 mo=100 s=1 pto cc qin=33 csf=stats.txt pattern=cluster_%.fq dot=graph.dot
+.PP
+Input may be fasta or fastq, compressed or uncompressed.
+Output may be stdout or a file.  With no output parameter, data will be written to stdout.
+If 'out=null', there will be no output, but statistics will still be printed.
+You can also use 'dedupe.sh <infile> <outfile>' without the 'in=' and 'out='.
+.SH OPTIONS
+.SS I/O parameters
+.TP
+in=<file,file>
+A single file or a comma\-delimited list of files.
+.TP
+out=<file>
+Destination for all output contigs.
+.TP
+pattern=<file>
+Clusters will be written to individual files, where the '%' symbol in the pattern is replaced by cluster number.
+.TP
+outd=<file>
+Optional; removed duplicates will go here.
+.TP
+csf=<file>
+(clusterstatsfile) Write a list of cluster names and sizes.
+.TP
+dot=<file>
+(graph) Write a graph in dot format.  Requires 'fo' and 'pc' flags.
+.TP
+threads=auto
+(t) Set number of threads to use; default is number of logical processors.
+.TP
+overwrite=t
+(ow) Set to false to force the program to abort rather than overwrite an existing file.
+.TP
+showspeed=t
+(ss) Set to 'f' to suppress display of processing speed.
+.TP
+minscaf=0
+(ms) Ignore contigs/scaffolds shorter than this.
+.TP
+interleaved=auto
+If true, forces fastq input to be paired and interleaved.
+.TP
+ziplevel=2
+Set to 1 (lowest) through 9 (max) to change compression level; lower compression is faster.
+.SS Output format parameters
+.TP
+storename=t
+(sn) Store scaffold names (set false to save memory).
+.TP
+#addpairnum=f
+Add .1 and .2 to numeric id of read1 and read2.
+.TP
+storequality=t
+(sq) Store quality values for fastq assemblies (set false to save memory).
+.TP
+uniquenames=t
+(un) Ensure all output scaffolds have unique names.  Uses more memory.
+.TP
+numbergraphnodes=t
+(ngn) Label dot graph nodes with read numbers rather than read names.
+.TP
+sort=f
+Sort output (otherwise it will be random).  Options:
+.IP
+length: Sort by length
+.IP
+quality: Sort by quality
+.IP
+name: Sort by name
+.IP
+id: Sort by input order
+.TP
+ascending=f
+Sort in ascending order.
+.TP
+ordered=f
+Output sequences in input order.  Equivalent to sort=id ascending.
+.TP
+renameclusters=f
+(rnc) Rename contigs to indicate which cluster they are in.
+.TP
+printlengthinedges=f
+(ple) Print the length of contigs in edges.
+.SS Processing parameters
+.TP
+absorbrc=t
+(arc) Absorb reverse\-complements as well as normal orientation.
+.TP
+absorbmatch=t
+(am) Absorb exact matches of contigs.
+.TP
+absorbcontainment=t
+(ac) Absorb full containments of contigs.
+.TP
+#absorboverlap=f
+(ao) Absorb (merge) non\-contained overlaps of contigs (TODO).
+.TP
+findoverlap=f
+(fo) Find overlaps between contigs (containments and non\-containments).  Necessary for clustering.
+.TP
+uniqueonly=f
+(uo) If true, all copies of duplicate reads will be discarded, rather than keeping 1.
+.TP
+rmn=f
+(requirematchingnames) If true, both names and sequence must match.
+.TP
+usejni=f
+(jni) Do alignments in C code, which is faster, if an edit distance is allowed.
+This will require compiling the C code; details are in \fI\,/jni/README.txt\/\fP.
+.SS Subset parameters
+.TP
+subsetcount=1
+(sstc) Number of subsets used to process the data; higher uses less memory.
+.TP
+subset=0
+(sst) Only process reads whose ((ID%subsetcount)==subset).
+.SS Clustering parameters
+.TP
+cluster=f
+(c) Group overlapping contigs into clusters.
+.TP
+pto=f
+(preventtransitiveoverlaps) Do not look for new edges between nodes in the same cluster.
+.TP
+minclustersize=1
+(mcs) Do not output clusters smaller than this.
+.TP
+pbr=f
+(pickbestrepresentative) Only output the single highest\-quality read per cluster.
+.SS Cluster postprocessing parameters
+.TP
+processclusters=f
+(pc) Run the cluster processing phase, which performs the selected operations in this category.
+For example, pc AND cc must be enabled to perform cc.
+.TP
+fixmultijoins=t
+(fmj) Remove redundant overlaps between the same two contigs.
+.TP
+removecycles=t
+(rc) Remove all cycles so clusters form trees.
+.TP
+cc=t
+(canonicizeclusters) Flip contigs so clusters have a single orientation.
+.TP
+fcc=f
+(fixcanoncontradictions) Truncate graph at nodes with canonization disputes.
+.TP
+foc=f
+(fixoffsetcontradictions) Truncate graph at nodes with offset disputes.
+.TP
+mst=f
+(maxspanningtree) Remove cyclic edges, leaving only the longest edges that form a tree.
+.SS Overlap Detection Parameters
+.TP
+exact=t
+(ex) Only allow exact symbol matches.  When false, an 'N' will match any symbol.
+.TP
+touppercase=t
+(tuc) Convert input bases to upper\-case; otherwise, lower\-case will not match.
+.TP
+maxsubs=0
+(s) Allow up to this many mismatches (substitutions only, no indels).  May be set higher than maxedits.
+.TP
+maxedits=0
+(e) Allow up to this many edits (subs or indels).  Higher is slower.
+.TP
+minidentity=100
+(mid) Absorb contained sequences with percent identity of at least this (includes indels).
+.TP
+minlengthpercent=0
+(mlp) Smaller contig must be at least this percent of larger contig's length to be absorbed.
+.TP
+minoverlappercent=0
+(mop) Overlap must be at least this percent of smaller contig's length to cluster and merge.
+.TP
+minoverlap=200
+(mo) Overlap must be at least this long to cluster and merge.
+.TP
+depthratio=0
+(dr) When non\-zero, overlaps will only be formed between reads with a depth ratio of at most this.
+Should be above 1.
+Depth is determined by parsing the read names; this information can be added
+by running KmerNormalize (khist.sh, bbnorm.sh, or ecc.sh) with the flag 'rename'.
+.TP
+k=31
+Seed length used for finding containments and overlaps.  Anything shorter than k will not be found.
+.TP
+numaffixmaps=1
+(nam) Number of prefixes/suffixes to index per contig. Higher is more sensitive, if edits are allowed.
+.TP
+hashns=f
+Set to true to search for matches using kmers containing Ns.  Can lead to extreme slowdown in some cases.
+.TP
+#ignoreaffix1=f
+(ia1) Ignore first affix (for testing).
+.TP
+#storesuffix=f
+(ss) Store suffix as well as prefix.  Automatically set to true when doing inexact matches.
+.SS Other Parameters
+.TP
+qtrim=f
+Set to qtrim=rl to trim leading and trailing Ns.
+.TP
+trimq=6
+Quality trim level.
+.TP
+forcetrimleft=\-1
+(ftl) If positive, trim bases to the left of this position (exclusive, 0\-based).
+.TP
+forcetrimright=\-1
+(ftr) If positive, trim bases to the right of this position (exclusive, 0\-based).
+.SS Note on Proteins / Amino Acids
+Dedupe supports amino acid space via the 'amino' flag.  This also changes the default kmer length to 10.
+In amino acid mode, all flags related to canonicity and reverse\-complementation are disabled,
+and nam (numaffixmaps) is currently limited to 2 per tip.
+.SS Java Parameters
+.TP
+\fB\-Xmx\fR
+This will set Java's memory usage, overriding autodetection.
+.IP
+\fB\-Xmx20g\fR will specify 20 gigs of RAM, and \fB\-Xmx200m\fR will specify 200 megs.
+The max is typically 85% of physical memory.
+.TP
+\fB\-eoom\fR
+This flag will cause the process to exit if an out\-of\-memory exception occurs.  Requires Java 8u92+.
+.TP
+\fB\-da\fR
+Disable assertions.
+.SH AUTHOR
+Written by Brian Bushnell and Jonathan Rood (Last modified November 20, 2017)
+.P
+Please contact Brian Bushnell at bbushnell at lbl.gov if you encounter any problems.
+.P
+This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.
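
The subset rule in the "Subset parameters" section above, ((ID%subsetcount)==subset), can be illustrated with a small shell sketch. It assumes that ID is a zero-based numeric read index, which the manpage does not state; subset_ids is a hypothetical helper written for this illustration and is not part of bbmap.

```shell
#!/bin/sh
# Hypothetical helper (not part of bbmap): list the read IDs in 0..total-1
# that satisfy the documented rule ((ID % subsetcount) == subset).
# Assumption: IDs are zero-based integers assigned in input order.
subset_ids() {
    total=$1
    subsetcount=$2
    subset=$3
    id=0
    out=""
    while [ "$id" -lt "$total" ]; do
        if [ $((id % subsetcount)) -eq "$subset" ]; then
            # Append with a single space separator, no trailing space.
            out="${out:+$out }$id"
        fi
        id=$((id + 1))
    done
    printf '%s\n' "$out"
}

# With subsetcount=3, ten reads fall into three disjoint subsets.
subset_ids 10 3 0
subset_ids 10 3 1
subset_ids 10 3 2
```

Under that assumption, every read lands in exactly one subset, so each subset holds roughly 1/subsetcount of the input. That is consistent with the manpage's note that a higher subsetcount uses less memory.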



View it on GitLab: https://salsa.debian.org/med-team/bbmap/commit/74c7513c30f9deeafc6d8ebde58153aa08371e45
