[med-svn] [dnaclust] 01/01: Added manpage

Iain Learmonth irl-guest at moszumanska.debian.org
Sat Feb 1 18:20:31 UTC 2014


This is an automated email from the git hooks/post-receive script.

irl-guest pushed a commit to branch master
in repository dnaclust.

commit bfb68c5201c71ae2f124572f8fa9b5197683a76a
Author: Iain R. Learmonth <irl at fsfe.org>
Date:   Sat Feb 1 18:19:55 2014 +0000

    Added manpage
---
 debian/dnaclust.1 | 153 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 153 insertions(+)

diff --git a/debian/dnaclust.1 b/debian/dnaclust.1
new file mode 100644
index 0000000..e1a3713
--- /dev/null
+++ b/debian/dnaclust.1
@@ -0,0 +1,153 @@
+'\" t
+.\"     Title: dnaclust
+.\"    Author: Mohammadreza Ghodsi <ghodsi at cs.umd.edu>
+.\" Generator: DocBook XSL Stylesheets v1.76.1 <http://docbook.sf.net/>
+.\"      Date: 02/01/2014
+.\"    Manual: Bioinformatics Tools
+.\"    Source: dnaclust
+.\"  Language: English
+.\"
+.TH "DNACLUST" "1" "02/01/2014" "dnaclust" "Bioinformatics Tools"
+.\" -----------------------------------------------------------------
+.\" * Define some portability stuff
+.\" -----------------------------------------------------------------
+.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+.\" http://bugs.debian.org/507673
+.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
+.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+.ie \n(.g .ds Aq \(aq
+.el       .ds Aq '
+.\" -----------------------------------------------------------------
+.\" * set default formatting
+.\" -----------------------------------------------------------------
+.\" disable hyphenation
+.nh
+.\" disable justification (adjust text to left margin only)
+.ad l
+.\" -----------------------------------------------------------------
+.\" * MAIN CONTENT STARTS HERE *
+.\" -----------------------------------------------------------------
+.SH "NAME"
+dnaclust \- program to cluster a large number of short DNA sequences
+.SH "SYNOPSIS"
+.HP \w'\fBdnaclust\fR\ 'u
+\fBdnaclust\fR {\fB\-i\fR\ |\ \fB\-\-input\fR}\ \fIinfile\fR [{\fB\-s\fR\ |\ \fB\-\-similarity\fR}\ \fIthreshold\fR] [{\fB\-m\fR\ |\ \fB\-\-multiple\-alignment\fR}] [{\fB\-d\fR\ |\ \fB\-\-header\fR}] [{\fB\-l\fR\ |\ \fB\-\-left\-gaps\-allowed\fR}] [{\fB\-k\fR\ |\ \fB\-\-k\-mer\-length\fR}\ \fIlength\fR] [{\fB\-a\fR\ |\ \fB\-\-approximate\-filter\fR}] [\fB\-\-no\-k\-mer\-filter\fR]
+.HP \w'\fBdnaclust\fR\ 'u
+\fBdnaclust\fR [{\fB\-h\fR\ |\ \fB\-\-help\fR} | {\fB\-v\fR\ |\ \fB\-\-version\fR}]
+.SH "DESCRIPTION"
+.PP
+This manual page briefly documents the
+\fBdnaclust\fR
+program\&.
+.PP
+\fBdnaclust\fR
+is a tool for clustering a large number of short DNA sequences\&. The clusters are created in such a way that the "radius" of each cluster is no more than the specified threshold\&.
+.PP
+The input sequences to be clustered should be in Fasta format\&. The id of each sequence is the first word of its Fasta header\&. The first word is the prefix of the header up to the first occurrence of a white space character\&. The output is written to STDOUT\&. If you want the output to be written to a file, just redirect the output (see EXAMPLES)\&.
+.PP
+The output has two modes: the default clustering mode, and clustering with multiple sequence alignment\&. In the clustering mode (without multiple alignment), each cluster will be printed on a separate line\&. The line will contain the ids of the sequences in the cluster\&. The first id in each line is the cluster center sequence id\&. Because of the way our clusters are constructed, the length of the cluster center sequence is always greater than or equal to the length of any of the seq [...]
+.PP
+For more information about the multiple sequence alignment mode see the description of
+\fB\-\-multiple\-alignment\fR
+option\&.
+.SH "OPTIONS"
+.PP
+The program follows the usual GNU command line syntax, with long options starting with two dashes (\*(Aq\-\-\*(Aq)\&. A summary of options is included below\&.
+.PP
+\fB\-\-similarity \fR\fB\fIthreshold\fR\fR, \fB\-s \fR\fB\fIthreshold\fR\fR
+.RS 4
+The similarity threshold specifies the radius of the clusters created\&. This parameter is a floating point number between 0 and 1\&. It is calculated based on semi\-global alignment of a sequence to the cluster center sequence\&. Namely, similarity = 1 \- (edit distance) / (length of the shorter sequence)\&. The edit distance is the minimum number of insertions, deletions or substitutions necessary to align a sequence to the cluster center sequence\&. Our algorithms are faster when the s [...]
+.RE
+.PP
+\fB\-\-k\-mer\-length \fR\fB\fIlength\fR\fR, \fB\-k \fR\fB\fIlength\fR\fR
+.RS 4
+When you use the k\-mer filter (which is enabled by default) you can specify the maximum length of the k\-mers used for filtering\&.
+.sp
+Longer k\-mer lengths require more memory to store k\-mer counts, and the filtering will be slower\&. However, with a longer k\-mer length the filter is more specific, and therefore the sequence alignment search may be faster\&.
+.sp
+There is a tradeoff between filtering and searching time\&. If you do not specify the k\-mer length, a value of log4(median of the lengths of the input sequences) is picked automatically\&. By using this option you can override the default value\&.
+.sp
+Keep in mind, however, that longer k\-mer lengths would require more memory to store the filtering data structures\&.
+.RE
+.PP
+\fB\-\-approximate\-filter \fR, \fB\-a \fR
+.RS 4
+By default the k\-mer filter is 100 percent sensitive\&. This means that in the output clustering, no two cluster centers are within the threshold distance from each other\&. The exact filter, however, is somewhat slow\&. This option speeds up the filter by using a heuristic\&. The use of the approximate filter may result in cluster centers that are close, and a larger number of clusters overall\&. However, the approximate filter is usually several times faster than the exact sensitive filte [...]
+.RE
+.PP
+\fB\-\-allow\-left\-gaps \fR, \fB\-l \fR
+.RS 4
+With this option the distances are measured based on semi\-global alignment\&. The semi\-global alignment allows for gaps without penalty on both ends of the shorter sequence\&.
+.sp
+The default alignment is a one\-sided semi\-global alignment, i\&.e\&. gaps are allowed without penalty only at the right end of the shorter sequence\&. This behavior corresponds to data from targeted sequencing of a region (e\&.g\&. of the 16S ribosomal RNA gene)\&.
+.RE
+.PP
+\fB\-\-multiple\-alignment\fR, \fB\-m\fR
+.RS 4
+Set the output format to show the multiple sequence alignment of each cluster\&. The gaps in the alignments are represented with the dash \*(Aq\-\*(Aq character\&.
+.sp
+The format of the MSA output is as follows: The MSA of each cluster spans several lines\&. The MSA starts with a line containing the character \*(Aq#\*(Aq followed by the number of sequences in that cluster\&. The aligned sequences (which may contain gaps) follow in the Fasta format\&. Each Fasta record is composed of two lines: the header line and the sequence line\&. Since each aligned sequence is output on a single line, the output may contain very long lines\&. Please use \*(Aqle [...]
+.RE
+.PP
+\fB\-\-no\-k\-mer\-filter\fR
+.RS 4
+Disables the k\-mer filter\&. Suitable for clustering very short sequences at a high similarity threshold\&.
+.RE
+.PP
+\fB\-d\fR, \fB\-\-header\fR
+.RS 4
+Write program options to output\&.
+.RE
+.PP
+\fB\-h\fR, \fB\-\-help\fR
+.RS 4
+Show summary of options\&.
+.RE
+.PP
+\fB\-v\fR, \fB\-\-version\fR
+.RS 4
+Show version of program\&.
+.RE
+.SH "EXAMPLES"
+.PP
+
+.sp
+.if n \{\
+.RS 4
+.\}
+.nf
+\&./dnaclust file\&.fasta \-l \-s 0\&.98 \-k 3 > clusters
+.fi
+.if n \{\
+.RE
+.\}
+.sp
+.SH "BUGS"
+.PP
+The program is currently limited to only work with the
+boost
+library\&.
+.SH "SEE ALSO"
+.PP
+
+\fBawk\fR(1),
+\fBless\fR(1),
+\fBwc\fR(1)
+.SH "AUTHOR"
+.PP
+\fBMohammadreza Ghodsi\fR <\&ghodsi at cs\&.umd\&.edu\&>
+.RS 4
+Designed, developed, and maintains this package\&. Wrote this manpage\&.
+.RE
+.SH "COPYRIGHT"
+.br
+Copyright \(co 2010 Mohammadreza Ghodsi
+.br
+.PP
+This manual page was written for the Debian system (but may be used by others)\&.
+.PP
+Permission is granted to copy, distribute and/or modify this document under the terms of the GNU General Public License, Version 2 or (at your option) any later version published by the Free Software Foundation\&.
+.PP
+On Debian systems, the complete text of the GNU General Public License can be found in
+/usr/share/common\-licenses/GPL\&.
+.sp
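
Note on the similarity formula in the OPTIONS section of the manpage above:
the following is a minimal Python sketch of
similarity = 1 - (edit distance) / (length of the shorter sequence),
with end gaps handled as the text describes for the default mode and for
the left-gaps option. The function names semi_global_distance and
similarity are hypothetical and are not part of dnaclust. This is a plain
dynamic-programming illustration under those assumptions, not the program's
own (filtered, much faster) algorithm.

```python
def semi_global_distance(short, long, left_gaps_allowed=False):
    """Minimum number of edits to align `short` against `long`.

    Unaligned characters of `long` past the right end of `short` are free.
    With left_gaps_allowed=True, characters of `long` before the left end
    of `short` are free as well. (Hypothetical helper, not dnaclust code.)
    """
    n = len(long)
    # prev[j] = cost of aligning the first i characters of `short`
    # with the first j characters of `long` (row i = 0 to start with).
    prev = [0 if left_gaps_allowed else j for j in range(n + 1)]
    for i in range(1, len(short) + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(
                prev[j] + 1,       # character of `short` left unmatched
                cur[j - 1] + 1,    # character of `long` left unmatched
                prev[j - 1] + (short[i - 1] != long[j - 1]),  # match or substitution
            )
        prev = cur
    # Free right end: take the best end position in `long`.
    return min(prev)


def similarity(a, b, left_gaps_allowed=False):
    """1 - (edit distance) / (length of the shorter sequence)."""
    short, long = sorted((a, b), key=len)
    return 1 - semi_global_distance(short, long, left_gaps_allowed) / len(short)
```

In this sketch a read that is a prefix of the center scores 1.0 in the
default mode, while a read that matches only in the interior of the center
scores 1.0 only when left gaps are allowed. Sequences are assumed to be
non-empty.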

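Note on the default (non-MSA) output described in the DESCRIPTION section
of the manpage above: each line holds the ids of one cluster, and the first
id is the cluster center. A minimal Python sketch for reading such output
follows. The function name parse_clusters is hypothetical, and the sketch
assumes the ids on a line are separated by white space, which the manpage
does not state explicitly.

```python
def parse_clusters(text):
    """Map each cluster center id to the ids on its line (center first).

    Assumes one cluster per line with white-space separated ids.
    (Hypothetical helper, not part of dnaclust.)
    """
    clusters = {}
    for line in text.splitlines():
        ids = line.split()
        if not ids:
            continue  # skip blank lines
        clusters[ids[0]] = ids
    return clusters
```

With the output redirected to a file as in the EXAMPLES section, something
like parse_clusters(open("clusters").read()) would give one dictionary
entry per cluster, and len() of each value is the cluster size.
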
-- 
Alioth's /git/debian-med/git-commit-notice on /srv/git.debian.org/git/debian-med/dnaclust.git


