[med-svn] [daligner] 01/09: Create manpages

Afif Elghraoui afif-guest at moszumanska.debian.org
Sat Aug 29 20:26:29 UTC 2015


This is an automated email from the git hooks/post-receive script.

afif-guest pushed a commit to branch master
in repository daligner.

commit 385d525a8961572471973c42deaeea08327270dd
Author: Afif Elghraoui <afif at ghraoui.name>
Date:   Sat Aug 29 10:37:59 2015 -0700

    Create manpages
---
 debian/daligner-doc/HPCdaligner.1.md |  77 +++++++++++++++++++++++
 debian/daligner-doc/HPCmapper.1.md   |  46 ++++++++++++++
 debian/daligner-doc/LAcat.1.md       |  29 +++++++++
 debian/daligner-doc/LAcheck.1.md     |  41 +++++++++++++
 debian/daligner-doc/LAmerge.1.md     |  37 ++++++++++++
 debian/daligner-doc/LAshow.1.md      |  72 ++++++++++++++++++++++
 debian/daligner-doc/LAsort.1.md      |  31 ++++++++++
 debian/daligner-doc/LAsplit.1.md     |  33 ++++++++++
 debian/daligner-doc/daligner.1.md    | 114 +++++++++++++++++++++++++++++++++++
 9 files changed, 480 insertions(+)

diff --git a/debian/daligner-doc/HPCdaligner.1.md b/debian/daligner-doc/HPCdaligner.1.md
new file mode 100644
index 0000000..1b84509
--- /dev/null
+++ b/debian/daligner-doc/HPCdaligner.1.md
@@ -0,0 +1,77 @@
+% HPCdaligner(1) 1.0
+%
+% August 2015
+
+# NAME
+
+HPCdaligner – generate a script to run **daligner**(1)
+
+# SYNOPSIS
+
+**HPCdaligner** [**-vbAI**] [**-k***int(14)*] [**-w***int(6)*]
+	[**-h***int(35)*] [**-t***int*] [**-M***int*]
+	[**-e***double(.70)*] [**-l***int(1000)*] [**-s***int(100)*] [**-H***int*]
+	[**-m***track*]+ [**-dal***int(4)*] [**-deg***int(25)*]
+	*path:db|dam* [*first:int*[-*last:int*]]
+
+# DESCRIPTION
+
+**HPCdaligner** writes a UNIX shell script to the standard output that consists
+of a sequence of commands that effectively run **daligner**(1) on all pairs of
+blocks of a split database and then externally sorts and merges them using
+**LAsort**(1) and **LAmerge**(1) into a collection of alignment files with
+names *path.#.las* where # ranges from 1 to the number of blocks the database
+is split into. These sorted files, if concatenated by, say, **LAcat**(1),
+would contain all the alignments in sorted order (of a-read, then b-read, ...).
+Moreover, all overlaps for a given a-read are guaranteed to not be split across
+files, so one can run artifact analyzers or error correction on each sorted
+file in parallel.
+
+The database must have been previously split by **DBsplit**(1) and all the
+parameters, except **-v**, **-dal**, and **-deg**, are passed through to the
+calls to **daligner**(1). The defaults for these parameters are as for
+**daligner**(1). The **-v** flag, for verbose-mode, is also passed to all
+calls to **LAsort**(1) and **LAmerge**(1). **-dal** and **-deg** options are
+described later.
+
+For a database divided into N sub-blocks, the calls to **daligner**(1) will
+produce in total 2TN^2 .las files assuming daligner runs with T threads.
+These will then be sorted and merged into N^2 sorted .las files, one for each
+block pair. These are then merged in ceil(log_deg N) phases where the number of
+files decreases geometrically in **-deg** until there is 1 file per row of the
+N x N block matrix. So at the end one has N sorted .las files that when
+concatenated would give a single large sorted overlap file.
+
+The **-dal** option (default 4) gives the desired number of block comparisons
+per call to **daligner**(1). Some must contain *dal*-1 comparisons, and the
+first *dal*-2 block comparisons even fewer, but the **HPCdaligner** "planner"
+does the best it can to give an average load of *dal* block comparisons per
+command. The **-deg** option (default 25) gives the maximum number of files
+that will be merged in a single **LAmerge**(1) command. The planner makes the
+most even k-ary tree of merges, where the number of levels is ceil(log_deg N).
+
+If the integers *first* and *last* are missing, then the script produced
+is for every block in the database. If *first* is present, then
+**HPCdaligner** produces an incremental script that compares blocks *first*
+through *last* (*last* = *first* if not present) against each other and all
+previous blocks 1 through *first*-1, and then incrementally updates the .las
+files for blocks 1 through *first*-1, and creates the .las files for
+blocks *first* through *last*.
+
+Each UNIX command line output by **HPCdaligner** can be a batch job
+(we use the && operator to combine several commands into one line to make this
+so). Dependencies between jobs can be maintained simply by first running all
+the **daligner**(1) jobs, then all the initial sort jobs, and then all the
+jobs in each phase of the external merge sort. Each of these phases is
+separated by an informative comment line for your scripting convenience.
+
+# SEE ALSO
+
+**daligner**(1)
+**LAsort**(1)
+**LAmerge**(1)
+**LAshow**(1)
+**LAcat**(1)
+**LAsplit**(1)
+**LAcheck**(1)
+**HPCmapper**(1)
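
The file-count arithmetic in the HPCdaligner(1) description above (2TN^2 raw files, N^2 sorted block-pair files, ceil(log_deg N) merge phases, N final files) can be checked with a small sketch. The function names `ceil_log` and `las_file_counts` are hypothetical helpers written for this illustration; they are not part of daligner and only restate the formulas from the text.

```python
def ceil_log(n: int, base: int) -> int:
    """Return the smallest p with base**p >= n, using integer arithmetic only."""
    if n < 1 or base < 2:
        raise ValueError("need n >= 1 and base >= 2")
    phases, reach = 0, 1
    while reach < n:
        reach *= base
        phases += 1
    return phases


def las_file_counts(n_blocks: int, threads: int, deg: int = 25) -> dict:
    """Restate the counts from the manpage text for N blocks and T threads."""
    return {
        "raw_thread_files": 2 * threads * n_blocks ** 2,  # 2TN^2
        "sorted_block_pair_files": n_blocks ** 2,         # N^2
        "merge_phases": ceil_log(n_blocks, deg),          # ceil(log_deg N)
        "final_files": n_blocks,                          # one per block row
    }


if __name__ == "__main__":
    # 30 blocks, 4 threads, default -deg of 25
    print(las_file_counts(30, 4))
```

The integer loop avoids floating-point rounding that `math.log` can introduce when N is an exact power of the merge degree.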
diff --git a/debian/daligner-doc/HPCmapper.1.md b/debian/daligner-doc/HPCmapper.1.md
new file mode 100644
index 0000000..6f2a7a8
--- /dev/null
+++ b/debian/daligner-doc/HPCmapper.1.md
@@ -0,0 +1,46 @@
+% HPCmapper(1) 1.0
+%
+% August 2015
+
+# NAME
+
+HPCmapper – generate a script to map reads against a reference with **daligner**(1)
+
+# SYNOPSIS
+
+**HPCmapper** [**-vb**] [**-k***int(20)*] [**-w***int(6)*] [**-h***int(50)*]
+	[**-t***int*] [**-M***int*] [**-e***double(.85)*]
+	[**-l***int(1000)*] [**-s***int(100)*] [**-H***int*]
+	[**-m***track*]+ [**-dal***int(4)*] [**-deg***int(25)*]
+	*ref:db|dam* *reads:db|dam* [*first:int*[-*last:int*]]
+
+# DESCRIPTION
+
+**HPCmapper** writes a UNIX shell script to the standard output that
+consists of a sequence of commands that effectively "maps" every read in
+the DB *reads* against a reference set of sequences in the DB *ref*,
+recording all the found local alignments in the sequence of files
+*ref.reads.1.las*, *ref.reads.2.las*, ... where *ref.reads.k.las*
+contains the alignments between all of *ref* and the k'th block of
+*reads*.  The parameters are exactly the same as for **HPCdaligner**(1)
+save that the **-k**, **-h**, and **-e** defaults are set
+appropriately for mapping, and the **-A** and **-I** options
+make no sense as *ref* and *reads* are expected to be distinct
+data sets.
+
+If the integers *first* and *last* are missing, then the
+script produced is for every block in the database *reads*.
+If *first* is present then **HPCmapper** produces a script
+that compares blocks *first* through *last* (*last* = *first*
+if not present) against DAM *ref*.
+
+# SEE ALSO
+
+**daligner**(1)
+**LAsort**(1)
+**LAmerge**(1)
+**LAshow**(1)
+**LAcat**(1)
+**LAsplit**(1)
+**LAcheck**(1)
+**HPCdaligner**(1)
diff --git a/debian/daligner-doc/LAcat.1.md b/debian/daligner-doc/LAcat.1.md
new file mode 100644
index 0000000..de24fb1
--- /dev/null
+++ b/debian/daligner-doc/LAcat.1.md
@@ -0,0 +1,29 @@
+% LAcat(1) 1.0
+%
+% August 2015
+
+# NAME
+
+LAcat – concatenate .las files
+
+# SYNOPSIS
+
+**LAcat** *source:las* > *target.las*
+
+# DESCRIPTION
+
+Given argument *source*, find all files *source*.1.las, *source*.2.las, ...
+*source*.n.las where *source*.i.las exists for every i in [1,n]. Then
+concatenate these files in order into a single .las file and pipe the result
+to the standard output.
+
+# SEE ALSO
+
+**daligner**(1)
+**LAsort**(1)
+**LAmerge**(1)
+**LAshow**(1)
+**LAsplit**(1)
+**LAcheck**(1)
+**HPCdaligner**(1)
+**HPCmapper**(1)
diff --git a/debian/daligner-doc/LAcheck.1.md b/debian/daligner-doc/LAcheck.1.md
new file mode 100644
index 0000000..3c4970a
--- /dev/null
+++ b/debian/daligner-doc/LAcheck.1.md
@@ -0,0 +1,41 @@
+% LAcheck(1) 1.0
+%
+% August 2015
+
+# NAME
+
+LAcheck – verify structural integrity of .las files
+
+# SYNOPSIS
+
+**LAcheck** [**-vS**] *src1:db|dam* [*src2:db|dam*] *align:las* ...
+
+# DESCRIPTION
+
+LAcheck checks each .las file for structural integrity, where the a- and
+b-sequences come from *src1* or from *src1* and *src2*, respectively.
+That is, it makes sure each file makes sense as a plausible .las file,
+e.g. values are not out of bounds, the number of records is correct, the number
+of trace points for a record is correct, and so on. The exit status is 0 if
+every file is deemed good, and 1 if at least one of the files looks corrupted.
+
+# OPTIONS
+
+**-S**
+:   Also check that the alignments are in sorted order
+
+**-v**
+:   Print a line for each .las file saying either the file is OK or reporting
+	the first error. If the **-v** option is not set, then the program
+	runs silently.
+
+# SEE ALSO
+
+**daligner**(1)
+**LAsort**(1)
+**LAmerge**(1)
+**LAshow**(1)
+**LAcat**(1)
+**LAsplit**(1)
+**HPCdaligner**(1)
+**HPCmapper**(1)
diff --git a/debian/daligner-doc/LAmerge.1.md b/debian/daligner-doc/LAmerge.1.md
new file mode 100644
index 0000000..6ad2de5
--- /dev/null
+++ b/debian/daligner-doc/LAmerge.1.md
@@ -0,0 +1,37 @@
+% LAmerge(1) 1.0
+%
+% August 2015
+
+# NAME
+
+LAmerge – merge .las files into a single sorted file
+
+# SYNOPSIS
+
+**LAmerge** [**-v**] *merge:las* *parts:las* ...
+
+# DESCRIPTION
+
+Merge the .las files *parts* into a single sorted file *merge*, where it is
+assumed that the input *parts* files are sorted. Due to operating system
+limits, the number of *parts* files must be <= 252. With the **-v** option
+set, the program reports the number of records read and written.
+
+Used correctly, **LAmerge** and **LAsort**(1) together allow one to perform
+an "external" sort that produces a collection of sorted files containing in
+aggregate all the local alignments found by **daligner**(1), such that
+their concatenation is sorted in order of (a,b,o,ab). In particular, this
+means that all the alignments for a given a-read will be found consecutively
+in one of the files. So computations that need to look at all the alignments
+for a given read can operate in simple sequential scans of these sorted files.
+
+# SEE ALSO
+
+**daligner**(1)
+**LAsort**(1)
+**LAshow**(1)
+**LAcat**(1)
+**LAsplit**(1)
+**LAcheck**(1)
+**HPCdaligner**(1)
+**HPCmapper**(1)
diff --git a/debian/daligner-doc/LAshow.1.md b/debian/daligner-doc/LAshow.1.md
new file mode 100644
index 0000000..141c508
--- /dev/null
+++ b/debian/daligner-doc/LAshow.1.md
@@ -0,0 +1,72 @@
+% LAshow(1) 1.0
+%
+% August 2015
+
+# NAME
+
+LAshow – display local alignments from .las files
+
+# SYNOPSIS
+
+**LAshow** [**-caroUF**] [**-i***int(4)*] [**-w***int(100)*]
+[**-b***int(10)*] *src1:db|dam* [*src2:db|dam*] *align:las*
+[*reads:FILE* | *reads:range* ... ]
+
+# DESCRIPTION
+
+**LAshow** produces a printed listing of the local alignments contained in the
+specified .las file, where the a- and b-reads come from *src1* or from *src1*
+and *src2*, respectively. If a file or list of read ranges is given, then only
+the overlaps for which the a-read is in the set specified by the file or list
+are displayed. See **DBshow**(1) for an explanation of how the file and list
+of read ranges are interpreted.
+
+# OPTIONS
+
+**-F**
+:   Reverse the roles of the a- and b- reads in the display
+
+**-U**
+:   Use uppercase for DNA sequence instead of the default lowercase
+
+**-i***indentation*
+:   Set the indent for the cartoon and/or alignment displays if they are
+	requested. (default: 4)
+
+**-b***num_symbols*
+:   Set the number of symbols on either side of the aligned segments in an
+	alignment display. (default: 10)
+
+**-w***w*
+:   Set the width *w* used by the display modes specified
+	by **-a** and **-r**. (default: 100)
+
+**-o**
+:   Only display alignments that are proper overlaps, that is, where a
+	sequence end occurs at each end of the alignment
+
+## DISPLAY MODES
+
+**-c**
+:   Cartoon rendering of the alignment
+
+**-a**, **-r**
+:   Display an alignment of the local alignment
+
+The **-a** option puts exactly *w* columns per segment of
+the display, whereas the **-r** option puts exactly *w* a-read symbols in each
+segment of the display. The **-r** display mode is useful when one wants to
+visually compare two alignments involving the same a-read. If a combination of
+the **-c**, **-a**, and **-r** flags is set, then the cartoon comes first,
+then the **-a** alignment, and lastly the **-r** alignment.
+
+# SEE ALSO
+
+**daligner**(1)
+**LAsort**(1)
+**LAmerge**(1)
+**LAcat**(1)
+**LAsplit**(1)
+**LAcheck**(1)
+**HPCdaligner**(1)
+**HPCmapper**(1)
diff --git a/debian/daligner-doc/LAsort.1.md b/debian/daligner-doc/LAsort.1.md
new file mode 100644
index 0000000..d4443ec
--- /dev/null
+++ b/debian/daligner-doc/LAsort.1.md
@@ -0,0 +1,31 @@
+% LAsort(1) 1.0
+%
+% August 2015
+
+# NAME
+
+LAsort – sort .las alignment files
+
+# SYNOPSIS
+
+**LAsort** [**-v**] *align:las* ...
+
+# DESCRIPTION
+
+Sort each .las alignment file specified on the command line. For each file
+it reads in all the overlaps in the file and sorts them in lexicographical
+order of (a,b,o,ab) assuming each alignment is recorded as
+a[ab,ae] x b^o[bb,be]. It then writes them all to a file named *align.S.las*
+(assuming that the input file was *align.las*). With the **-v** option set,
+the program reports the number of records read and written.
+
+# SEE ALSO
+
+**daligner**(1)
+**LAmerge**(1)
+**LAshow**(1)
+**LAcat**(1)
+**LAsplit**(1)
+**LAcheck**(1)
+**HPCdaligner**(1)
+**HPCmapper**(1)
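
The (a,b,o,ab) ordering described for LAsort(1) above is an ordinary lexicographic sort on four fields. The sketch below is illustrative only: the `Overlap` record, its field layout, and the encoding of `o` as 0 (same strand) or 1 (opposite strand) are assumptions made for this example and do not reflect the on-disk .las format.

```python
from itertools import groupby
from operator import attrgetter
from typing import NamedTuple


class Overlap(NamedTuple):
    """Hypothetical in-memory record for a[ab,ae] x b^o[bb,be]."""
    a: int   # a-read index
    b: int   # b-read index
    o: int   # assumed encoding: 0 = same strand, 1 = opposite strand
    ab: int  # start of the aligned interval on a
    ae: int  # end of the aligned interval on a
    bb: int  # start of the aligned interval on b^o
    be: int  # end of the aligned interval on b^o


def sort_key(ov: Overlap) -> tuple:
    """Lexicographic key (a, b, o, ab) as stated in the manpage text."""
    return (ov.a, ov.b, ov.o, ov.ab)


records = [
    Overlap(2, 5, 0, 100, 900, 0, 800),
    Overlap(1, 7, 1, 0, 1500, 200, 1700),
    Overlap(1, 3, 0, 400, 2400, 0, 2000),
    Overlap(1, 3, 0, 50, 1250, 10, 1210),
]

ordered = sorted(records, key=sort_key)

# After sorting, all records for one a-read are consecutive, so a single
# sequential pass can group them.
by_a_read = {a: list(group) for a, group in groupby(ordered, key=attrgetter("a"))}
```

Because the a-read is the leading key, the grouping step needs no extra index, which is the property the sorted files are meant to provide to downstream tools.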
diff --git a/debian/daligner-doc/LAsplit.1.md b/debian/daligner-doc/LAsplit.1.md
new file mode 100644
index 0000000..723ec63
--- /dev/null
+++ b/debian/daligner-doc/LAsplit.1.md
@@ -0,0 +1,33 @@
+% LAsplit(1) 1.0
+%
+% August 2015
+
+# NAME
+
+LAsplit – divide an .las alignment file
+
+# SYNOPSIS
+
+**LAsplit** *target:las* {*parts:int* | *path:db|dam*} < *source.las*
+
+# DESCRIPTION
+
+If the second argument is an integer n, then divide the alignment file
+*source*, piped in through the standard input, as evenly as possible into n
+alignment files with the name *target.i.las* for i in [1,n], subject to the
+restriction that all alignment records for a given a-read are in the same file.
+
+If the second argument refers to a database *path.db* that has been
+partitioned, then divide the input alignment file into block .las files where
+all records whose a-read is in *path.i.db* are in *target.i.las*.
+
+# SEE ALSO
+
+**daligner**(1)
+**LAsort**(1)
+**LAmerge**(1)
+**LAshow**(1)
+**LAcat**(1)
+**LAcheck**(1)
+**HPCdaligner**(1)
+**HPCmapper**(1)
diff --git a/debian/daligner-doc/daligner.1.md b/debian/daligner-doc/daligner.1.md
new file mode 100644
index 0000000..aeb5dcb
--- /dev/null
+++ b/debian/daligner-doc/daligner.1.md
@@ -0,0 +1,114 @@
+% DALIGNER(1) 1.0
+%
+% August 2015
+
+# NAME
+
+daligner – find local alignments between reads
+
+# SYNOPSIS
+
+**daligner**
+[**-vbAI**]
+[**-k***int(14)*] [**-w***int(6)*] [**-h***int(35)*]
+[**-t***int*] [**-M***int*]
+[**-e***double(.70)*] [**-l***int(1000)*] [**-s***int(100)*] [**-H***int*]
+[**-m***track*]+ *subject:db|dam* *target:db|dam* ...
+
+# DESCRIPTION
+Compare sequences in the trimmed *subject* block against those in the list of
+*target* blocks searching for local alignments involving at least **-l** base
+pairs (default 1000) that have an average correlation rate of **-e**
+(default 70%). The local alignments found will be output in a sparse encoding
+where a trace point on the alignment is recorded every **-s** base pairs of
+the a-read (default 100bp). Reads are compared in both orientations and local
+alignments meeting the criteria are output to one of several created files
+described below. The **-v** option turns on a verbose reporting mode that gives
+statistics on each major step of the computation.
+
+The options **-k**, **-h**, and **-w** control the initial filtration search
+for possible matches between reads. Specifically, our search code looks for a
+pair of diagonal bands of width 2^w (default 2^6 = 64) that contain a
+collection of exact matching k-mers (default 14) between the two reads,
+such that the total number of bases covered by the k-mer hits is h
+(default 35). k cannot be larger than 32 in the current implementation.
+If the **-b** option is set, then **daligner** assumes the data has a
+strong compositional bias (e.g. >65% AT rich), and at the cost of a bit more
+time, dynamically adjusts k-mer sizes depending on compositional bias, so that
+the mers used have an effective specificity of 4^k.
+
+If there are one or more interval tracks specified with the **-m** option, then
+the reads of the DB or DBs to which the mask applies are soft masked with
+the union of the intervals of all the interval tracks that apply, that is, any
+k-mers that contain any bases in any of the masked intervals are ignored for
+the purposes of seeding a match. An interval track is a track, such as the
+"dust" track created by DBdust, that encodes a set of intervals over either
+the untrimmed or trimmed DB.
+
+Invariably, some k-mers are significantly over-represented (e.g. homopolymer
+runs). These k-mers create an excessive number of matching k-mer pairs and
+left unaddressed would cause daligner to overflow the available physical
+memory.  One way to deal with this is to explicitly set the **-t** parameter
+which suppresses the use of any k-mer that occurs more than *t* times in either
+the subject or target block.  However, a better way to handle the situation is
+to let the program automatically select a value of *t* that meets a given
+memory usage limit specified (in Gb) by the **-M** parameter. By default
+**daligner** will use the amount of physical memory as the choice for **-M**.
+If you want to use less, say only 8Gb on a 24Gb HPC cluster node because you
+want to run 3 **daligner** jobs on the node, then specify **-M***8*.
+Specifying **-M***0* basically indicates that you do not want **daligner** to
+self adjust k-mer suppression to fit within a given amount of memory.  
+
+For each subject, target pair of blocks, say X and Y, the program reports
+alignments where the a-read is in X and the b-read is in Y, and vice versa.
+However, if the **-A** option is set ("A" for "asymmetric") then just overlaps
+where the a-read is in X and the b-read is in Y are reported, and if X = Y,
+then it further reports only those overlaps where the a-read index is less than the b-read index.  In either case, if the **-I** option is set
+("I" for "identity") then when X = Y, overlaps between different
+portions of the same read will also be found and reported.  
+
+Each found alignment is recorded as a[ab,ae] x b^o[bb,be], where a and b
+are the indices (in the trimmed DB) of the reads that overlap, o indicates
+whether the b-read is from the same or opposite strand, and [ab,ae] and
+[bb,be] are the intervals of a and b^o, respectively, that align. The program
+places these alignment records in files whose name is of the form
+X.Y.[C|N]#.las where C indicates that the b-reads are complemented and N
+indicates they are not (both comparisons are performed) and # is
+the thread that detected and wrote out the collection of alignments contained
+in the file. That is, the file X.Y.O#.las contains the alignments produced by
+thread # for which the a-read is from X and the b-read is from Y and in
+orientation O. The command
+**daligner -A** *X* *Y* produces 2\*NTHREAD thread files X.Y.?.las and
+**daligner** *X* *Y* produces 4\*NTHREAD files X.Y.?.las and Y.X.?.las
+(unless *X*=*Y* in which case only NTHREAD files, X.X.?.las, are produced).
+
+By default, **daligner** compares all overlaps between reads in the database
+that are greater than the minimum cutoff set when the DB or DBs were split,
+typically 1 or 2 Kbp. However, the HGAP assembly pipeline only wants to
+correct large reads, say 8Kbp or over, and so needs only the overlaps where
+the a-read is one of the large reads. By setting the **-H** parameter to, say, N,
+one alters **daligner** so that it only reports overlaps where the a-read is
+over N base-pairs long.
+
+While the default parameter settings are good for raw PacBio data, **daligner**
+can be used for efficiently finding alignments in corrected reads or other
+less noisy reads. For example, for mapping applications against .dams, we run
+
+**daligner** **-k**20 **-h**60 **-e**.85
+
+and on corrected reads, we typically run
+
+**daligner** **-k**25 **-w**5 **-h**60 **-e**.95 **-s**500
+
+and at these settings it is very fast.
+
+# SEE ALSO
+
+**LAsort**(1)
+**LAmerge**(1)
+**LAshow**(1)
+**LAcat**(1)
+**LAsplit**(1)
+**LAcheck**(1)
+**HPCdaligner**(1)
+**HPCmapper**(1)
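
The output naming scheme X.Y.[C|N]#.las described in daligner(1) above can be taken apart with a short sketch. `parse_thread_file` is a hypothetical helper written for this illustration and is not shipped with daligner; it only assumes what the text states, namely a C or N orientation letter followed by a thread number before the `.las` suffix. How threads are numbered is not specified in the text, so the helper accepts any non-negative integer.

```python
import re

# Tail of the name: ".<C|N><thread>.las"; everything before it is kept as-is
# because block names may themselves contain dots.
_THREAD_FILE = re.compile(r"^(?P<prefix>.+)\.(?P<orient>[CN])(?P<thread>\d+)\.las$")


def parse_thread_file(name: str) -> tuple:
    """Split a name like 'X.Y.C3.las' into (block-pair prefix, complemented?, thread)."""
    match = _THREAD_FILE.match(name)
    if match is None:
        raise ValueError(f"not a per-thread daligner output name: {name!r}")
    complemented = match.group("orient") == "C"  # C: b-reads complemented, N: not
    return match.group("prefix"), complemented, int(match.group("thread"))


if __name__ == "__main__":
    for example in ("X.Y.C3.las", "X.Y.N0.las"):
        print(example, parse_thread_file(example))
```

A helper of this kind can be useful when collecting the per-thread files of one block pair before handing them to LAsort(1) and LAmerge(1).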

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/daligner.git