[med-svn] [blasr] 12/12: Partially rework blasr manpage

Afif Elghraoui afif-guest at moszumanska.debian.org
Thu Jul 30 08:24:42 UTC 2015


This is an automated email from the git hooks/post-receive script.

afif-guest pushed a commit to branch master
in repository blasr.

commit 4e52e20729e49825c0a554658fd99800cb1bbe93
Author: Afif Elghraoui <afif at ghraoui.name>
Date:   Thu Jul 30 01:21:52 2015 -0700

    Partially rework blasr manpage
---
 debian/blasr.1 | 458 ++++++++++++++++++++++++++++-----------------------------
 1 file changed, 228 insertions(+), 230 deletions(-)

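The DESCRIPTION in the manpage diff below says that blasr generates anchors by searching all suffixes of a read against a suffix array of the genome, and that `-minMatch` sets the minimum seed length. The following Python sketch illustrates only that seeding idea on toy strings. It is not blasr's implementation. The function names (`build_suffix_array`, `find_anchors`) are hypothetical, the suffix array is built naively rather than with sawriter(1), and chaining and alignment are omitted.

```python
from bisect import bisect_left
from typing import List, Tuple


def build_suffix_array(genome: str) -> List[int]:
    """Naive suffix array: suffix start positions sorted lexicographically."""
    return sorted(range(len(genome)), key=lambda i: genome[i:])


def common_prefix_len(read: str, i: int, genome: str, j: int) -> int:
    """Length of the exact match between read[i:] and genome[j:]."""
    n = 0
    while i + n < len(read) and j + n < len(genome) and read[i + n] == genome[j + n]:
        n += 1
    return n


def find_anchors(read: str, genome: str, sa: List[int],
                 min_match: int) -> List[Tuple[int, int, int]]:
    """Return (read_pos, genome_pos, length) for the longest exact match of
    every read suffix, keeping only matches of at least min_match bases."""
    suffixes = [genome[j:] for j in sa]  # materialized for bisect; toy sizes only
    anchors = []
    for i in range(len(read)):
        k = bisect_left(suffixes, read[i:])
        # The suffix sharing the longest prefix with read[i:] sits next to
        # the insertion point k in sorted order.
        best = max((common_prefix_len(read, i, genome, sa[j])
                    for j in (k - 1, k) if 0 <= j < len(sa)), default=0)
        if best < min_match:
            continue  # seed too short: skipped, as -minMatch would do
        prefix = read[i:i + best]
        # All genome suffixes starting with `prefix` form one contiguous block.
        lo = bisect_left(suffixes, prefix)
        hi = bisect_left(suffixes, prefix + "~")  # '~' sorts after A, C, G, T, N
        for j in range(lo, hi):
            anchors.append((i, sa[j], best))
    return anchors
```

For example, with `genome = "ACGTACGTTT"`, `find_anchors("CGTAC", genome, build_suffix_array(genome), 4)` returns `[(0, 1, 5), (1, 2, 4)]`. Raising `min_match` discards shorter seeds, which mirrors the speed/sensitivity trade-off the manpage attributes to `-minMatch`.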
diff --git a/debian/blasr.1 b/debian/blasr.1
index 45693ea..015be46 100644
--- a/debian/blasr.1
+++ b/debian/blasr.1
@@ -2,112 +2,123 @@
 .SH NAME
 blasr \- Map SMRT Sequences to a reference genome.
 .SH SYNOPSIS
-.IP
-blasr reads.bam genome.fasta \fB\-bam\fR \fB\-out\fR out.bam
-.IP
-blasr reads.fasta genome.fasta
-.IP
-blasr reads.fasta genome.fasta \fB\-sa\fR genome.fasta.sa
-.IP
-blasr reads.bax.h5 genome.fasta [\-sa genome.fasta.sa]
-.IP
-blasr reads.bax.h5 genome.fasta \fB\-sa\fR genome.fasta.sa \fB\-maxScore\fR \fB\-100\fR \fB\-minMatch\fR 15 ...
-.IP
-blasr reads.bax.h5 genome.fasta \fB\-sa\fR genome.fasta.sa \fB\-nproc\fR 24 \fB\-out\fR alignment.out ...
+.P
+.B blasr
+.I reads.bam
+.I genome.fasta \fB\-bam \-out\fI out.bam
+.P
+.B blasr
+.I reads.fasta
+.I genome.fasta
+.P
+.B blasr
+.I reads.fasta
+.I genome.fasta \fB\-sa\fI genome.fasta.sa
+.P
+.B blasr
+.I reads.bax.h5
+.I genome.fasta \fR[\fB\-sa \fIgenome.fasta.sa\fR]
+.P
+.B blasr
+.I reads.bax.h5
+.I genome.fasta \fB\-sa\fI genome.fasta.sa \fB\-maxScore\fR \-100 \fB\-minMatch\fR 15 ...
+.P
+.B blasr
+.I reads.bax.h5
+.I genome.fasta \fB\-sa\fI genome.fasta.sa \fB\-nproc\fR 24 \fB\-out\fI alignment.out\fR ...
 .SH DESCRIPTION
-.IP
-blasr is a read mapping program that maps reads to positions
+.P
+\fBblasr\fR is a read mapping program that maps reads to positions
 in a genome by clustering short exact matches between the read and
 the genome, and scoring clusters using alignment. The matches are
 generated by searching all suffixes of a read against the genome
 using a suffix array. Global chaining methods are used to score
 clusters of matches.
-.IP
+.P
 The only required inputs to blasr are a file of reads and a
 reference genome.  It is exremely useful to have read filtering
 information, and mapping runtime may decrease substantially when a
 precomputed suffix array index on the reference sequence is
 specified.
-.IP
+.P
 Although reads may be input in FASTA format, the recommended input is
 PacBio BAM files because these contain qualtiy value
 information that is used in the alignment and produces higher quality
 variant detection.
 Although alignments can be output in various formats, the recommended
 output format is PacBio BAM.
-Support to bax.h5 and plx.h5 files will be DEPRECATED.
-Support to region tables for h5 files will be DEPRECATED.
-.IP
+Support for bax.h5 and plx.h5 files will be \fBDEPRECATED\fR.
+Support for region tables for h5 files will be \fBDEPRECATED\fR.
+.P
 When suffix array index of a genome is not specified, the suffix array is
-built before producing alignment.   This may be prohibitively slow
-when the genome is large (e.g. Human).  It is best to precompute the
-suffix array of a genome using the program sawriter, and then specify
-the suffix array on the command line using \fB\-sa\fR genome.fa.sa.
-.IP
+built before producing alignment. This may be prohibitively slow
+when the genome is large (e.g. human). It is best to precompute the
+suffix array of a genome using the program
+.BR sawriter (1),
+and then specify the suffix array on the command line using
+\fB\-sa\fR genome.fa.sa.
+.P
 The optional parameters are roughly divided into three categories:
 control over anchoring, alignment scoring, and output.
-.IP
+.P
 The default anchoring parameters are optimal for small genomes and
 samples with up to 5% divergence from the reference genome.  The main
 parameter governing speed and sensitivity is the \fB\-minMatch\fR parameter.
 For human genome alignments, a value of 11 or higher is recommended.
 Several methods may be used to speed up alignments, at the expense of
 possibly decreasing sensitivity.
-.IP
+.P
 Regions that are too repetitive may be ignored during mapping by
 limiting the number of positions a read maps to with the
-\fB\-maxAnchorsPerPosition\fR option.  Values between 500 and 1000 are effective
+\fB\-maxAnchorsPerPosition\fR option. Values between 500 and 1000 are effective
 in the human genome.
-.IP
+.P
 For small genomes such as bacterial genomes or BACs, the default parameters
 are sufficient for maximal sensitivity and good speed.
 .SH OPTIONS
-Options for blasr
-Basic usage: 'blasr reads.{bam|fasta|bax.h5|fofn} genome.fasta [\-options]
-.IP
-option Description (default_value).
-.IP
-Input Files.
 .TP
-reads.bam
-is a PacBio BAM file of reads.
-This is the preferred input to blasr because rich quality
+.B Input Files
+.RS
+.TP
+.B Reads
+.RS
+.TP
+.I reads.bam
+A PacBio BAM file of reads.
+This is the preferred input to \fBblasr\fR because rich quality
 value (insertion,deletion, and substitution quality values) information is
-maintained.  The extra quality information improves variant detection and mapping
-speed.
+maintained.  The extra quality information improves variant detection and mapping speed.
 .TP
-reads.fasta is a multi\-fasta file of reads.
-While any fasta file is valid input,
-.IP
-reads.bax.h5|reads.plx.h5 is the old DEPRECATED output format of SMRT reads.
-input.fofn  File of file names accepted.
-.HP
-\fB\-sa\fR suffixArrayFile
-.IP
-Use the suffix array 'sa' for detecting matches
-between the reads and the reference.  The suffix
-array has been prepared by the sawriter program.
-.HP
-\fB\-ctab\fR tab
+.I reads.fasta
+A multi\-fasta file of reads, though any fasta file is valid input.
+.TP
+.IR reads.bax.h5 | reads.plx.h5
+The old \fBDEPRECATED\fR output format of SMRT reads.
 .TP
+.I input.fofn
+A file of file names (fofn).
+.RE
+.TP
+\fB\-sa\fI suffixArrayFile\fR
+Use the suffix array \fIsuffixArrayFile\fR for detecting matches between the reads and the
+reference.
+The suffix array is precomputed with the \fBsawriter\fR(1) program.
+.TP
+\fB\-ctab\fI tab\fR
 A table of tuple counts used to estimate match significance.
-This is
+This table is generated by the program 'printTupleCountTable'.
+Although it is quick to generate on the fly, precomputing the ctab is useful
+when \fBblasr\fR is invoked many times.
 .TP
-by the program 'printTupleCountTable'.
-While it is quick to generate on
-.IP
-the fly, if there are many invocations of blasr, it is useful to
-precompute the ctab.
-.HP
-\fB\-regionTable\fR table (DEPRECATED)
-.IP
-Read in a read\-region table in HDF format for masking portions of reads.
+\fB\-regionTable\fI table\fR (\fBDEPRECATED\fR)
+Read in a read-region table in HDF format for masking portions of reads.
 This may be a single table if there is just one input file,
 or a fofn.  When a region table is specified, any region table inside
 the reads.plx.h5 or reads.bax.h5 files are ignored.
-.PP
-(DEPRECATED) Options for modifying reads.
-.IP
+.RE
+.B (DEPRECATED) Options for modifying reads
+.RS
+.P
 There is ancilliary information about substrings of reads
 that is stored in a 'region table' for each read file.  Because
 HDF is used, the region table may be part of the .bax.h5 or .plx.h5 file,
@@ -116,226 +127,207 @@ is a subread, and any read may contain multiple subreads. The boundaries
 of the subreads may be inferred from the region table either directly or
 by definition of adapter boundaries.  Typically region tables also
 contain information for the location of the high and low quality regions of
-reads.  Reads produced by spurrious reads from empty ZMWs have a high
+reads.  Reads produced by spurious reads from empty ZMWs have a high
 quality start coordinate equal to high quality end, making no usable read.
-.HP
+.TP
 \fB\-useccs\fR
-.IP
 Align the circular consensus sequence (ccs), then report alignments
-of the ccs subreads to the window that the ccs was mapped to.  Only
-alignments of the subreads are reported.
-.HP
+of the ccs subreads to the window that the ccs was mapped to.
+Only alignments of the subreads are reported.
+.TP
 \fB\-useccsall\fR
-.IP
 Similar to \fB\-useccs\fR, except all subreads are aligned, rather than just
 the subreads used to call the ccs.  This will include reads that only
 cover part of the template.
-.HP
+.TP
 \fB\-useccsdenovo\fR
-.IP
 Align the circular consensus, and report only the alignment of the ccs
 sequence.
-.HP
-\fB\-noSplitSubreads\fR (false)
 .TP
-Do not split subreads at adapters.
-This is typically only
-.IP
+\fB\-noSplitSubreads\fR (false)
+Do not split subreads at adapters. This is typically only
 useful when the genome in an unrolled version of a known template, and
-contains template\-adapter\-reverse_template sequence.
-.HP
-\fB\-ignoreRegions\fR(false)
-.IP
+contains template-adapter-reverse_template sequence.
+.TP
+\fB\-ignoreRegions\fR (false)
 Ignore any information in the region table.
-.HP
-\fB\-ignoreHQRegions\fR (false)Ignore any hq regions in the region table.
-.IP
-Alignments To Report.
-.HP
-\fB\-bestn\fR n (10)
-.IP
-Report the top 'n' alignments.
-.HP
-\fB\-hitPolicy\fR
-.IP
-(all) Specify a policy to treat multiple hits from [all, allbest, random, randombest, leftmost]
 .TP
-all
-: report all alignments.
+\fB\-ignoreHQRegions\fR (false)
+Ignore any high-quality (HQ) regions in the region table.
+.RE
+.B Alignments To Report
+.RS
 .TP
-allbest
-: report all equally top scoring alignments.
+\fB\-bestn\fI n \fR(10)
+Report the top \fIn\fR alignments.
+.TP
+\fB\-hitPolicy\fR (all)
+Specify a policy for treating multiple hits, one of [all, allbest, random, randombest, leftmost]:
+.RS
+.TP
+.I all
+report all alignments.
+.TP
+.I allbest
+report all equally top scoring alignments.
+.TP
+.I random
+report a random alignment.
+.TP
+.I randombest
+report a random alignment from multiple equally top scoring alignments.
+.TP
+.I leftmost
+report the alignment with the best alignment score and the smallest mapping coordinate in any reference.
+.RE
 .TP
-random
-: report a random alignment.
-.IP
-randombest: report a random alignment from multiple equally top scoring alignments.
-leftmost  : report an alignment which has the best alignmentscore and has the smallest mapping coordinate in any reference.
-.HP
 \fB\-placeRepeatsRandomly\fR (false)
-.IP
-DEPRECATED! If true, equivalent to \fB\-hitPolicy\fR randombest.
-.HP
+\fBDEPRECATED!\fR If true, equivalent to \fB\-hitPolicy\fI randombest\fR.
+.TP
 \fB\-randomSeed\fR (0)
-.IP
 Seed for random number generator. By default (0), use current time as seed.
-.HP
+.TP
 \fB\-noSortRefinedAlignments\fR (false)
-.IP
 Once candidate alignments are generated and scored via sparse dynamic
 programming, they are rescored using local alignment that accounts
 for different error profiles.
 Resorting based on the local alignment may change the order the hits are returned.
-.HP
+.TP
 \fB\-allowAdjacentIndels\fR
-.IP
 When specified, adjacent insertion or deletions are allowed. Otherwise, adjacent
 insertion and deletions are merged into one operation.  Using quality values
 to guide pairwise alignments may dictate that the higher probability alignment
-contains adjacent insertions or deletions.  Current tools such as GATK do not permit
-this and so they are not reported by default.
-.IP
-Output Formats and Files
-.HP
-\fB\-out\fR out (terminal)
-.IP
-Write output to 'out'.
+contains adjacent insertions or deletions.
+Current tools such as GATK do not permit this, so adjacent indels are not reported by
+default.
+.RE
+.B Output Formats and Files
+.RS
+.TP
+\fB\-out\fI out \fR(terminal)
+Write output to \fIout\fR.
 .TP
 \fB\-sam\fR
 Write output in SAM format.
-.HP
-\fB\-m\fR t
-.IP
+.TP
+\fB\-m\fI t\fR
 If not printing SAM, modify the output of the alignment.
-.IP
-t=0 Print blast like output with |'s connecting matched nucleotides.
-.IP
-1 Print only a summary: score and pos.
-2 Print in Compare.xml format.
-3 Print in vulgar format (DEPRECATED).
-4 Print a longer tabular version of the alignment.
-5 Print in a machine\-parsable format that is read by compareSequences.py.
-.HP
+.IP
+When \fIt\fR is:
+.RS
+.TP
+0
+Print BLAST-like output with |'s connecting matched nucleotides.
+.TP
+1
+Print only a summary: score and pos.
+.TP
+2
+Print in Compare.xml format.
+.TP
+3
+Print in vulgar format (\fBDEPRECATED\fR).
+.TP
+4
+Print a longer tabular version of the alignment.
+.TP
+5
+Print in a machine\-parsable format that is read by compareSequences.py.
+.RE
+.TP
 \fB\-header\fR
-.IP
 Print a header as the first line of the output file describing the contents of each column.
-.HP
-\fB\-titleTable\fR tab (NULL)
 .TP
+\fB\-titleTable\fI tab \fR(NULL)
 Construct a table of reference sequence titles.
-The reference sequences are
+The reference sequences are enumerated by row, 0,1,...
+The reference index is printed in alignment results rather than the full
+reference name.
+This makes output concise, particularly when very verbose titles exist in
+reference names.
 .TP
-enumerated by row, 0,1,...
-The reference index is printed in alignment results
+\fB\-unaligned\fI file\fR
+Output reads that are not aligned to \fIfile\fR.
 .TP
-rather than the full reference name.
-This makes output concise, particularly when
-.IP
-very verbose titles exist in reference names.
-.HP
-\fB\-unaligned\fR file
-.IP
-Output reads that are not aligned to 'file'
-.HP
-\fB\-clipping\fR [none|hard|subread|soft] (none)
+\fB\-clipping\fR [\fInone\fR|\fIhard\fR|\fIsubread\fR|\fIsoft\fR] (none)
 .IP
 Use no/hard/subread/soft clipping, ONLY for SAM/BAM output.
-.HP
+.TP
 \fB\-printSAMQV\fR (false)
-.IP
 Print quality values to SAM output.
-.HP
+.TP
 \fB\-cigarUseSeqMatch\fR (false)
-.IP
 CIGAR strings in SAM/BAM output use '=' and 'X' to represent sequence match and mismatch instead of 'M'.
-.IP
-Options for anchoring alignment regions. This will have the greatest effect on speed and sensitivity.
-.HP
-\fB\-minMatch\fR m (12)
+.RE
+.B Options for anchoring alignment regions
+.RS
+.P
+These options have the greatest effect on speed and sensitivity.
 .TP
+\fB\-minMatch\fI m \fR(12)
 Minimum seed length.
-Higher minMatch will speed up alignment,
-.IP
-but decrease sensitivity.
-.HP
-\fB\-maxMatch\fR l (inf)
-.IP
-Stop mapping a read to the genome when the lcp length reaches l.
+A higher \fB\-minMatch\fR speeds up alignment but decreases sensitivity.
+.TP
+\fB\-maxMatch\fI l \fR(inf)
+Stop mapping a read to the genome when the longest common prefix (LCP) length reaches \fIl\fR.
 This is useful when the query is part of the reference, for example when
 constructing pairwise alignments for de novo assembly.
-.HP
-\fB\-maxLCPLength\fR l (inf)
-.IP
+.TP
+\fB\-maxLCPLength\fI l \fR(inf)
 The same as \fB\-maxMatch\fR.
-.HP
-\fB\-maxAnchorsPerPosition\fR m (10000)
-.IP
-Do not add anchors from a position if it matches to more than 'm' locations in the target.
-.HP
-\fB\-advanceExactMatches\fR E (0)
 .TP
+\fB\-maxAnchorsPerPosition\fI m \fR(10000)
+Do not add anchors from a position if it matches to more than \fIm\fR locations in the target.
+.TP
+\fB\-advanceExactMatches\fI E \fR(0)
 Another trick for speeding up alignments with match \- E fewer anchors.
-Rather than
-.IP
-finding anchors between the read and the genome at every position in the read,
-when an anchor is found at position i in a read of length L, the next position
-in a read to find an anchor is at i+L\-E.
+Rather than finding anchors between the read and the genome at every position
+in the read, when an anchor is found at position i in a read of length L, the
+next position in a read to find an anchor is at i+L\-E.
 Use this when alignining already assembled contigs.
-.HP
-\fB\-nCandidates\fR n (10)
 .TP
-Keep up to 'n' candidates for the best alignment.
-A large value of n will slow mapping
-.IP
-because the slower dynamic programming steps are applied to more clusters of anchors
-which can be a rate limiting step when reads are very long.
-.HP
-\fB\-concordant\fR(false)
-.IP
-Map all subreads of a zmw (hole) to where the longest full pass subread of the zmw
-aligned to. This requires to use the region table and hq regions.
+\fB\-nCandidates\fI n \fR(10)
+Keep up to \fIn\fR candidates for the best alignment.
+A large value of \fIn\fR slows mapping because the slower dynamic programming
+steps are applied to more clusters of anchors, which can be rate-limiting
+when reads are very long.
+.TP
+\fB\-concordant\fR (false)
+Map all subreads of a ZMW (hole) to the location where the longest full pass
+subread of the ZMW aligned. This requires the region table and HQ regions.
 This option only works when reads are in base or pulse h5 format.
-.HP
-\fB\-concordantTemplate\fR(mediansubread)
-.IP
+.TP
+\fB\-concordantTemplate\fR (mediansubread)
 Select a full pass subread of a zmw as template for concordant mapping.
-longestsubread \- use the longest full pass subread
-mediansubread  \- use the median length full pass subread
-typicalsubread \- use the second longest full pass subread if length of
-.IP
+longestsubread - use the longest full pass subread
+mediansubread  - use the median length full pass subread
+typicalsubread - use the second longest full pass subread if length of
 the longest full pass subread is an outlier
-.HP
-\fB\-fastMaxInterval\fR(false)
-.IP
+.TP
+\fB\-fastMaxInterval\fR (false)
 Fast search maximum increasing intervals as alignment candidates. The search
 is not as exhaustive as the default, but is much faster.
-.HP
-\fB\-aggressiveIntervalCut\fR(false)
-.IP
-Agreesively filter out non\-promising alignment candidates, if there
+.TP
+\fB\-aggressiveIntervalCut\fR (false)
+Aggressively filter out non-promising alignment candidates, if there
 exists at least one promising candidate. If this option is turned on,
-Blasr is likely to ignore short alignments of ALU elements.
-.HP
-\fB\-fastSDP\fR(false)
-.IP
+\fBblasr\fR is likely to ignore short alignments of ALU elements.
+.TP
+\fB\-fastSDP\fR (false)
 Use a fast heuristic algorithm to speed up sparse dynamic programming.
-.IP
-Options for Refining Hits.
-.HP
-\fB\-sdpTupleSize\fR K (11)
+.RE
+.B Options for Refining Hits
+.RS
 .TP
-Use matches of length K to speed dynamic programming alignments.
+\fB\-sdpTupleSize\fI K \fR(11)
+Use matches of length \fIK\fR to speed dynamic programming alignments.
 This controls
-.IP
 accuracy of assigning gaps in pairwise alignments once a mapping has been found,
 rather than mapping sensitivity itself.
-.HP
-\fB\-scoreMatrix\fR "score matrix string"
 .TP
+\fB\-scoreMatrix\fI score matrix string\fR
 Specify an alternative score matrix for scoring fasta reads.
-The matrix is
-.IP
-in the format
+The matrix is in the format
 .IP
 ACGTN
 .IP
@@ -355,8 +347,9 @@ Set the penalty for opening an affine alignment.
 \fB\-affineExtend\fR a (0)
 .IP
 Change affine (extension) gap penalty. Lower value allows more gaps.
-.IP
-Options for overlap/dynamic programming alignments and pairwise overlap for de novo assembly.
+.RE
+.B Options for overlap/dynamic programming alignments and pairwise overlap for de novo assembly
+.RS
 .HP
 \fB\-useQuality\fR (false)
 .IP
@@ -372,8 +365,9 @@ accuracy in homolymer regions.
 \fB\-affineAlign\fR (false)
 .IP
 Refine alignment using affine guided align.
-.IP
-Options for filtering reads and alignments
+.RE
+.B Options for filtering reads and alignments
+.RS
 .HP
 \fB\-minReadLength\fR l(50)
 .IP
@@ -402,8 +396,9 @@ Maximum score to output (high is bad, negative good).
 \fB\-minPctAccuracy\fR
 .IP
 (0) Report alignments only if their percentage accuray is greater than minAccuracy.
-.IP
-Options for parallel alignment.
+.RE
+.B Options for parallel alignment
+.RS
 .HP
 \fB\-nproc\fR N (1)
 .TP
@@ -420,8 +415,9 @@ are running on the same data, for example when on a multi\-rack cluster.
 \fB\-stride\fR S (1)
 .IP
 Align one read every 'S' reads.
-.IP
-Options for subsampling reads.
+.RE
+.B Options for subsampling reads
+.RS
 .HP
 \fB\-subsample\fR (0)
 .IP
@@ -432,15 +428,17 @@ Proportion of reads to randomly subsample (expressed as a decimal) and align.
 When specified, only align reads whose ZMW hole numbers are in LIST.
 LIST is a comma\-delimited string of ranges, such as '1,2,3,10\-13'.
 This option only works when reads are in bam, bax.h5 or plx.h5 format.
+.RE
 .TP
 \fB\-h\fR
-Print this help file.
-.PP
+Print help information.
+.SH CITATION
 To cite BLASR, please use: Chaisson M.J., and Tesler G., Mapping
 single molecule sequencing reads using Basic Local Alignment with
 Successive Refinement (BLASR): Theory and Application, BMC
 Bioinformatics 2012, 13:238.
-Please report any bugs to 'https://github.com/PacificBiosciences/blasr/issues'.
+.SH BUGS
+Please report any bugs to \fIhttps://github.com/PacificBiosciences/blasr/issues\fR.
 .SH SEE ALSO
 .BR loadPulses (1)
 .BR pls2fasta (1)

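The last hunk of the diff documents an option that aligns only reads whose ZMW hole numbers are in LIST, a comma-delimited string of ranges such as '1,2,3,10-13'. Below is a minimal Python sketch of how such a LIST could be expanded. It assumes that ranges include both endpoints and that whitespace around fields is ignored, neither of which the manpage states. `parse_hole_numbers` is a hypothetical helper, not part of blasr.

```python
from typing import Set


def parse_hole_numbers(spec: str) -> Set[int]:
    """Expand a LIST such as '1,2,3,10-13' into a set of ZMW hole numbers.

    Assumption: a range 'a-b' includes both a and b.
    """
    holes: Set[int] = set()
    for field in spec.split(","):
        field = field.strip()
        if not field:
            continue  # tolerate stray commas, e.g. '1,,2'
        if "-" in field:
            start_text, end_text = field.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range in LIST: {field!r}")
            holes.update(range(start, end + 1))
        else:
            holes.add(int(field))
    return holes
```

Under these assumptions, `parse_hole_numbers("1,2,3,10-13")` returns `{1, 2, 3, 10, 11, 12, 13}`. A read would then be aligned only if its hole number is in that set. A set is used so that membership checks stay cheap when LIST spans many ZMWs.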
-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/blasr.git


