[Debian-med-packaging] seg patches from work for Debian

Laszlo Kajan lkajan at rostlab.org
Thu Jul 5 11:21:10 UTC 2012


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Dear John!

Please find attached the patches I have created while working on 'seg' in order to get it into Debian.

If you think they are valuable, you may want to apply them to your upstream sources.

Thank you for writing seg and making it available to the bioinformatics community, and now to the wider Debian community!

Best regards,

Laszlo Kajan
Rost Lab / Debian Maintainer
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQIcBAEBAgAGBQJP9XilAAoJEJvS1kCaDFL6encQANBdxOeV/Si0n7Hg6D+lR9hn
1Bg/uV0fAdvjhkySlPFVMZ1onI4pn4fLV8jYPO1iMRv2SHGnqMb9zOtEvMhzBvbz
iTbtV3E9uswcGaXGoWp5Zg787qovsuHmgDPoyG00aW7O0ntmk9DBC9OyUTsXoiqr
qIKLf8Xr/+bM5aNVBRo+L7okGEn5Cvqv0EGd6dNbmqlebhh1Z/ch/D7pXg4N9T8l
ZnlGJsfK6PLby10mdNlfkJ1M26lWKJFgWttcSX1IlxNylm/4dCrHtywQQFvzoJdD
ikxmAq17b3J7VlsLz2M1zMBxxMDOncSgT5NbWxs/s4adg4cenJM7Ky1R/Dc2NA6Q
kAbzdT/4lAHcj4MHlPu3lCSDZcrjV446b/CgVjtD09DCsYeR+2pVLCv+5YQlqDKF
EBT0Df0jyX42O7PjBljJZs7Qgj+yfI630ypE23AULCUQ3ugA5j3Nml9J197qR2Xb
3Lc2gE7pxGbZz5meznpuordYXrbnTvwWhUA0UceUnl4FOo9F/YU0X60HQgRV6ono
6Wyvg7aKHP+54p1N6NURAMDt0uki13+cgOg3M1lh4UulncR+DBGRDs3h9Z7EEqzM
Uym6LTaPdBfH4ZxcbGTmJSMaGKVP7MXDQxXeM+ws6E6ZdR3flFoZGsN1DrW+6IWE
jkAok/e4SDxqy9mxfhQg
=BiMQ
-----END PGP SIGNATURE-----
-------------- next part --------------
makefile
example
seg.c
genwin.c
autotools
seg.pod
-------------- next part --------------
Description: using autotools build system
 Upstream uses a simple makefile.
From: Laszlo Kajan <lkajan at rostlab.org>
Forwarded: no

Index: seg-1994101801/configure.ac
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ seg-1994101801/configure.ac	2012-07-04 20:31:38.946824368 +0000
@@ -0,0 +1,7 @@
+AC_INIT([seg], [1994101801])
+AC_CONFIG_SRCDIR([seg.c])
+AM_INIT_AUTOMAKE
+AC_CONFIG_FILES([Makefile])
+AC_PROG_CC
+
+AC_OUTPUT
Index: seg-1994101801/Makefile.am
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ seg-1994101801/Makefile.am	2012-07-04 20:31:38.946824368 +0000
@@ -0,0 +1,14 @@
+man_MANS = seg.1
+
+bin_PROGRAMS = seg
+
+seg_SOURCES = seg.c genwin.c genwin.h lnfac.h
+
+LDADD = -lm
+
+seg.1: seg.pod
+	sed -e 's|__datadir__|$(datadir)|g;s|__docdir__|$(docdir)|g;s|__pkgdatadir__|$(pkgdatadir)|g;s|__PREFIX__|$(prefix)|g;s|__sysconfdir__|$(sysconfdir)|g;s|__VERSION__|$(VERSION)|g;' "$<" | \
+	pod2man -c 'User Commands' -r "$(VERSION)" -name $(shell echo "$(basename $@)" | tr '[:lower:]' '[:upper:]') > "$@"
+
+clean-local:
+	rm -f $(man_MANS)
Index: seg-1994101801/AUTHORS
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ seg-1994101801/AUTHORS	2012-07-04 20:31:38.946824368 +0000
@@ -0,0 +1,4 @@
+John C. Wootton, Scott Federhen
+National Center For Biotechnology Information
+National Library of Medicine
+National Institutes of Health
Index: seg-1994101801/ChangeLog
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ seg-1994101801/ChangeLog	2012-07-04 20:31:38.950824087 +0000
@@ -0,0 +1,52 @@
+
+This directory contains C language source code for the SEG program of Wootton
+and Federhen, for identifying and masking segments of low compositional
+complexity in amino acid sequences.  This program is inappropriate for
+masking nucleotide sequences and, in fact, may strip some nucleotide
+ambiguity codes from nt. sequences as they are being read.
+
+The SEG program can be used as a plug-in filter of query sequences used in the
+NCBI BLAST programs.  See the -filter and -echofilter options described in the
+BLAST software's manual page.
+
+Input to SEG must be sequences in FASTA format.  Output can be produced in a
+variety of formats, with FASTA format being one of them when the -x option is
+used.  The file seg.doc includes a copy of the man page for the seg program.
+
+
+References:
+Wootton, J. C. and S. Federhen (1993).  Statistics of local complexity in amino
+acid sequences and sequence databases.  Computers and Chemistry 17:149-163.
+
+
+MODIFICATION HISTORY
+10/18/94
+Fixed a bug in the boundary conditions for the alphabet assignments
+(colorings) calculations. This condition seems not to arise in the
+current protein sequence databases, but does appear when the algorithm
+is customized for the nucleic acid alphabet.
+
+4/2/94
+Fixed a bug in the reading of input sequence files.  B, Z, and U letters found
+in the IUB amino acid alphabet and the NCBI standard amino acid alphabet
+were being stripped.
+
+3/30/94
+WRG improved speed by about 3X (roughly 5X overall since 3/21/94), due in part
+to the elimination of nearly all log() function calls, plus the removal of much
+unused or unnecessary code.
+
+3/21/94
+Included support for the special characters "*" (translation stop) and "-"
+(gap) which are found in some NCBI standard amino acid alphabets.
+
+WRG replaced repetitive dynamic calls to log(2.) and log(20.) with precomputed
+values, yielding a 33-50% speed improvement.
+
+WRG added EOF checks in several places, the lack of which could produce
+infinite looping.
+
+The previous version of seg is archived beneath the archive subdirectory.
+
+9/30/97
+HMF5 plugged a memory leak.
Index: seg-1994101801/NEWS
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ seg-1994101801/NEWS	2012-07-04 20:31:38.950824087 +0000
@@ -0,0 +1 @@
+2012-07-04	Debianization of seg.
Index: seg-1994101801/COPYING
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ seg-1994101801/COPYING	2012-07-04 20:32:08.274824880 +0000
@@ -0,0 +1,24 @@
+                            PUBLIC DOMAIN NOTICE
+               National Center for Biotechnology Information
+
+  This software/database is a "United States Government Work" under the
+  terms of the United States Copyright Act.  It was written as part of
+  the authors' official duties as United States Government employees and
+  thus cannot be copyrighted.  This software/database is freely available
+  to the public for use. The National Library of Medicine and the U.S.
+  Government have not placed any restriction on its use or reproduction.
+
+  Although all reasonable efforts have been taken to ensure the accuracy
+  and reliability of the software and data, the NLM and the U.S.
+  Government do not and cannot warrant the performance or results that
+  may be obtained by using this software or data. The NLM and the U.S.
+  Government disclaim all warranties, express or implied, including
+  warranties of performance, merchantability or fitness for any particular
+  purpose.
+
+  Please cite the authors in any work or product based on this material.
+
+ Authors: John C. Wootton, Scott Federhen
+          National Center For Biotechnology Information
+          National Library of Medicine
+          National Institutes of Health
-------------- next part --------------
Description: add an example
 Upstream has an example in the text documentation file.
From: Laszlo Kajan <lkajan at rostlab.org>
Forwarded: no

Index: seg-1994101801/prion.fa
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ seg-1994101801/prion.fa	2012-07-04 19:08:07.066840944 +0000
@@ -0,0 +1,6 @@
+>PRIO_HUMAN MAJOR PRION PROTEIN PRECURSOR
+MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGGGWGQP
+HGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGTHSQWNKPSKPKTNMKHMAGAAAAGA
+VVGGLGGYMLGSAMSRPIIHFGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCV
+NITIKQHTVTTTTKGENFTETDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSSPPV
+ILLISFLIFLIVG
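
To illustrate what the -x masking described in the seg.pod patch does to this sequence, here is a minimal Python sketch. It does not run or reimplement seg: the segment boundaries are copied by hand from the residue-number column of the example output in seg.pod, and the names mask_segments, wrap, PRION and LOW_COMPLEXITY are made up for this note.

```python
# Sketch only: applies "x" masking to hand-copied segment boundaries.
# It does not compute the segments; seg itself does that.

def mask_segments(seq, segments, mask_char="x"):
    """Replace residues in each 1-based, inclusive (start, end) range."""
    residues = list(seq)
    for start, end in segments:
        for i in range(start - 1, end):
            residues[i] = mask_char
    return "".join(residues)


def wrap(seq, width=60):
    """Split a sequence into lines of at most `width` characters."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


# Sequence body of prion.fa, as added by this patch.
PRION = (
    "MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGGGWGQP"
    "HGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGTHSQWNKPSKPKTNMKHMAGAAAAGA"
    "VVGGLGGYMLGSAMSRPIIHFGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCV"
    "NITIKQHTVTTTTKGENFTETDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSSPPV"
    "ILLISFLIFLIVG"
)

# Low-complexity ranges read off the example output in seg.pod
# (assumption: copied manually, not produced by this script).
LOW_COMPLEXITY = [(50, 94), (113, 135), (188, 201), (237, 252)]

for line in wrap(mask_segments(PRION, LOW_COMPLEXITY)):
    print(line)
```

Under those assumptions, the printed lines should correspond to the masked FASTA body shown in the man page example for 'seg prion.fa -x'.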
-------------- next part --------------
A non-text attachment was scrubbed...
Name: genwin.c
Type: text/x-csrc
Size: 1198 bytes
Desc: not available
URL: <http://lists.alioth.debian.org/pipermail/debian-med-packaging/attachments/20120705/bb193822/attachment-0002.c>
-------------- next part --------------
Description: rename original makefile
From: Laszlo Kajan <lkajan at rostlab.org>
Forwarded: no

Index: seg-1994101801/makefile
===================================================================
--- seg-1994101801.orig/makefile	2012-07-04 19:09:41.530825609 +0000
+++ /dev/null	1970-01-01 00:00:00.000000000 +0000
@@ -1,15 +0,0 @@
-
-all : seg
-
-seg : seg.c lnfac.h genwin.h genwin.o
-	cc -O -o seg seg.c genwin.o -lm
-
-hiseg : hiseg.c lnfac.h genwin.h genwin.o
-	cc -O -o hiseg hiseg.c genwin.o -lm
-
-genwin.o : genwin.c genwin.h
-	cc -O -c genwin.c
-
-clean:
-	rm -f seg seg.o genwin.o
-
Index: seg-1994101801/makefile.old
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ seg-1994101801/makefile.old	2012-07-04 19:09:41.530825609 +0000
@@ -0,0 +1,15 @@
+
+all : seg
+
+seg : seg.c lnfac.h genwin.h genwin.o
+	cc -O -o seg seg.c genwin.o -lm
+
+hiseg : hiseg.c lnfac.h genwin.h genwin.o
+	cc -O -o hiseg hiseg.c genwin.o -lm
+
+genwin.o : genwin.c genwin.h
+	cc -O -c genwin.c
+
+clean:
+	rm -f seg seg.o genwin.o
+
-------------- next part --------------
A non-text attachment was scrubbed...
Name: seg.c
Type: text/x-csrc
Size: 481 bytes
Desc: not available
URL: <http://lists.alioth.debian.org/pipermail/debian-med-packaging/attachments/20120705/bb193822/attachment-0003.c>
-------------- next part --------------
Description: add man page source in POD format
 Upstream has documentation in text format - the basis of this patch.
From: Laszlo Kajan <lkajan at rostlab.org>
Forwarded: no

Index: seg-1994101801/seg.pod
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ seg-1994101801/seg.pod	2012-07-04 20:45:27.206822408 +0000
@@ -0,0 +1,290 @@
+=head1 NAME
+
+seg - segment sequence(s) by local complexity
+
+=head1 SYNOPSIS
+
+seg sequence [ W ] [ K(1) ] [ K(2) ] [ -x ] [ options ]
+
+=head1 DESCRIPTION
+
+seg divides sequences into contrasting segments of low-complexity
+and high-complexity.  Low-complexity segments defined by the
+algorithm represent "simple sequences" or "compositionally-biased
+regions".
+
+Locally-optimized low-complexity segments are produced at defined
+levels of stringency, based on formal definitions of local
+compositional complexity (Wootton & Federhen, 1993).  The segment
+lengths and the number of segments per sequence are determined
+automatically by the algorithm.
+
+The input is a FASTA-formatted sequence file, or a database file
+containing many FASTA-formatted  sequences.  seg is tuned for amino
+acid sequences.  For nucleotide sequences, see EXAMPLES OF
+PARAMETER SETS below.
+
+The stringency of the search for low-complexity segments is
+determined by three user-defined parameters, trigger window length
+[ W ], trigger complexity [ K(1) ] and extension complexity [ K(2)]
+(see below under PARAMETERS ).  The defaults provided are suitable
+for low-complexity masking of database search query sequences [ -x
+option required, see below].
+
+
+=head1 OUTPUTS AND APPLICATIONS
+
+(1) Readable segmented sequence [Default].  Regions of contrasting
+complexity are displayed in "tree format".  See EXAMPLES.
+
+(2) Low-complexity masking (see Altschul et al, 1994).  Produce a
+masked FASTA-formatted file, ready for  input as a query sequence for
+database search programs such as BLAST or FASTA.  The amino acids in
+low-complexity regions are replaced with "x" characters [-x option].
+See EXAMPLES.
+
+(3) Database construction.  Produce FASTA-formatted files containing
+low-complexity segments [-l  option], or high-complexity segments
+[-h option], or both [-a option].  Each segment is a separate
+sequence entry with an informative header line.
+
+=head1 ALGORITHM
+
+The SEG algorithm has two stages.  First, identification of
+approximate raw segments of low-complexity; second, local
+optimization.
+
+At the first stage, the stringency and resolution of the search for
+low-complexity segments is determined  by the W, K(1) and K(2)
+parameters.  All trigger windows are defined, including overlapping
+windows, of length W and complexity less than or equal to K(1).
+"Complexity" here is defined by equation  (3) of Wootton & Federhen
+(1993).  Each trigger window is then extended into a contig in both
+directions by merging with extension windows, which are overlapping
+windows of length W and complexity  less than or equal to K(2).
+Each contig is a raw segment.
+
+At the second stage, each raw segment is reduced to a single
+optimal low-complexity segment, which  may be the entire raw
+segment but is usually a subsequence.  The optimal subsequence has
+the lowest  value of the probability P(0) (equation (5) of Wootton
+& Federhen, 1993).
+
+=head1 PARAMETERS
+
+These three numeric parameters are in obligatory order after the
+sequence file name.
+
+Trigger window length [ W ].  An integer greater than zero [ Default
+12 ].
+
+Trigger complexity [ K1 ].  The maximum complexity of a trigger
+window in units of bits. K1 must  be equal to or greater than zero.
+The maximum value is 4.322 (log[base 2]20) for amino acid
+sequences [ Default 2.2 ].
+
+Extension complexity [ K2 ].  The maximum complexity of an extension
+window in units of bits.  Only values greater than K1 are effective
+in extending triggered windows.  Range of possible values is as for
+K1 [ Default 2.5 ].
+
+
+=head1 OPTIONS
+
+The following options may be placed in any order in the command
+line after the W, K1 and K2 parameters:
+
+=over
+
+=item  -a
+
+Output both low-complexity and high-complexity segments in a
+FASTA-formatted file, as a set of  separate entries with header
+lines.
+
+=item  -c  [characters-per-line]
+
+Number of sequence characters per line of
+output [Default 60].  Other characters, such as residue numbers, are additional.
+
+=item  -h
+
+Output only the high-complexity segments in a FASTA-formatted
+file, as a set of separate entries  with header lines.
+
+=item  -l
+
+Output only the low-complexity segments in a FASTA-formatted
+file, as a set of separate entries with  header lines.
+
+=item -m  [length]
+
+Minimum length in residues for a high-complexity
+segment [default 0].  Shorter segments are merged with adjacent
+low-complexity segments.
+
+=item -o
+
+Show all overlapping, independently-triggered low-complexity
+segments [these are merged by default].
+
+=item  -q
+
+Produce an output format with the sequence in a numbered block
+with markings to assist residue counting.  The low-complexity and
+high-complexity segments are in lower- and upper-case characters
+respectively.
+
+=item  -t  [length]
+
+"Maximum trim length" parameter [default 100]. This
+controls the search space (and  search time) during the
+optimization of raw segments (see ALGORITHM above).  By default,
+subsequences 100 or more residues shorter than the raw segment are
+omitted from the search. This parameter may be increased to give
+a more extensive search if raw segments are longer than 100 residues.
+
+=item -x
+
+The masking option for amino acid sequences.  Each input
+sequence is represented by a single output sequence in FASTA-format
+with low-complexity regions replaced by strings of "x" characters.
+
+=back
+
+=head1 EXAMPLES OF PARAMETER SETS
+
+Default parameters are given by 'seg sequence' (equivalent to 'seg
+sequence 12 2.2 2.5').  These parameters are appropriate for
+low-complexity masking of many amino acid sequences [with -x option].
+
+=head2 Database-database comparisons:
+
+More stringent (lower) complexity parameters are suitable when
+masked sequences are compared with masked sequences.  For example,
+for BLAST or FASTA searches that compare two amino acid sequence
+databases, the following masking may be applied to both databases:
+
+  seg database 12 1.8 2.0 -x
+
+=head2 Homopolymer analysis:
+
+To examine all homopolymeric subsequences of length (for example)
+7 or greater:
+
+  seg sequence 7 0 0
+
+=head2 Non-globular regions of protein sequences:
+
+Many long non-globular domains may be diagnosed at longer window
+lengths, typically:
+
+  seg sequence 45 3.4 3.75
+
+For some shorter non-globular domains, the following set is
+appropriate:
+
+  seg sequence 25 3.0 3.3
+
+=head2 Nucleotide sequences:
+
+The maximum value of the complexity parameters is 2 (log[base 2]4).
+For masking, the following is approximately equivalent in effect
+to the default parameters for amino acid sequences:
+
+  seg sequence.na 21 1.4 1.6
+
+=head1 EXAMPLES
+
+The following is a file named 'prion.fa' in FASTA format:
+
+ >PRIO_HUMAN MAJOR PRION PROTEIN PRECURSOR
+ MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGGGWGQP
+ HGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGTHSQWNKPSKPKTNMKHMAGAAAAGA
+ VVGGLGGYMLGSAMSRPIIHFGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCV
+ NITIKQHTVTTTTKGENFTETDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSSPPV
+ ILLISFLIFLIVG
+
+The command line:
+
+ seg __docdir__/examples/prion.fa
+
+gives the standard output below:
+
+
+ >PRIO_HUMAN MAJOR PRION PROTEIN PRECURSOR
+
+                                   1-49   MANLGCWMLVLFVATWSDLGLCKKRPKPGG
+                                          WNTGGSRYPGQGSPGGNRY
+ ppqggggwgqphgggwgqphgggwgqphgg   50-94
+                gwgqphgggwgqggg
+                                  95-112  THSQWNKPSKPKTNMKHM
+        agaaaagavvgglggymlgsams  113-135
+                                 136-187  RPIIHFGSDYEDRYYRENMHRYPNQVYYRP
+                                          MDEYSNQNNFVHDCVNITIKQH
+                 tvttttkgenftet  188-201
+                                 202-236  DVKMMERVVEQMCITQYERESQAYYQRGSS
+                                          MVLFS
+               sppvillisflifliv  237-252
+                                 253-253  G
+
+The low-complexity sequences are on the left (lower case) and
+high-complexity sequences are on the right (upper case).  All
+sequence segments read from left to right and their order in the
+sequence is from top to bottom, as shown by the central column of
+residue numbers.
+
+The command line:
+
+  seg __docdir__/examples/prion.fa -x
+
+gives the following FASTA-formatted file:
+
+ >PRIO_HUMAN MAJOR PRION PROTEIN PRECURSOR
+ MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYxxxxxxxxxxx
+ xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxTHSQWNKPSKPKTNMKHMxxxxxxxx
+ xxxxxxxxxxxxxxxRPIIHFGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCV
+ NITIKQHxxxxxxxxxxxxxxDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSxxxx
+ xxxxxxxxxxxxG
+
+=head1 SEE ALSO
+
+segn(1), blast(1), saps(1), xnu(1)
+
+=head1 AUTHORS
+
+John Wootton:     wootton at ncbi.nlm.nih.gov
+
+Scott Federhen:   federhen at ncbi.nlm.nih.gov
+
+ National Center for Biotechnology Information
+ Building 38A, Room 8N805
+ National Library of Medicine
+ National Institutes of Health
+ Bethesda, MD 20894
+ U.S.A.
+
+
+=head1 PRIMARY REFERENCE
+
+Wootton, J.C., Federhen, S. (1993)  Statistics of local complexity
+in amino acid sequences and sequence  databases.  Computers &
+Chemistry 17: 149-163.
+
+
+=head1 OTHER REFERENCES
+
+Wootton, J.C. (1994)  Non-globular domains in protein sequences:
+automated segmentation using complexity measures.  Computers &
+Chemistry 18: (in press).
+
+Altschul, S.F., Boguski, M., Gish, W., Wootton, J.C. (1994)  Issues
+in searching molecular sequence  databases.  Nature Genetics 6:
+119-129.
+
+Wootton, J.C. (1994)  Simple sequences of protein and DNA. In:
+Nucleic Acid and Protein Sequence  Analysis: A Practical Approach.
+(Second Edition, Chapter 8, Bishop, M.J. and Rawlings, C.R. Eds.
+IRL  Press, Oxford) (In press).
+
+
Index: seg-1994101801/seg.doc
===================================================================
--- seg-1994101801.orig/seg.doc	2012-07-04 20:45:24.878825058 +0000
+++ seg-1994101801/seg.doc	2012-07-04 20:45:27.206822408 +0000
@@ -13,12 +13,12 @@
      seg sequence [ W ] [ K(1) ] [ K(2) ] [ -x ] [ options ]
 
 
-DESCRIPTION     
+DESCRIPTION
 -----------
 
 seg divides sequences into contrasting segments of low-complexity
 and high-complexity.  Low-complexity segments defined by the
-algorithm represent "simple sequences" or "compositionally-biased 
+algorithm represent "simple sequences" or "compositionally-biased
 regions".
 
 Locally-optimized low-complexity segments are produced at defined
@@ -29,36 +29,36 @@
 
 The input is a FASTA-formatted sequence file, or a database file
 containing many FASTA-formatted  sequences.  seg is tuned for amino
-acid sequences.  For nucleotide sequences, see EXAMPLES OF 
+acid sequences.  For nucleotide sequences, see EXAMPLES OF
 PARAMETER SETS below.
 
 The stringency of the search for low-complexity segments is
 determined by three user-defined parameters, trigger window length
-[ W ], trigger complexity [ K(1) ] and extension complexity [ K(2)]  
+[ W ], trigger complexity [ K(1) ] and extension complexity [ K(2)]
 (see below under PARAMETERS ).  The defaults provided are suitable
 for low-complexity masking of database search query sequences [ -x
 option required, see below].
 
 
-OUTPUTS AND APPLICATIONS   
+OUTPUTS AND APPLICATIONS
 ------------------------
 
 (1) Readable segmented sequence [Default].  Regions of contrasting
 complexity are displayed in "tree format".  See EXAMPLES.
 
-(2) Low-complexity masking (see Altschul et al, 1994).  Produce a 
+(2) Low-complexity masking (see Altschul et al, 1994).  Produce a
 masked FASTA-formatted file, ready for  input as a query sequence for
-database search programs such as BLAST or FASTA.  The amino acids in 
-low-complexity regions are replaced with "x" characters [-x option]. 
+database search programs such as BLAST or FASTA.  The amino acids in
+low-complexity regions are replaced with "x" characters [-x option].
 See EXAMPLES.
 
 (3) Database construction.  Produce FASTA-formatted files containing
 low-complexity segments [-l  option], or high-complexity segments
-[-h option], or both [-a option].  Each segment is a separate 
+[-h option], or both [-a option].  Each segment is a separate
 sequence entry with an informative header line.
 
 
-ALGORITHM     
+ALGORITHM
 ---------
 
 The SEG algorithm has two stages.  First, identification of
@@ -81,7 +81,7 @@
 the lowest  value of the probability P(0) (equation (5) of Wootton
 & Federhen, 1993).
 
-PARAMETERS     
+PARAMETERS
 ----------
 
 These three numeric parameters are in obligatory order after the
@@ -92,16 +92,16 @@
 
 Trigger complexity. [ K1 ].  The maximum complexity of a trigger
 window in units of bits. K1 must  be equal to or greater than zero.
-The maximum value is 4.322 (log[base 2]20) for amino acid 
+The maximum value is 4.322 (log[base 2]20) for amino acid
 sequences [ Default 2.2 ].
 
 Extension complexity [ K2 ].  The maximum complexity of an extension
 window in units of bits.  Only values greater than K1 are effective
-in extending triggered windows.  Range of possible values is as for 
+in extending triggered windows.  Range of possible values is as for
 K1 [ Default 2.5 ].
 
 
-OPTIONS     
+OPTIONS
 -------
 
 The following options may be placed in any order in the command
@@ -112,7 +112,7 @@
     lines.
 
 -c  [characters-per-line] Number of sequence characters per line of
-    output [Default 60].  Other characters, such as residue numbers, 
+    output [Default 60].  Other characters, such as residue numbers,
     are additional.
 
 -h  Output only the high-complexity segments in a FASTA-formatted
@@ -122,7 +122,7 @@
     file, as a set of separate entries with  header lines.
 
 -m  [length] Minimum length in residues for a high-complexity
-    segment [default 0].  Shorter segments are merged with adjacent 
+    segment [default 0].  Shorter segments are merged with adjacent
     low-complexity segments.
 
 -o  Show all overlapping, independently-triggered low-complexity
@@ -130,7 +130,7 @@
 
 -q  Produce an output format with the sequence in a numbered block
     with markings to assist residue counting.  The low-complexity and
-    high-complexity segments are in lower- and upper-case characters 
+    high-complexity segments are in lower- and upper-case characters
     respectively.
 
 -t  [length] "Maximum trim length" parameter [default 100]. This
@@ -145,7 +145,7 @@
     with low-complexity regions replaced by strings of "x" characters.
 
 
-EXAMPLES OF PARAMETER SETS  
+EXAMPLES OF PARAMETER SETS
 --------------------------
 
 Default parameters are given by 'seg sequence' (equivalent to 'seg
@@ -154,48 +154,48 @@
 
 Database-database comparisons:
 -----------------------------
-More stringent (lower) complexity parameters are suitable when  
-masked sequences are compared with masked sequences.  For example, 
-for BLAST or FASTA searches that compare two amino acid sequence  
+More stringent (lower) complexity parameters are suitable when
+masked sequences are compared with masked sequences.  For example,
+for BLAST or FASTA searches that compare two amino acid sequence
 databases, the following masking may be applied to both databases:
 
   seg database 12 1.8 2.0 -x
 
 Homopolymer analysis:
 --------------------
-To examine all homopolymeric subsequences of length (for example) 
+To examine all homopolymeric subsequences of length (for example)
 7 or greater:
 
-  seg sequence 7 0 0 
+  seg sequence 7 0 0
 
 Non-globular regions of protein sequences:
 -----------------------------------------
-Many long non-globular domains may be diagnosed at longer window  
+Many long non-globular domains may be diagnosed at longer window
 lengths, typically:
 
   seg sequence 45 3.4 3.75
 
-For some shorter non-globular domains, the following set is  
+For some shorter non-globular domains, the following set is
 appropriate:
 
   seg sequence 25 3.0 3.3
 
 Nucleotide sequences:
 --------------------
-The maximum value of the complexity parameters is 2 (log[base 2]4). 
-For masking, the following is approximately equivalent in effect 
+The maximum value of the complexity parameters is 2 (log[base 2]4).
+For masking, the following is approximately equivalent in effect
 to the default parameters for amino acid sequences:
 
   seg sequence.na 21 1.4 1.6
 
-EXAMPLES     
+EXAMPLES
 The following is a file named 'prion' in FASTA format:
 
 >PRIO_HUMAN MAJOR PRION PROTEIN PRECURSOR
-MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGGGWGQP 
-HGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGTHSQWNKPSKPKTNMKHMAGAAAAGA 
-VVGGLGGYMLGSAMSRPIIHFGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCV 
-NITIKQHTVTTTTKGENFTETDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSSPPV 
+MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYPPQGGGGWGQP
+HGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGTHSQWNKPSKPKTNMKHMAGAAAAGA
+VVGGLGGYMLGSAMSRPIIHFGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCV
+NITIKQHTVTTTTKGENFTETDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSSPPV
 ILLISFLIFLIVG
 
 The command line:
@@ -221,8 +221,8 @@
               sppvillisflifliv  237-252
                                 253-253  G
 
-The low-complexity sequences are on the left (lower case) and 
-high-complexity sequences are on the right (upper case).  All  
+The low-complexity sequences are on the left (lower case) and
+high-complexity sequences are on the right (upper case).  All
 sequence segments read from left to right and their order in the
 sequence is from top to bottom, as shown by the central column of
 residue numbers.
@@ -234,21 +234,21 @@
 gives the following FASTA-formatted file:-
 
 >PRIO_HUMAN MAJOR PRION PROTEIN PRECURSOR
-MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYxxxxxxxxxxx 
-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxTHSQWNKPSKPKTNMKHMxxxxxxxx 
-xxxxxxxxxxxxxxxRPIIHFGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCV 
-NITIKQHxxxxxxxxxxxxxxDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSxxxx 
+MANLGCWMLVLFVATWSDLGLCKKRPKPGGWNTGGSRYPGQGSPGGNRYxxxxxxxxxxx
+xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxTHSQWNKPSKPKTNMKHMxxxxxxxx
+xxxxxxxxxxxxxxxRPIIHFGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCV
+NITIKQHxxxxxxxxxxxxxxDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSxxxx
 xxxxxxxxxxxxG
 
 
 
-SEE ALSO    
+SEE ALSO
 --------
 
 segn, blast, saps, xnu
 
 
-AUTHORS     
+AUTHORS
 -------
 
 John Wootton:     wootton at ncbi.nlm.nih.gov
@@ -262,7 +262,7 @@
 U.S.A.
 
 
-PRIMARY REFERENCE    
+PRIMARY REFERENCE
 -----------------
 
 Wootton, J.C., Federhen, S. (1993)  Statistics of local complexity
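
For readers of the ALGORITHM and PARAMETERS sections of seg.pod above, the first-stage trigger test can be sketched in Python. This is not taken from seg.c. It assumes that "complexity" is the Shannon entropy, in bits, of the residue composition of a window, which is at least consistent with the documented maximum of log2(20) = 4.322 for amino acid sequences. Extension with K(2) and the second-stage optimization are left out, and the names window_complexity and trigger_windows are hypothetical.

```python
import math
from collections import Counter


def window_complexity(window):
    """Shannon entropy (bits) of the residue composition of one window.

    Assumption: this stands in for the complexity measure the man page
    refers to; it is not copied from the seg sources.
    """
    length = len(window)
    counts = Counter(window)
    return -sum((n / length) * math.log2(n / length) for n in counts.values())


def trigger_windows(seq, w=12, k1=2.2):
    """0-based start positions of all length-w windows with complexity <= k1.

    Overlapping windows are all reported, as in the first stage described
    in the ALGORITHM section. Defaults follow the documented W and K1.
    """
    return [
        i
        for i in range(len(seq) - w + 1)
        if window_complexity(seq[i:i + w]) <= k1
    ]


if __name__ == "__main__":
    # A homopolymer window has zero entropy, matching 'seg sequence 7 0 0'.
    print(trigger_windows("MKAAAAAAAAAALV", w=7, k1=0))
    # Twenty distinct residues give the maximum, log2(20).
    print(round(window_complexity("ACDEFGHIKLMNPQRSTVWY"), 3))
```

With K1 set to 0, only windows made of a single repeated residue pass the test, which is how the homopolymer parameter set in the man page can be read under this assumption.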

