[med-svn] r6054 - trunk/packages/transtermhp/trunk/debian

Tue Feb 22 10:43:09 UTC 2011

Author: malex-guest
Date: 2011-02-22 10:43:06 +0000 (Tue, 22 Feb 2011)
New Revision: 6054

Modified:
   trunk/packages/transtermhp/trunk/debian/README.Debian
   trunk/packages/transtermhp/trunk/debian/README.source
   trunk/packages/transtermhp/trunk/debian/rules
Log:
cleanup of README.source/Debian 


Modified: trunk/packages/transtermhp/trunk/debian/README.Debian
===================================================================

--- trunk/packages/transtermhp/trunk/debian/README.Debian	2011-02-22 08:43:36 UTC (rev 6053)
+++ trunk/packages/transtermhp/trunk/debian/README.Debian	2011-02-22 10:43:06 UTC (rev 6054)
@@ -1,387 +1,6 @@
 transtermhp for Debian
 ----------------------
-TransTermHP Version 2.07
 
-CONTENTS
-    0. LICENSE & CREDITS
-    1. INSTALLATION
-    2. TRANSTERM USAGE
-    3. FORMAT OF THE TRANSTERM OUTPUT
-    4. TRANSTERM COMMAND LINE OPTIONS
-    5. RECALIBRATING USING DIFFERENT PARAMETERS
-    6. FORMAT OF THE EXPTERMS.DAT FILE
-    7. PORTING NOTES
-    8. 2NDSCORE PROGRAM
-    9. FORMAT OF .BAG FILES
-    10. USING TRANSTERM WITHOUT GENOME ANNOTATIONS
+Helper scripts are located in the /usr/share/doc/transtermhp/examples directory.
 
-0. LICENSE & CREDITS
-
-TransTermHP v. 2.0 is a complete rewrite by Carl Kingsford of TransTerm v. 1.0,
-originally written by Maria D. Ermolaeva. The first TransTermHP was described in
-the paper:
-
- [1] Maria D. Ermolaeva, Hanif G. Khalak, Owen White, Hamilton O. Smith and
-     Steven L. Salzberg. Prediction of Transcription Terminators in Bacterial
-     Genomes. J Mol Biol 301, (1), 27-33 (2000)
-
-TransTermHP v 2.0 is free software and is distributed under the GNU Public
-License. See the file LICENSE.txt included with TransTermHP for complete
-details.
-
-
-1. INSTALLATION
-
-At present, TransTermHP has only been tested on UNIX-like systems with the
-GCC/G++ compiler. To compile TransTermHP on such a system, "cd" into the
-TransTermHP src directory, and type:
-
-    make clean transterm
-
-If there are no errors reported, there should be a "transterm" executable file
-in the same directory. You can move this executable anyplace that is
-convenient. To save space, you can type:
-
-    make no_obj
-
-to remove all the .o files that were created during compilation.
-
-If you want to use TransTermHP on a non-UNIX-like system, see 'PORTING NOTES'
-below for some tips.
-
-
-2. TRANSTERM USAGE
-
-The standard usage of TransTermHP is:
-
-    transterm -p expterm.dat seq.fasta annotation.ptt > output.tt
-
-Any number of fasta and annotation files can be listed but fasta files should
-come before annotation files. The type of the file is determined by the
-extension:
-
-    .ptt               a GenBank ptt annotation file
-    .coords or .crd    a simple annotation file
-
-Each line of a .coords or .crd file has the format:
-
-    gene_name  start  end  chrom_id
-
-The chrom_id specifies which sequence the annotation should apply to. For a
-.ptt file, the chrom_id is taken to be the filename with the path and
-extension removed. A filename with any other extension is assumed to be a
-fasta file. 
-
-When processing an annotation for a chromosome with id = ID, the first word of
-each '>' line of the input sequences is searched for ID.  Because there is no
-good standard for how the '>' line is formatted, several heuristics are tried
-to find ID in the '>' line. In the order tried, they are:
-
-    >ID
-    >junk|cmr:ID|junk or junk|ID|junk
-    >junk|gi|ID|junk or >junk|gi|ID.junk|junk
-    >junk:ID
-
-The option '-p expterm.dat' uses the newest confidence scheme, where
-expterm.dat is the path to the file of that name supplied with TransTermHP. If
-'-p expterm.dat' is omitted, the version 1.0 confidence scheme is used. See
-section 'COMMAND LINE OPTIONS' for more detail.
-
-
-3. FORMAT OF THE TRANSTERM OUTPUT
-
-The organism's genes are listed sorted by their end coordinate and terminators
-are output between them. A terminator entry looks like this:
-
-    TERM 19  15310 - 15327  -      F     99      -12.7 -4.0 |bidir
-    (name)   (start - end)  (sense)(loc) (conf) (hp) (tail) (notes)
-
-where 'conf' is the overall confidence score, 'hp' is the hairpin score, and
-'tail' is the tail score. 'Conf' (which ranges from 0 to 100) is what you
-probably want to use to assess the quality of a terminator. Higher is better.
-The confidence, hp score, and tail scores are described in the paper cited
-above.  'Loc' gives type of region the terminator is in:
-
-    'G' = in the interior of a gene (at least 50bp from an end),
-    'F' = between two +strand genes,
-    'R' = between two -strand genes,
-    'T' = between the ends of a +strand gene and a -strand gene,
-    'H' = between the starts of a +strand gene and a -strand gene,
-    'N' = none of the above (for the start and end of the DNA)
-
-Because of how overlapping genes are handled, these designations are not
-exclusive. 'G', 'F', or 'R' can also be given in lowercase, indicating that
-the terminator is on the opposite strand from the region.  Unless the
---all-context option is given, only candidate terminators that appear to be in
-an appropriate genome context (e.g. T, F, R) are output. 
-
-Following the TERM line is the sequence of the hairpin and the 5' and 3'
-tails, always written 5' to 3'.
-
-
-4. TRANSTERM COMMAND LINE OPTIONS
-
-You can also set how large a hairpin must be to be considered:
-
-    --min-stem=n    Stem must be n nucleotides long
-    --min-loop=n    Loop portion of the hairpin must be at least n long
-
-You can also set the maximum size of the hairpin that will be found:
-
-    --max-len=n     Total extent of hairpin <= n NT long
-    --max-loop=n    The loop portion can be no longer than n
-
-The maximum length is the total length for the hairpin portion (2 stems, 1
-loop) and does not include the U-tail. It's measured in nucleotides in the
-input sequence, so because of gaps, the actual structure may be longer than
-max-len.  Max-len must be less than the compiled-in constant REALLY_MAX_UP
-(which by default is 1000). To increase the size of structures found, recompile
-after increasing this constant.
-
-TransTermHP assigns a score to the hairpin and tail portions of potential
-terminators. Lower scores are considered better. Many of the constants used in
-scoring hairpins can be set from the command line:
-
-    --gc=f       Score of a G-C pair
-    --au=f       Score of an A-U pair
-    --gu=f       Score of a G-U pair
-    --mm=f       Score of any other pair
-    --gap=f      Score of a gap in the hairpin
-
-The cost of loops of various lengths can be set using:
-
-    --loop-penalty=f1,f2,f3,f4,f5,...fn
-
-where f1 is the cost of a loop of length --min-loop, f2 is the cost of a loop
-of length --min-loop+1, and so on. If there are too few terms to cover up to
-max-loop, the last term is repeated. Thus --loop-penalty=0,2 would assign cost
-0 to any loop of length min-loop, and 2 to any longer loop (up to max-loop,
-after which longer loops are given infinite scores). Extra terms are ignored.
-
-Note that if you are using the --pval-conf confidence scheme (see below), you
-must regenerate the expterm.dat file if you change any of the above constants.
-
-To weed out any potential terminator with tail or hairpin scores that are too
-large, you can use the following options:
-
-    --max-hp-score=f    Maximum allowable hairpin score
-    --max-tail-score=f  Maximum allowable tail score
-
-Terminator hairpins must be adjacent to a "U-rich" region. You can adjust the
-constants that define what constitutes a U-rich region. Using the options:
-
-    --uwin-size=s
-    --uwin-require=r
-
-requires that there are at least r 'U' nucleotides in the s-nucleotide-long
-window adjacent to the hairpin. Again, if you change these constants, you
-should regenerate expterms.dat.
-
-Before the main output, TransTermHP will output the values of the above options
-in a format suitable to be used on the command line.
-
-In addition to the tail and hairpin scores, each possible terminator is
-assigned a confidence --- a value between 0 and 100 that indicates how likely
-it is that the sequence is a terminator. The scoring scheme needs a background
-file (supplied with TransTermHP) that is specified using:
-
-    --pval-conf expterms.dat
-
-This will use the distribution in the file expterms.dat as the background. (You can
-abbreviate this as "-p expterms.dat".) Though the supplied expterms.dat file is
-derived from random sequences, any background distribution can be used by
-supplying your own expterms.dat file.  See below for the format of
-expterms.dat.  The values in expterms.dat depend on the scoring constants,
-definition of u-rich regions, and the maximum allowed tail and hp scores.
-Thus, if you change any of these constants using the options above, you should
-regenerate expterms.dat.
- 
-The main output of TransTermHP is a list of terminators interleaved between a
-listing of the gene annotations that were provided as input. This output can
-be customized in a few ways:
-
-    -S              Don't output the terminator sequences
-    --min-conf=n    Only output terminators with confidence >= n (can
-                    abbreviate this as -c n; default is 76.)
-
-Additional analysis output can be obtained with the following options:
-
-    --bag-output file.bag  Output the Best terminator After Gene
-    --t2t-perf file.t2t    Output a summary of which tail-to-tail regions
-                           have good terminators 
-
-
-5. RECALIBRATING USING DIFFERENT PARAMETERS
-
-As mentioned above, if you change any of the basic scoring function and search
-parameters and are using the version 2.0 confidence scheme (recommended) then
-you have to recompute the values in the expterm.dat file. If you have python
-installed this is easy (though perhaps time consuming). You can issue the
-command:
-    
-    % calibrate.sh newexpterms.dat [OPTIONS TO TRANSTERM]
-
-where "[OPTIONS TO TRANSTERM]" are TransTermHP options (discussed above) that
-set the parameters to what you want them to be. After calibrate.sh finishes,
-newexpterms.dat will be in the current directory and can serve as an argument
-to -p when using the same parameters you passed to calibrate.sh. 
-
-Note that for the newexpterms.dat to be valid, you must supply the same basic
-parameters to TransTermHP on subsequent runs. TransTerm (or newexpterms.dat)
-will not remember these parameters for you. The best way to handle this is to
-make a shell script wrapper around transterm that always passes in your new
-parameters.
-
-Output formatting parameters do not require regeneration of expterms.dat ---
-see discussion above for which parameters expterm.dat depends on.
-
-
-6. FORMAT OF THE EXPTERMS.DAT FILE
-
-The 'pval-conf' confidence scheme, selected with the option "--pval-conf
-expterms.dat" (or '-p expterms.dat') computes the confidence of a terminator
-with HP energy E and tail energy T as follows.  First, the ranges of HP
-energies and tail energies are evenly divided into bins, and the appropriate
-bins e and t are found for E and T. Then the confidence is computed as
-described in [2].
-
-The first line of expterms.dat contains 6 numbers:
-
-   seqlen  num_bins  
-
-The (low_hp, high_hp) and (low_tail, high_tail) ranges give the bounds on the
-hairpin and tail scores. The integer num_bins gives the number of
-equally-sized bins into which those ranges are divided. Seqlen gives the
-length of the random sequence that was used to generate the data in the rest
-of the file.
-
-Following this line are any number of (at, R, M) triples, where 'at' is the AT
-content, R is a 4-tuple (low_hp, high_hp, low_tail, high_tail) giving the
-range of the HP and tail scores observed in random sequences of this AT
-content, and M is the distribution matrix.  These (at, R, M) triples are
-formatted as follows:
-
-   at  low_hp  high_hp  low_tail  high_tail
-   n11 n12 n13 n14 ... n1,num_bins
-   n21  ...
-   ...
-   n_num_bins,1 ...
-
-The mu_r(e,t) term is computed by selecting the matrix with the at value
-closest to the computed %AT of the region r. If the total length of region r
-sequence is L_r, then
-
-  mu_r(e,t) = n_t_e * L_r/seqlen
-
-where n_t_e is the entry in the t-th row and e-th column of the selected
-matrix, and seqlen is the first number in the first line of the file.
-
-
-7. PORTING NOTES
-
-If you want to run TransTermHP on a non-UNIX-like system, you should take note
-of the following:
-
-* gene-reader.cc assumes that the filename extension separator is "." and the
-  path separator is "/".
-
-* getopt_long() is used to process the command line arguments.
-
-
-8. 2NDSCORE PROGRAM
-
-The package also comes with a program '2ndscore' which will find the best
-hairpin anchored at each position. The basic usage is:
-
-    2ndscore in.fasta > out.hairpins
-
-For every position in the sequence this will output a line:
-
-   -0.6  52 ..  62      TTCCTAAAGGTTCCA  GCG CAAAA TGC  CATAAGCACCACATT
- (score) (start .. end) (left context)   (hairpin)      (right context)
-
-For positions near the ends of the sequences, the context may be padded with
-'x' characters. If no hairpin can be found, the score will be 'None'.
-
-Multiple fasta files can be given and multiple sequences can be in each fasta
-file. The output for each sequence will be separated by a line starting with
-'>' and containing the FASTA description of the sequence.
-
-Because the hairpin scores of the plus-strand and minus-strand may differ (due
-to GU binding in RNA), by default 2ndscore outputs two sets of hairpins for
-every sequence: the FORWARD hairpins and the REVERSE hairpins. All the forward
-hairpins are output first, and are identified by having the word 'FORWARD' at
-the end of the '>' line preceding them. Similarly, the REVERSE hairpins are
-listed after a '>' line ending with 'REVERSE'. If you want to search only one
-or the other strand, you can use:
-
-    --no-fwd    Don't print the FORWARD hairpins
-    --no-rvs    Don't print the REVERSE hairpins
-
-You can set the energy function used, just as with transterm with the --gc,
---au, --gu, --mm, --gap options. The --min-loop, --max-loop, and --max-len
-options are also supported.
-
-9. FORMAT OF THE .BAG FILES
-
-The columns for the .bag files are, in order:
-
-	1. gene_name	
-	2. terminator_start
-	3. terminator_end
-	4. hairpin_score
-	5. tail_score	
-	6. terminator_sequence
-
-    7. terminator_confidence: a combination of the hairpin and tail score that
-       takes into account how likely such scores are in a random sequence. This
-       is the main "score" for the terminator and is computed as described in
-       the paper.
-
-    8. APPROXIMATE_distance_from_end_of_gene: The *approximate* number of base
-       pairs between the end of the gene and the start of the terminator. This
-       is approximate in several ways: First, (and most important) TransTermHP
-       doesn't always use the real gene ends. Depending on the options you give
-       it may trim some off the ends of genes to handle terminators that
-       partially overlap with genes. Second, where the terminator "begins"
-       isn't that well defined.  This field is intended only for a sanity check
-       (terminators reported to be the best near the ends of genes shouldn't be
-       _too far_ from the end of the gene).
-
-
-10. USING TRANSTERM WITHOUT GENOME ANNOTATIONS
-
-TransTermHP uses known gene information for only 3 things: (1) tagging the
-putative terminators as either "inside genes" or "intergenic," (2) choosing the
-background GC-content percentage to compute the scores, because genes often
-have different GC content than the intergenic regions, and (3) producing
-slightly more readable output. Items (1) and (3) are not really necessary, and
-(2) has no effect if your genes have about the same GC-content as your
-intergenic regions.
-
-Unfortunately, TransTermHP doesn't yet have a simple option to run without an
-annotation file (either .ptt or .coords), and requires at least 2 genes to be
-present. The solution is to create fake, small genes that flank each
-chromosome. To do this, make a fake.coords file that contains only these two
-lines:
-
-	fakegene1	1 2	chrom_id
-	fakegene2	L-1 L	chrom_id
-
-where L is the length of the input sequence and L-1 is 1 less than the length
-of the input sequence. "chrom_id" should be the word directly following the ">"
-in the .fasta file containing your sequence. (If, for example, your .fasta file
-began with ">seq1", then chrom_id = seq1).
-
-This creates a "fake" annotation with two 1-base-long genes flanking the
-sequence in a tail-to-tail arrangement: --> <--. TransTermHP can then be run
-with:
-
-	transterm -p expterm.dat sequence.fasta fake.coords
-
-If the G/C content of your intergenic regions is about the same as your genes,
-then this won't have too much of an effect on the scores terminators receive.
-On the other hand, this use of TransTermHP hasn't been tested much at all, so
-it's hard to vouch for its accuracy.
-
- -- Alex Mestiashvili <alex at biotec.tu-dresden.de>  Sat, 19 Feb 2011 12:36:10 +0000
+ -- Alex Mestiashvili <alex at biotec.tu-dresden.de>  Sat, 19 Feb 2011 12:37:10 +0000
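
Note on section 10 of the removed text: the fake.coords file it describes could be generated with a short script. The following is a minimal Python sketch, not part of transtermhp. The function names fasta_lengths and fake_coords are hypothetical, and numbering the fake genes consecutively when the fasta file holds several sequences is an assumption. The two coordinate pairs (1 2 and L-1 L) are written exactly as the README shows them.

```python
import io


def fasta_lengths(handle):
    """Yield (chrom_id, length) for each record in a FASTA stream."""
    chrom_id, length = None, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if chrom_id is not None:
                yield chrom_id, length
            # chrom_id is the word directly following '>' (per the README)
            header = line[1:].split()
            chrom_id = header[0] if header else ""
            length = 0
        elif chrom_id is not None:
            length += len(line)
    if chrom_id is not None:
        yield chrom_id, length


def fake_coords(handle):
    """Return .coords text with two small flanking fake genes per sequence."""
    rows = []
    for n, (chrom_id, length) in enumerate(fasta_lengths(handle)):
        # Assumption: gene names are numbered consecutively across sequences.
        rows.append(f"fakegene{2 * n + 1}\t1 2\t{chrom_id}")
        rows.append(f"fakegene{2 * n + 2}\t{length - 1} {length}\t{chrom_id}")
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    demo = io.StringIO(">seq1 demo sequence\nACGTACGTAC\nGGTTAA\n")
    print(fake_coords(demo), end="")
```

The printed text can be saved as fake.coords and passed to transterm after the fasta file, as in the command shown in section 10.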

Modified: trunk/packages/transtermhp/trunk/debian/README.source
===================================================================
--- trunk/packages/transtermhp/trunk/debian/README.source	2011-02-22 08:43:36 UTC (rev 6053)
+++ trunk/packages/transtermhp/trunk/debian/README.source	2011-02-22 10:43:06 UTC (rev 6054)
@@ -1,17 +1,9 @@
 transtermhp for Debian
 ----------------------
 
-7. PORTING NOTES
+The source is distributed as a .zip file.
+To get source code suitable for the Debian scripts, run ./debian/rules get-orig-source.
+In this case the binaries are removed from the source.
+Another way is to use the uscan tool.
 
-If you want to run TransTermHP on a non-UNIX-like system, you should take note
-of the following:
-
-* gene-reader.cc assumes that the filename extension separator is "." and the
-  path separator is "/".
-
-* getopt_long() is used to process the command line arguments.
-
-
-
-
-
+-- Alex Mestiashvili <alex at biotec.tu-dresden.de>  Sat, 19 Feb 2011 12:36:10 +0000

Modified: trunk/packages/transtermhp/trunk/debian/rules
===================================================================
--- trunk/packages/transtermhp/trunk/debian/rules	2011-02-22 08:43:36 UTC (rev 6053)
+++ trunk/packages/transtermhp/trunk/debian/rules	2011-02-22 10:43:06 UTC (rev 6054)
@@ -13,7 +13,7 @@
 	dh $@ 
 
 # test: target in Makefile is broken.
-# don't forget to notify upstream .
+# don't forget to notify upstream.
 override_dh_auto_test:
 
 get-orig-source: