[med-svn] [aragorn] 01/12: Imported Upstream version 1.2.37
Sascha Steinbiss
satta at debian.org
Sun Jul 3 07:55:15 UTC 2016
This is an automated email from the git hooks/post-receive script.
satta pushed a commit to branch master
in repository aragorn.
commit 39fb3f7a4d4ff990d2abaee52c2806e92b8526dd
Author: Sascha Steinbiss <satta at debian.org>
Date: Sun Jul 3 07:27:49 2016 +0000
Imported Upstream version 1.2.37
---
aragorn.1 | 390 ----
aragorn1.2.36.c => aragorn1.2.37.c | 4196 +++++++++++++++++++++++-------------
manpage.1.src | 273 +++
3 files changed, 2997 insertions(+), 1862 deletions(-)
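
Note on the minimum requirement: the header comment added to aragorn1.2.37.c
in this commit states that int and unsigned int must be at least 4 bytes long.
A minimal sketch of how such a requirement could be checked, assuming a C11
compiler. The function name int_width_ok is hypothetical and nothing like it
appears in the upstream source.

```c
#include <limits.h>

/* Hypothetical build-time guard, not part of the ARAGORN source:
   compilation fails if int or unsigned int is narrower than 4 bytes. */
_Static_assert(sizeof(int) >= 4, "int must be at least 4 bytes long");
_Static_assert(sizeof(unsigned int) >= 4,
               "unsigned int must be at least 4 bytes long");

/* Run-time variant of the same check: returns 1 if both types are
   at least 32 bits wide, 0 otherwise. */
int int_width_ok(void)
{
    return sizeof(int) * CHAR_BIT >= 32 &&
           sizeof(unsigned int) * CHAR_BIT >= 32;
}
```

The static assertions stop the build early on an unsuitable target. The
run-time function is only useful where a pre-C11 compiler has to be supported
and the assertions are removed.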
diff --git a/aragorn.1 b/aragorn.1
deleted file mode 100644
index 617405f..0000000
--- a/aragorn.1
+++ /dev/null
@@ -1,390 +0,0 @@
-'\" t
-.\" Title: aragorn
-.\" Author: [see the "AUTHORS" section]
-.\" Generator: DocBook XSL Stylesheets v1.76.1 <http://docbook.sf.net/>
-.\" Date: 02/24/2013
-.\" Manual: \ \&
-.\" Source: \ \&
-.\" Language: English
-.\"
-.TH "ARAGORN" "1" "02/24/2013" "\ \&" "\ \&"
-.\" -----------------------------------------------------------------
-.\" * Define some portability stuff
-.\" -----------------------------------------------------------------
-.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.\" http://bugs.debian.org/507673
-.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
-.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.ie \n(.g .ds Aq \(aq
-.el .ds Aq '
-.\" -----------------------------------------------------------------
-.\" * set default formatting
-.\" -----------------------------------------------------------------
-.\" disable hyphenation
-.nh
-.\" disable justification (adjust text to left margin only)
-.ad l
-.\" -----------------------------------------------------------------
-.\" * MAIN CONTENT STARTS HERE *
-.\" -----------------------------------------------------------------
-.SH "NAME"
-aragorn \- detect tRNA genes in nucleotide sequences
-.SH "SYNOPSIS"
-.sp
-\fBaragorn\fR [\fIOPTION\fR]\&... \fIFILE\fR
-.SH "OPTIONS"
-.PP
-\fB\-m\fR
-.RS 4
-Search for tmRNA genes\&.
-.RE
-.PP
-\fB\-t\fR
-.RS 4
-Search for tRNA genes\&. By default, all are detected\&. If one of
-\fB\-m\fR
-or
-\fB\-t\fR
-is specified, then the other is not detected unless specified as well\&.
-.RE
-.PP
-\fB\-mt\fR
-.RS 4
-Search for Metazoan mitochondrial tRNA genes\&. tRNA genes with introns not detected\&.
-\fB\-i\fR,
-\fB\-sr\fR
-switchs ignored\&. Composite Metazoan mitochondrial genetic code used\&.
-.RE
-.PP
-\fB\-mtmam\fR
-.RS 4
-Search for Mammalian mitochondrial tRNA genes\&.
-\fB\-i\fR,
-\fB\-sr\fR
-switchs ignored\&.
-\fB\-tv\fR
-switch set\&. Mammalian mitochondrial genetic code used\&.
-.RE
-.PP
-\fB\-mtx\fR
-.RS 4
-Same as
-\fB\-mt\fR
-but low scoring tRNA genes are not reported\&.
-.RE
-.PP
-\fB\-mtd\fR
-.RS 4
-Overlapping metazoan mitochondrial tRNA genes on opposite strands are reported\&.
-.RE
-.PP
-\fB\-gc\fR[\fInum\fR]
-.RS 4
-Use the GenBank transl_table = [\fInum\fR] genetic code\&. Individual modifications can be appended using
-\fI,BBB\fR=<aa> B = A,C,G, or T\&. <aa> is the three letter code for an amino\-acid\&. More than one modification can be specified\&. eg
-\fB\-gcvert\fR,aga=Trp,agg=Trp uses the Vertebrate Mitochondrial code and the codons AGA and AGG changed to Tryptophan\&.
-.RE
-.PP
-\fB\-gcstd\fR
-.RS 4
-Use standard genetic code\&.
-.RE
-.PP
-\fB\-gcmet\fR
-.RS 4
-Use composite Metazoan mitochondrial genetic code\&.
-.RE
-.PP
-\fB\-gcvert\fR
-.RS 4
-Use Vertebrate mitochondrial genetic code\&.
-.RE
-.PP
-\fB\-gcinvert\fR
-.RS 4
-Use Invertebrate mitochondrial genetic code\&.
-.RE
-.PP
-\fB\-gcyeast\fR
-.RS 4
-Use Yeast mitochondrial genetic code\&.
-.RE
-.PP
-\fB\-gcprot\fR
-.RS 4
-Use Mold/Protozoan/Coelenterate mitochondrial genetic code\&.
-.RE
-.PP
-\fB\-gcciliate\fR
-.RS 4
-Use Ciliate genetic code\&.
-.RE
-.PP
-\fB\-gcflatworm\fR
-.RS 4
-Use Echinoderm/Flatworm mitochondrial genetic code
-.RE
-.PP
-\fB\-gceuplot\fR
-.RS 4
-Use Euplotid genetic code\&.
-.RE
-.PP
-\fB\-gcbact\fR
-.RS 4
-Use Bacterial/Plant Chloroplast genetic code\&.
-.RE
-.PP
-\fB\-gcaltyeast\fR
-.RS 4
-Use alternative Yeast genetic code\&.
-.RE
-.PP
-\fB\-gcascid\fR
-.RS 4
-Use Ascidian Mitochondrial genetic code\&.
-.RE
-.PP
-\fB\-gcaltflat\fR
-.RS 4
-Use alternative Flatworm Mitochondrial genetic code\&.
-.RE
-.PP
-\fB\-gcblep\fR
-.RS 4
-Use Blepharisma genetic code\&.
-.RE
-.PP
-\fB\-gcchloroph\fR
-.RS 4
-Use Chlorophycean Mitochondrial genetic code\&.
-.RE
-.PP
-\fB\-gctrem\fR
-.RS 4
-Use Trematode Mitochondrial genetic code\&.
-.RE
-.PP
-\fB\-gcscen\fR
-.RS 4
-Use Scenedesmus obliquus Mitochondrial genetic code\&.
-.RE
-.PP
-\fB\-gcthraust\fR
-.RS 4
-Use Thraustochytrium Mitochondrial genetic code\&.
-.RE
-.PP
-\fB\-tv\fR
-.RS 4
-Do not search for mitochondrial TV replacement loop tRNA genes\&. Only relevant if
-\fB\-mt\fR
-used\&.
-.RE
-.PP
-\fB\-c7\fR
-.RS 4
-Search for tRNA genes with 7 base C\-loops only\&.
-.RE
-.PP
-\fB\-i\fR
-.RS 4
-Search for tRNA genes with introns in anticodon loop with maximum length 3000 bases\&. Minimum intron length is 0 bases\&. Ignored if
-\fB\-m\fR
-is specified\&.
-.RE
-.PP
-\fB\-i\fR[\fImax\fR]
-.RS 4
-Search for tRNA genes with introns in anticodon loop with maximum length [\fImax\fR] bases\&. Minimum intron length is 0 bases\&. Ignored if
-\fB\-m\fR
-is specified\&.
-.RE
-.PP
-\fB\-i\fR[\fImin\fR],[\fImax\fR]
-.RS 4
-Search for tRNA genes with introns in anticodon loop with maximum length [\fImax\fR] bases, and minimum length [\fImin\fR] bases\&. Ignored if
-\fB\-m\fR
-is specified\&.
-.RE
-.PP
-\fB\-io\fR
-.RS 4
-Same as
-\fB\-i\fR, but allow tRNA genes with long introns to overlap shorter tRNA genes\&.
-.RE
-.PP
-\fB\-if\fR
-.RS 4
-Same as
-\fB\-i\fR, but fix intron between positions 37 and 38 on C\-loop (one base after anticodon)\&.
-.RE
-.PP
-\fB\-ifo\fR
-.RS 4
-Same as
-\fB\-if\fR
-and
-\fB\-io\fR
-combined\&.
-.RE
-.PP
-\fB\-ir\fR
-.RS 4
-Same as
-\fB\-i\fR, but report tRNA genes with minimum length [\fImin\fR] bases rather than search for tRNA genes with minimum length [\fImin\fR] bases\&. With this switch, [\fImin\fR] acts as an output filter, minimum intron length for searching is still 0 bases\&.
-.RE
-.PP
-\fB\-c\fR
-.RS 4
-Assume that each sequence has a circular topology\&. Search wraps around each end\&. Default setting\&.
-.RE
-.PP
-\fB\-l\fR
-.RS 4
-Assume that each sequence has a linear topology\&. Search does not wrap\&.
-.RE
-.PP
-\fB\-d\fR
-.RS 4
-Double\&. Search both strands of each sequence\&. Default setting\&.
-.RE
-.PP
-\fB\-s\fR or \fB\-s+\fR
-.RS 4
-Single\&. Do not search the complementary (antisense) strand of each sequence\&.
-.RE
-.PP
-\fB\-sc\fR or \fB\-s\-\fR
-.RS 4
-Single complementary\&. Do not search the sense strand of each sequence\&.
-.RE
-.PP
-\fB\-ps\fR
-.RS 4
-Lower scoring thresholds to 95% of default levels\&.
-.RE
-.PP
-\fB\-ps\fR[\fInum\fR]
-.RS 4
-Change scoring thresholds to [\fInum\fR] percent of default levels\&.
-.RE
-.PP
-\fB\-rp\fR
-.RS 4
-Flag possible pseudogenes (score < 100 or tRNA anticodon loop <> 7 bases long)\&. Note that genes with score < 100 will not be detected or flagged if scoring thresholds are not also changed to below 100% (see \-ps switch)\&.
-.RE
-.PP
-\fB\-seq\fR
-.RS 4
-Print out primary sequence\&.
-.RE
-.PP
-\fB\-br\fR
-.RS 4
-Show secondary structure of tRNA gene primary sequence using round brackets\&.
-.RE
-.PP
-\fB\-fasta\fR
-.RS 4
-Print out primary sequence in fasta format\&.
-.RE
-.PP
-\fB\-fo\fR
-.RS 4
-Print out primary sequence in fasta format only (no secondary structure)\&.
-.RE
-.PP
-\fB\-fon\fR
-.RS 4
-Same as
-\fB\-fo\fR, with sequence and gene numbering in header\&.
-.RE
-.PP
-\fB\-fos\fR
-.RS 4
-Same as
-\fB\-fo\fR, with no spaces in header\&.
-.RE
-.PP
-\fB\-fons\fR
-.RS 4
-Same as
-\fB\-fo\fR, with sequence and gene numbering, but no spaces\&.
-.RE
-.PP
-\fB\-w\fR
-.RS 4
-Print out in Batch mode\&.
-.RE
-.PP
-\fB\-ss\fR
-.RS 4
-Use the stricter canonical 1\-2 bp spacer1 and 1 bp spacer2\&. Ignored if
-\fB\-mt\fR
-set\&. Default is to allow 3 bp spacer1 and 0\-2 bp spacer2, which may degrade selectivity\&.
-.RE
-.PP
-\fB\-v\fR
-.RS 4
-Verbose\&. Prints out information during search to STDERR\&.
-.RE
-.PP
-\fB\-a\fR
-.RS 4
-Print out tRNA domain for tmRNA genes\&.
-.RE
-.PP
-\fB\-a7\fR
-.RS 4
-Restrict tRNA astem length to a maximum of 7 bases
-.RE
-.PP
-\fB\-aa\fR
-.RS 4
-Display message if predicted iso\-acceptor species does not match species in sequence name (if present)\&.
-.RE
-.PP
-\fB\-j\fR
-.RS 4
-Display 4\-base sequence on 3\*(Aq end of astem regardless of predicted amino\-acyl acceptor length\&.
-.RE
-.PP
-\fB\-jr\fR
-.RS 4
-Allow some divergence of 3\*(Aq amino\-acyl acceptor sequence from NCCA\&.
-.RE
-.PP
-\fB\-jr4\fR
-.RS 4
-Allow some divergence of 3\*(Aq amino\-acyl acceptor sequence from NCCA, and display 4 bases\&.
-.RE
-.PP
-\fB\-q\fR
-.RS 4
-Dont print configuration line (which switchs and files were used)\&.
-.RE
-.PP
-\fB\-rn\fR
-.RS 4
-Repeat sequence name before summary information\&.
-.RE
-.PP
-\fB\-O\fR [\fIoutfile\fR]
-.RS 4
-Print output to
-\fI\&. If [\*(Aqoutfile\fR] already exists, it is overwritten\&. By default all output goes to stdout\&.
-.RE
-.SH "DESCRIPTION"
-.sp
-aragorn detects tRNA, mtRNA, and tmRNA genes\&. A minimum requirement is at least a 32 bit compiler architecture (variable types int and unsigned int are at least 4 bytes long)\&.
-.sp
-[\fIFILE\fR] is assumed to contain one or more sequences in FASTA format\&. Results of the search are printed to STDOUT\&. All switches are optional and case\-insensitive\&. Unless \-i is specified, tRNA genes containing introns are not detected\&.
-.SH "AUTHORS"
-.sp
-Bjorn Canback <bcanback at acgt\&.se>, Dean Laslett <gaiaquark at gmail\&.com>
-.SH "REFERENCES"
-.sp
-Laslett, D\&. and Canback, B\&. (2004) ARAGORN, a program for the detection of transfer RNA and transfer\-messenger RNA genes in nucleotide sequences Nucleic Acids Research, 32;11\-16
-.sp
-Laslett, D\&. and Canback, B\&. (2008) ARWEN: a program to detect tRNA genes in metazoan mitochondrial nucleotide sequences Bioinformatics, 24(2); 172\-175\&.
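
Note on the -gc modification syntax described in the removed man page
(",BBB=<aa>", e.g. -gcvert,aga=Trp,agg=Trp): a minimal sketch of how such a
suffix could be split into codon/amino-acid pairs. This is not the parser
used by ARAGORN. The type gc_mod and the function parse_gc_mods are
hypothetical names, and the sketch only checks the shape of each item: it
does not verify that <aa> is a known amino-acid code.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical record for one ",BBB=<aa>" modification. */
typedef struct { char codon[4]; char aa[4]; } gc_mod;

/* Parse zero or more ",BBB=<aa>" items from s (e.g. ",aga=Trp,agg=Trp").
   Codon bases are upper-cased; the amino-acid code is copied as given.
   Returns the number of items stored in out, or -1 if s is malformed
   or holds more than max items. */
int parse_gc_mods(const char *s, gc_mod *out, int max)
{
    int n = 0, i;

    while (*s != '\0') {
        if (*s++ != ',' || n >= max) return -1;
        for (i = 0; i < 3; i++) {            /* three bases from A,C,G,T */
            int b = toupper((unsigned char)s[i]);
            if (b == '\0' || strchr("ACGT", b) == NULL) return -1;
            out[n].codon[i] = (char)b;
        }
        out[n].codon[3] = '\0';
        if (s[3] != '=') return -1;
        s += 4;
        for (i = 0; i < 3; i++) {            /* three-letter amino-acid code */
            if (!isalpha((unsigned char)s[i])) return -1;
            out[n].aa[i] = s[i];
        }
        out[n].aa[3] = '\0';
        s += 3;
        n++;
    }
    return n;
}
```

For the man page example, the caller would pass the text following "-gcvert",
that is ",aga=Trp,agg=Trp", and receive two entries: AGA and AGG, both mapped
to Trp.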
diff --git a/aragorn1.2.36.c b/aragorn1.2.37.c
similarity index 69%
rename from aragorn1.2.36.c
rename to aragorn1.2.37.c
index dea1d7c..8db114b 100644
--- a/aragorn1.2.36.c
+++ b/aragorn1.2.37.c
@@ -1,20 +1,24 @@
/*
---------------------------------------------------------------
-ARAGORN v1.2.36 Dean Laslett
+ARAGORN v1.2.37 Dean Laslett
---------------------------------------------------------------
ARAGORN (together with ARWEN at last)
Detects tRNA, mtRNA, and tmRNA genes in nucleotide sequences
- Copyright (C) 2003-2015 Dean Laslett
+ Copyright (C) 2003-2018 Dean Laslett
- Please, report bugs and suggestions of improvements to the authors
+ A minimum requirement is at least a 32 bit compiler architecture
+ (variable types int and unsigned int are at least 4 bytes long).
+ Please report bugs and suggestions of improvements to the authors.
- E-mail: Björn Canbäck: bcanback at acgt.se
- Dean Laslett: gaiaquark at gmail.com
+ E-mail: Dean Laslett: gaiaquark at gmail.com
+ Bjorn Canbäck: bcanback at acgt.se
- Version 1.2.36 February 15th, 2013.
- Thanks to Sascha Steinbiss for fixing more bugs
+ Version 1.2.37 Oct 15th, 2014.
+ Thanks to Francisco Ossandon for finding many bugs and testing
+ Thanks to Haruo Suzuki for finding bugs
+ Thanks to Sascha Steinbiss for fixing bugs
Please reference the following papers if you use this
@@ -321,148 +325,10 @@ DAMAGES.
END OF TERMS AND CONDITIONS
-*/
-
-
-
-/*
---------------------------------------------------------------
-ARAGORN v1.2.36 Dean Laslett
+ARAGORN v1.2.37 Dean Laslett
---------------------------------------------------------------
-
-
-aragorn detects tRNA, mtRNA, and tmRNA genes.
-A minimum requirement is at least a 32 bit compiler architecture
-(variable types int and unsigned int are at least 4 bytes long).
-
-Usage:
-aragorn -v -s -d -c -l -j -a -q -rn -w -ifro<min>,<max> -t -m -mt
- -gc -tv -seq -br -fasta -fo -o <outfile> <filename>
-
-<filename> is assumed to contain one or more sequences
-in FASTA format. Results of the search are printed to
-STDOUT. All switches are optional and case-insensitive.
-Unless -i is specified, tRNA genes containing introns
-are not detected.
-
- -m Search for tmRNA genes.
- -t Search for tRNA genes.
- By default, all are detected. If one of
- -m or -t is specified, then the other
- is not detected unless specified as well.
- -mt Search for Metazoan mitochondrial tRNA genes.
- tRNA genes with introns not detected. -i,-sr switchs
- ignored. Composite Metazoan mitochondrial
- genetic code used.
- -mtmam Search for Mammalian mitochondrial tRNA
- genes. -i,-sr switchs ignored. -tv switch set.
- Mammalian mitochondrial genetic code used.
- -mtx Same as -mt but low scoring tRNA genes are
- not reported.
- -mtd Overlapping metazoan mitochondrial tRNA genes
- on opposite strands are reported.
- -gc<num> Use the GenBank transl_table = <num> genetic code.
- -gcstd Use standard genetic code.
- -gcmet Use composite Metazoan mitochondrial genetic code.
- -gcvert Use Vertebrate mitochondrial genetic code.
- -gcinvert Use Invertebrate mitochondrial genetic code.
- -gcyeast Use Yeast mitochondrial genetic code.
- -gcprot Use Mold/Protozoan/Coelenterate mitochondrial genetic code.
- -gcciliate Use Ciliate genetic code.
- -gcflatworm Use Echinoderm/Flatworm mitochondrial genetic code
- -gceuplot Use Euplotid genetic code.
- -gcbact Use Bacterial/Plant Chloroplast genetic code.
- -gcaltyeast Use alternative Yeast genetic code.
- -gcascid Use Ascidian Mitochondrial genetic code.
- -gcaltflat Use alternative Flatworm Mitochondrial genetic code.
- -gcblep Use Blepharisma genetic code.
- -gcchloroph Use Chlorophycean Mitochondrial genetic code.
- -gctrem Use Trematode Mitochondrial genetic code.
- -gcscen Use Scenedesmus obliquus Mitochondrial genetic code.
- -gcthraust Use Thraustochytrium Mitochondrial genetic code.
- Individual modifications can be appended using
- ,BBB=<aa> B = A,C,G, or T. <aa> is the three letter
- code for an amino-acid. More than one modification
- can be specified. eg -gcvert,aga=Trp,agg=Trp uses
- the Vertebrate Mitochondrial code and the codons
- AGA and AGG changed to Tryptophan.
- -tv Do not search for mitochondrial TV replacement
- loop tRNA genes. Only relevant if -mt used.
- -c7 Search for tRNA genes with 7 base C-loops only.
- -i Search for tRNA genes with introns in
- anticodon loop with maximum length 3000
- bases. Minimum intron length is 0 bases.
- Ignored if -m is specified.
- -i<max> Search for tRNA genes with introns in
- anticodon loop with maximum length <max>
- bases. Minimum intron length is 0 bases.
- Ignored if -m is specified.
- -i<min>,<max> Search for tRNA genes with introns in
- anticodon loop with maximum length <max>
- bases, and minimum length <min> bases.
- Ignored if -m is specified.
- -io Same as -i, but allow tRNA genes with long
- introns to overlap shorter tRNA genes.
- -if Same as -i, but fix intron between positions
- 37 and 38 on C-loop (one base after anticodon).
- -ifo Same as -if and -io combined.
- -ir Same as -i, but report tRNA genes with minimum
- length <min> bases rather than search for
- tRNA genes with minimum length <min> bases.
- With this switch, <min> acts as an output filter,
- minimum intron length for searching is still 0 bases.
- -c Assume that each sequence has a circular
- topology. Search wraps around each end.
- Default setting.
- -l Assume that each sequence has a linear
- topology. Search does not wrap.
- -d Double. Search both strands of each
- sequence. Default setting.
- -s or -s+ Single. Do not search the complementary
- (antisense) strand of each sequence.
- -sc or -s- Single complementary. Do not search the sense
- strand of each sequence.
- -ps Lower scoring thresholds to 95% of default levels.
- -ps<num> Change scoring thresholds to <num> percent of default levels.
- -rp Flag possible pseudogenes (score < 100 or tRNA anticodon
- loop <> 7 bases long). Note that genes with score < 100
- will not be detected or flagged if scoring thresholds are not
- also changed to below 100% (see -ps switch).
- -seq Print out primary sequence.
- -br Show secondary structure of tRNA gene primary sequence
- using round brackets.
- -fasta Print out primary sequence in fasta format.
- -fo Print out primary sequence in fasta format only
- (no secondary structure).
- -fon Same as -fo, with sequence and gene numbering in header.
- -fos Same as -fo, with no spaces in header.
- -fons Same as -fo, with sequence and gene numbering, but no spaces.
- -w Print out in Batch mode.
- -ss Use the stricter canonical 1-2 bp spacer1 and
- 1 bp spacer2. Ignored if -mt set. Default is to
- allow 3 bp spacer1 and 0-2 bp spacer2, which may
- degrade selectivity.\n");
- -v Verbose. Prints out information during
- search to STDERR.
- -a Print out tRNA domain for tmRNA genes.
- -a7 Restrict tRNA astem length to a maximum of 7 bases
- -aa Display message if predicted iso-acceptor species
- does not match species in sequence name (if present).
- -j Display 4-base sequence on 3' end of astem
- regardless of predicted amino-acyl acceptor length.
- -jr Allow some divergence of 3' amino-acyl acceptor
- sequence from NCCA.
- -jr4 Allow some divergence of 3' amino-acyl acceptor
- sequence from NCCA, and display 4 bases.
- -q Dont print configuration line (which switchs
- and files were used).
- -rn Repeat sequence name before summary information.
- -O <outfile> Print output to <outfile>. If <outfile>
- already exists, it is overwritten. By default
- all output goes to stdout.
-
-
*/
@@ -476,12 +342,14 @@ are not detected.
#endif
+#define NOCHAR '\0'
#define DLIM '\n'
#define STRLEN 4001
#define STRLENM1 4000
#define SHORTSTRLEN 51
#define SHORTSTRLENM1 50
#define KEYLEN 15
+#define NHELPLINE 173
#define INACTIVE 2.0e+35
#define IINACTIVE 2000000001L
#define ITHRESHOLD 2000000000L
@@ -504,7 +372,7 @@ are not detected.
#define MAXGCMOD 16
#define MAMMAL_MT 2
-#define NGENECODE 24
+#define NGENECODE 26
#define METAZOAN_MT 0
#define STANDARD 1
#define VERTEBRATE_MT 2
@@ -563,7 +431,7 @@ are not detected.
#define SLANTDL 8
#define SLANT 5
-#define MATX 42 /* 41 */
+#define MATX 42
#define MATY 34
@@ -582,9 +450,9 @@ are not detected.
#define MINTRNALEN (MINCTRNALEN + 1)
#define MAXTRNALEN (MAXCTRNALEN + ASTEM2_EXT)
#define MAXETRNALEN (MAXTRNALEN + MAXINTRONLEN)
-#define VARMAX 26 /* 25 */
+#define VARMAX 26
#define VARMIN 3
-#define VARDIFF 23 /* 22 */ /* VARMAX - VARMIN */
+#define VARDIFF 23 /* VARMAX - VARMIN */
#define MINTPTSDIST 50
#define MAXTPTSDIST 321
#define TPWINDOW (MAXTPTSDIST - MINTPTSDIST + 1)
@@ -606,6 +474,7 @@ are not detected.
#define TSWEEP 1000
#define WRAP 2*MAXETRNALEN
#define NPTAG 33
+#define MAXAGENELEN (MAXETRNALEN + MAXTMRNALEN)
/*
NOTE: If MAXPPINTRONDIST is increased, then validity of MAXTMRNALEN
@@ -624,21 +493,22 @@ must remain equal to or more than 2*MAXTMRNALEN and TSWEEP.
#define CLOOP 3
#define VAR 4
-#define NA MAXINTRONLEN
-#define ND 100
-#define NT 200
-#define NH 2000
-#define NTH 3000
-#define NC 5000
-#define NGFT 5000 /* 100 */
-#define NTAG 474 /* 367 */
-#define LSEQ 20000
-#define ATBOND 2.5
-#define mtNA 1500
-#define mtND 150
-#define mtNTH 3000 /* 750 */
-#define mtNTM 3
-#define mtNCDS 200 /* 500,20 */
+#define NA MAXINTRONLEN
+#define ND 100
+#define NT 200
+#define NH 2000
+#define NTH 3000
+#define NC 5000
+#define NGFT 5000
+#define NTAG 1273
+#define NTAGMAX 1300
+#define LSEQ 20000
+#define ATBOND 2.5
+#define mtNA 1500
+#define mtND 150
+#define mtNTH 3000
+#define mtNTM 3
+#define mtNCDS 200
#define mtNCDSCODON 6000
#define mtGCBOND 0.0
#define mtATBOND -0.5
@@ -671,14 +541,14 @@ must remain equal to or more than 2*MAXTMRNALEN and TSWEEP.
#define srpDMAXLEN 300
#define srpDMINLEN 100
#define srpNH 200
-#define srpNS 500 /* 100 */
+#define srpNS 500
#define srpMAXHPL 14
#define srpMAXSP 6
-#define srpMAXSTEM 6500 /* 6000 */
+#define srpMAXSTEM 6500
#define srpDISPMAX 4*srpMAXLEN
#define srpMAXSPACER 12
#define srpMAXNISTEMS 10
-#define srpNESTMAX 2 /* 3 */
+#define srpNESTMAX 2
#define cdsMAXLEN 3000
#define NCDS 200
@@ -691,24 +561,32 @@ typedef struct { long start;
long antistart;
long antistop;
int genetype;
+ int pseudogene;
+ int permuted;
+ int detected;
char species[SHORTSTRLEN]; } annotated_gene;
typedef struct { char filename[80];
FILE *f;
char seqname[STRLEN];
+ int bugmode;
int datatype;
double gc;
+ long filepointer;
long ps;
long psmax;
long seqstart;
+ long seqstartoff;
long nextseq;
- int ns;
+ long nextseqoff;
+ int ns,nf;
+ long aseqlen;
int nagene[NS];
annotated_gene gene[NGFT]; } data_set;
-typedef struct { char name[80];
+typedef struct { char name[100];
int seq[MAXTRNALEN+1];
int eseq[MAXETRNALEN+1];
int *ps;
@@ -736,7 +614,9 @@ typedef struct { char name[80];
double energy;
int asst;
int tps;
- int tpe; } gene;
+ int tpe;
+ int annotation;
+ int annosc; } gene;
typedef struct { int *pos;
int stem;
@@ -818,11 +698,14 @@ typedef struct { int *pos;
int win; } cds_codon;
+typedef struct { char name[50];
+ char tag[50]; } tmrna_tag_entry;
-
-typedef struct { FILE *f;
+typedef struct { char genetypename[NS][10];
+ FILE *f;
int batch;
+ int batchfullspecies;
int repeatsn;
int trna;
int tmrna;
@@ -876,16 +759,20 @@ typedef struct { FILE *f;
int ngene[NS];
int nps;
int annotated;
+ int dispmatch;
+ int updatetmrnatags;
+ int tagend;
+ int trnalenmisthresh;
+ int tmrnalenmisthresh;
int nagene[NS];
- int natfn;
- int natfp;
+ int nafn[NS];
+ int nafp[NS];
int natfpd;
int natfptv;
- int nacdsfn;
- int nacdsfp;
int lacds;
int ldcds;
long nabase;
+ double pseudogenethresh;
double trnathresh;
double ttscanthresh;
double ttarmthresh;
@@ -910,6 +797,8 @@ typedef struct { FILE *f;
+
+
/* Basepair matching matrices */
int lbp[3][6][6] =
@@ -1236,7 +1125,7 @@ typedef struct { FILE *f;
"Thr","Tyr","Asp","His","Asn",
"Met","Trp","Glu","Gln","Lys",
"Stop",
- "seC",
+ "SeC",
"Pyl",
"(Arg|Stop|Ser|Gly)",
"(Ile|Met)",
@@ -1245,8 +1134,13 @@ typedef struct { FILE *f;
char ambig_aaname[4] = "???";
+/*
+aamap based on NCBI genetic code table (downloaded 26-Apr-2014)
+ftp://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt
+*/
+
int aamap[NGENECODE][64] = {
- /* composite metazoan mt */
+ /* 0. composite metazoan mt */
{ Phe,Val,Leu,Ile,
Cys,Gly,Arg,Ser,
Ser,Ala,Pro,Thr,
@@ -1263,7 +1157,7 @@ typedef struct { FILE *f;
25,Gly,Arg,23,
Ser,Ala,Pro,Thr,
Stop,Glu,Gln,26 },
- /* standard */
+ /* 1. standard */
{ Phe,Val,Leu,Ile,
Cys,Gly,Arg,Ser,
Ser,Ala,Pro,Thr,
@@ -1280,8 +1174,8 @@ typedef struct { FILE *f;
SeC,Gly,Arg,Arg,
Ser,Ala,Pro,Thr,
Stop,Glu,Gln,Lys },
- /* vertebrate mt */
- { Phe,Val,Leu,Ile,
+ /* 2. vertebrate mt */
+ { Phe,Val,Leu,Ile,
Cys,Gly,Arg,Ser,
Ser,Ala,Pro,Thr,
Tyr,Asp,His,Asn,
@@ -1297,7 +1191,7 @@ typedef struct { FILE *f;
Trp,Gly,Arg,Stop,
Ser,Ala,Pro,Thr,
Stop,Glu,Gln,Lys },
- /* yeast mt */
+ /* 3. yeast mt */
{ Phe,Val,Thr,Ile,
Cys,Gly,Arg,Ser,
Ser,Ala,Pro,Thr,
@@ -1314,7 +1208,7 @@ typedef struct { FILE *f;
Trp,Gly,Arg,Arg,
Ser,Ala,Pro,Thr,
Stop,Glu,Gln,Lys },
- /* mold, protozoan, and coelenterate mt */
+ /* 4. mold, protozoan, and coelenterate mt */
{ Phe,Val,Leu,Ile,
Cys,Gly,Arg,Ser,
Ser,Ala,Pro,Thr,
@@ -1331,7 +1225,7 @@ typedef struct { FILE *f;
Trp,Gly,Arg,Arg,
Ser,Ala,Pro,Thr,
Stop,Glu,Gln,Lys },
- /* invertebrate mt */
+ /* 5. invertebrate mt */
{ Phe,Val,Leu,Ile,
Cys,Gly,Arg,Ser,
Ser,Ala,Pro,Thr,
@@ -1348,7 +1242,7 @@ typedef struct { FILE *f;
Trp,Gly,Arg,Ser,
Ser,Ala,Pro,Thr,
Stop,Glu,Gln,Lys },
- /* ciliate */
+ /* 6. ciliate */
{ Phe,Val,Leu,Ile,
Cys,Gly,Arg,Ser,
Ser,Ala,Pro,Thr,
@@ -1365,7 +1259,7 @@ typedef struct { FILE *f;
SeC,Gly,Arg,Arg,
Ser,Ala,Pro,Thr,
Gln,Glu,Gln,Lys },
- /* deleted -> standard */
+ /* 7. deleted -> standard */
{ Phe,Val,Leu,Ile,
Cys,Gly,Arg,Ser,
Ser,Ala,Pro,Thr,
@@ -1382,7 +1276,7 @@ typedef struct { FILE *f;
SeC,Gly,Arg,Arg,
Ser,Ala,Pro,Thr,
Stop,Glu,Gln,Lys },
- /* deleted -> standard */
+ /* 8. deleted -> standard */
{ Phe,Val,Leu,Ile,
Cys,Gly,Arg,Ser,
Ser,Ala,Pro,Thr,
@@ -1399,7 +1293,7 @@ typedef struct { FILE *f;
SeC,Gly,Arg,Arg,
Ser,Ala,Pro,Thr,
Stop,Glu,Gln,Lys },
- /* echinoderm and flatworm mt */
+ /* 9. echinoderm and flatworm mt */
{ Phe,Val,Leu,Ile,
Cys,Gly,Arg,Ser,
Ser,Ala,Pro,Thr,
@@ -1416,7 +1310,7 @@ typedef struct { FILE *f;
Trp,Gly,Arg,Ser,
Ser,Ala,Pro,Thr,
Stop,Glu,Gln,Asn },
- /* euplotid */
+ /* 10. euplotid */
{ Phe,Val,Leu,Ile,
Cys,Gly,Arg,Ser,
Ser,Ala,Pro,Thr,
@@ -1433,12 +1327,12 @@ typedef struct { FILE *f;
Cys,Gly,Arg,Arg,
Ser,Ala,Pro,Thr,
Stop,Glu,Gln,Lys },
- /* bacterial and plant chloroplast */
+ /* 11. bacterial and plant chloroplast */
{ Phe,Val,Leu,Ile,
Cys,Gly,Arg,Ser,
Ser,Ala,Pro,Thr,
Tyr,Asp,His,Asn,
- Leu,Val,Ser,Met,
+ Leu,Val,Leu,Met,
Trp,Gly,Arg,Arg,
Ser,Ala,Pro,Thr,
Pyl,Glu,Gln,Lys,
@@ -1450,7 +1344,7 @@ typedef struct { FILE *f;
SeC,Gly,Arg,Arg,
Ser,Ala,Pro,Thr,
Stop,Glu,Gln,Lys },
- /* alternate yeast */
+ /* 12. alternate yeast */
{ Phe,Val,Leu,Ile,
Cys,Gly,Arg,Ser,
Ser,Ala,Pro,Thr,
@@ -1467,7 +1361,7 @@ typedef struct { FILE *f;
SeC,Gly,Arg,Arg,
Ser,Ala,Pro,Thr,
Stop,Glu,Gln,Lys },
- /* ascidian mt */
+ /* 13. ascidian mt */
{ Phe,Val,Leu,Ile,
Cys,Gly,Arg,Ser,
Ser,Ala,Pro,Thr,
@@ -1484,7 +1378,7 @@ typedef struct { FILE *f;
Trp,Gly,Arg,Gly,
Ser,Ala,Pro,Thr,
Stop,Glu,Gln,Lys },
- /* alternate flatworm mt */
+ /* 14. alternate flatworm mt */
{ Phe,Val,Leu,Ile,
Cys,Gly,Arg,Ser,
Ser,Ala,Pro,Thr,
@@ -1501,7 +1395,7 @@ typedef struct { FILE *f;
Trp,Gly,Arg,Ser,
Ser,Ala,Pro,Thr,
Tyr,Glu,Gln,Asn },
- /* blepharisma */
+ /* 15. blepharisma */
{ Phe,Val,Leu,Ile,
Cys,Gly,Arg,Ser,
Ser,Ala,Pro,Thr,
@@ -1518,7 +1412,7 @@ typedef struct { FILE *f;
SeC,Gly,Arg,Arg,
Ser,Ala,Pro,Thr,
Stop,Glu,Gln,Lys },
- /* chlorophycean mt */
+ /* 16. chlorophycean mt */
{ Phe,Val,Leu,Ile,
Cys,Gly,Arg,Ser,
Ser,Ala,Pro,Thr,
@@ -1535,7 +1429,7 @@ typedef struct { FILE *f;
SeC,Gly,Arg,Arg,
Ser,Ala,Pro,Thr,
Stop,Glu,Gln,Lys },
- /* deleted -> standard */
+ /* 17. deleted -> standard */
{ Phe,Val,Leu,Ile,
Cys,Gly,Arg,Ser,
Ser,Ala,Pro,Thr,
@@ -1552,7 +1446,7 @@ typedef struct { FILE *f;
SeC,Gly,Arg,Arg,
Ser,Ala,Pro,Thr,
Stop,Glu,Gln,Lys },
- /* deleted -> standard */
+ /* 18. deleted -> standard */
{ Phe,Val,Leu,Ile,
Cys,Gly,Arg,Ser,
Ser,Ala,Pro,Thr,
@@ -1569,7 +1463,7 @@ typedef struct { FILE *f;
SeC,Gly,Arg,Arg,
Ser,Ala,Pro,Thr,
Stop,Glu,Gln,Lys },
- /* deleted -> standard */
+ /* 19. deleted -> standard */
{ Phe,Val,Leu,Ile,
Cys,Gly,Arg,Ser,
Ser,Ala,Pro,Thr,
@@ -1586,7 +1480,7 @@ typedef struct { FILE *f;
SeC,Gly,Arg,Arg,
Ser,Ala,Pro,Thr,
Stop,Glu,Gln,Lys },
- /* deleted -> standard */
+ /* 20. deleted -> standard */
{ Phe,Val,Leu,Ile,
Cys,Gly,Arg,Ser,
Ser,Ala,Pro,Thr,
@@ -1603,7 +1497,7 @@ typedef struct { FILE *f;
SeC,Gly,Arg,Arg,
Ser,Ala,Pro,Thr,
Stop,Glu,Gln,Lys },
- /* trematode mt */
+ /* 21. trematode mt */
{ Phe,Val,Leu,Ile,
Cys,Gly,Arg,Ser,
Ser,Ala,Pro,Thr,
@@ -1620,7 +1514,7 @@ typedef struct { FILE *f;
Trp,Gly,Arg,Ser,
Ser,Ala,Pro,Thr,
Stop,Glu,Gln,Lys },
- /* scenedesmus obliquus mt*/
+ /* 22. scenedesmus obliquus mt*/
{ Phe,Val,Leu,Ile,
Cys,Gly,Arg,Ser,
Ser,Ala,Pro,Thr,
@@ -1637,7 +1531,7 @@ typedef struct { FILE *f;
SeC,Gly,Arg,Arg,
Stop,Ala,Pro,Thr,
Stop,Glu,Gln,Lys },
- /* thraustochytrium mt */
+ /* 23. thraustochytrium mt */
{ Phe,Val,Leu,Ile,
Cys,Gly,Arg,Ser,
Ser,Ala,Pro,Thr,
@@ -1653,6 +1547,40 @@ typedef struct { FILE *f;
Stop,Val,Leu,Ile,
SeC,Gly,Arg,Arg,
Ser,Ala,Pro,Thr,
+ Stop,Glu,Gln,Lys },
+ /* 24. Pterobranchia mt */
+ { Phe,Val,Leu,Ile,
+ Cys,Gly,Arg,Ser,
+ Ser,Ala,Pro,Thr,
+ Tyr,Asp,His,Asn,
+ Leu,Val,Leu,Met,
+ Trp,Gly,Arg,Lys,
+ Ser,Ala,Pro,Thr,
+ Pyl,Glu,Gln,Lys,
+ Phe,Val,Leu,Ile,
+ Cys,Gly,Arg,Ser,
+ Ser,Ala,Pro,Thr,
+ Tyr,Asp,His,Asn,
+ Leu,Val,Leu,Ile,
+ Trp,Gly,Arg,Ser,
+ Ser,Ala,Pro,Thr,
+ Stop,Glu,Gln,Lys },
+ /* 25. Gracilibacteria */
+ { Phe,Val,Leu,Ile,
+ Cys,Gly,Arg,Ser,
+ Ser,Ala,Pro,Thr,
+ Tyr,Asp,His,Asn,
+ Leu,Val,Leu,Met,
+ Trp,Gly,Arg,Arg,
+ Ser,Ala,Pro,Thr,
+ Pyl,Glu,Gln,Lys,
+ Phe,Val,Leu,Ile,
+ Cys,Gly,Arg,Ser,
+ Ser,Ala,Pro,Thr,
+ Tyr,Asp,His,Asn,
+ Leu,Val,Leu,Ile,
+ Gly,Gly,Arg,Arg,
+ Ser,Ala,Pro,Thr,
Stop,Glu,Gln,Lys } };
@@ -1661,6 +1589,1466 @@ typedef struct { FILE *f;
/* POINTERS TO DETECTED GENES */
gene *ts;
+
+
+/* HELP MENU */
+
+char helpmenu[NHELPLINE][81] =
+{
+"----------------------------",
+"ARAGORN v1.2.37 Dean Laslett",
+"----------------------------\n",
+"Please reference the following papers if you use this",
+"program as part of any published research.\n",
+"Laslett, D. and Canback, B. (2004) ARAGORN, a",
+"program for the detection of transfer RNA and transfer-messenger",
+"RNA genes in nucleotide sequences",
+"Nucleic Acids Research, 32;11-16\n",
+"Laslett, D. and Canback, B. (2008) ARWEN: a",
+"program to detect tRNA genes in metazoan mitochondrial",
+"nucleotide sequences",
+"Bioinformatics, 24(2); 172-175.\n\n",
+"ARAGORN detects tRNA, mtRNA, and tmRNA genes.\n",
+"Usage:",
+"aragorn -v -e -s -d -c -l -j -a -q -rn -w -ifro<min>,<max> -t -mt -m",
+" -rp -ps -gc -tv -seq -br -fasta -fo -o <outfile> <filename>\n",
+"<filename> is assumed to contain one or more sequences",
+"in FASTA or GENBANK format. Results of the search are printed",
+"to STDOUT. All switches are optional and case-insensitive.",
+"Unless -i is specified, tRNA genes containing introns",
+"are not detected.\n",
+" -m Search for tmRNA genes.",
+" -t Search for tRNA genes.",
+" By default, all are detected. If one of",
+" -m or -t is specified, then the other",
+" is not detected unless specified as well.",
+" -mt Search for Metazoan mitochondrial tRNA genes.",
+" tRNA genes with introns not detected. -i,-sr switchs",
+" ignored. Composite Metazoan mitochondrial",
+" genetic code used.",
+" -mtmam Search for Mammalian mitochondrial tRNA",
+" genes. -i switch ignored. -tv switch set.",
+" Mammalian mitochondrial genetic code used.",
+" -mtx Same as -mt but low scoring tRNA genes are",
+" not reported.",
+" -mtd Overlapping metazoan mitochondrial tRNA genes",
+" on opposite strands are reported.",
+" -gc<num> Use the GenBank transl_table = <num> genetic code.",
+" -gcstd Use standard genetic code.",
+" -gcmet Use composite Metazoan mitochondrial genetic code.",
+" -gcvert Use Vertebrate mitochondrial genetic code.",
+" -gcinvert Use Invertebrate mitochondrial genetic code.",
+" -gcyeast Use Yeast mitochondrial genetic code.",
+" -gcprot Use Mold/Protozoan/Coelenterate mitochondrial genetic code.",
+" -gcciliate Use Ciliate genetic code.",
+" -gcflatworm Use Echinoderm/Flatworm mitochondrial genetic code",
+" -gceuplot Use Euplotid genetic code.",
+" -gcbact Use Bacterial/Plant chloroplast genetic code.",
+" -gcaltyeast Use alternative Yeast genetic code.",
+" -gcascid Use Ascidian mitochondrial genetic code.",
+" -gcaltflat Use alternative Flatworm mitochondrial genetic code.",
+" -gcblep Use Blepharisma genetic code.",
+" -gcchloroph Use Chlorophycean mitochondrial genetic code.",
+" -gctrem Use Trematode mitochondrial genetic code.",
+" -gcscen Use Scenedesmus obliquus mitochondrial genetic code.",
+" -gcthraust Use Thraustochytrium mitochondrial genetic code.",
+" -gcptero Use Pterobranchia mitochondrial genetic code.",
+" -gcgrac Use Gracilibacteria genetic code.",
+" Individual modifications can be appended using",
+" ,BBB=<aa> B = A,C,G, or T. <aa> is the three letter",
+" code for an amino-acid. More than one modification",
+" can be specified. eg -gcvert,aga=Trp,agg=Trp uses",
+" the Vertebrate Mitochondrial code and the codons",
+" AGA and AGG changed to Tryptophan.",
+" -c Assume that each sequence has a circular",
+" topology. Search wraps around each end.",
+" Default setting.",
+" -l Assume that each sequence has a linear",
+" topology. Search does not wrap.",
+" -d Double. Search both strands of each",
+" sequence. Default setting.",
+" -s or -s+ Single. Do not search the complementary",
+" (antisense) strand of each sequence.",
+" -sc or -s- Single complementary. Do not search the sense",
+" strand of each sequence.",
+" -i Search for tRNA genes with introns in",
+" anticodon loop with maximum length 3000",
+" bases. Minimum intron length is 0 bases.",
+" Ignored if -m is specified.",
+" -i<max> Search for tRNA genes with introns in",
+" anticodon loop with maximum length <max>",
+" bases. Minimum intron length is 0 bases.",
+" Ignored if -m is specified.",
+" -i<min>,<max> Search for tRNA genes with introns in",
+" anticodon loop with maximum length <max>",
+" bases, and minimum length <min> bases.",
+" Ignored if -m is specified.",
+" -io Same as -i, but allow tRNA genes with long",
+" introns to overlap shorter tRNA genes.",
+" -if Same as -i, but fix intron between positions",
+" 37 and 38 on C-loop (one base after anticodon).",
+" -ifo Same as -if and -io combined.",
+" -ir Same as -i, but report tRNA genes with minimum",
+" intron length <min> bases rather than search for",
+" tRNA genes with minimum intron length <min> bases.",
+" With this switch, <min> acts as an output filter;",
+" the minimum intron length for searching is still 0 bases.",
+" -tv Do not search for mitochondrial TV replacement",
+" loop tRNA genes. Only relevant if -mt used.",
+" -c7 Search for tRNA genes with 7 base C-loops only.",
+" -ss Use the stricter canonical 1-2 bp spacer1 and",
+" 1 bp spacer2. Ignored if -mt set. Default is to",
+" allow 3 bp spacer1 and 0-2 bp spacer2, which may",
+" degrade selectivity.",
+" -j Display 4-base sequence on 3' end of astem",
+" regardless of predicted amino-acyl acceptor length.",
+" -jr Allow some divergence of 3' amino-acyl acceptor",
+" sequence from NCCA.",
+" -jr4 Allow some divergence of 3' amino-acyl acceptor",
+" sequence from NCCA, and display 4 bases.",
+" -e Print out score for each reported gene.",
+" -ps Lower scoring thresholds to 95% of default levels.",
+" -ps<num> Change scoring thresholds to <num> percent of default levels.",
+" -rp Flag possible pseudogenes (score < 100 or tRNA anticodon",
+" loop <> 7 bases long). Note that genes with score < 100",
+" will not be detected or flagged if scoring thresholds are not",
+" also changed to below 100% (see -ps switch).",
+" -rp<num> Flag possible pseudogenes and change score threshold to <num>",
+" percent of default levels.",
+" -seq Print out primary sequence.",
+" -br Show secondary structure of tRNA gene primary sequence",
+" using round brackets.",
+" -fasta Print out primary sequence in fasta format.",
+" -fo Print out primary sequence in fasta format only",
+" (no secondary structure).",
+" -fon Same as -fo, with sequence and gene numbering in header.",
+" -fos Same as -fo, with no spaces in header.",
+" -fons Same as -fo, with sequence and gene numbering, but no spaces.",
+" -v Verbose. Prints out information during",
+" search to STDERR.",
+" -a Print out tRNA domain for tmRNA genes.",
+" -a7 Restrict tRNA astem length to a maximum of 7 bases.",
+" -aa Display message if predicted iso-acceptor species",
+" does not match species in sequence name (if present).",
+" -amt<num> Change annotated tRNA length mismatch reporting threshold to",
+" <num> bases when searching GENBANK files. Default is 10 bases.",
+" -amm<num> Change annotated tmRNA length mismatch reporting threshold to",
+" <num> bases when searching GENBANK files. Default is 30 bases.",
+" -q Don't print configuration line (which switches",
+" and files were used).",
+" -rn Repeat sequence name before summary information.",
+" -o <outfile> Print output to <outfile>. If <outfile>",
+" already exists, it is overwritten. By default",
+" all output goes to stdout.",
+" -w Print out in batch mode.",
+" -wa Same as -w, but for 6 or 8 base anticodon",
+" loops, print possible iso-acceptor species",
+" as (<species>|<species>) instead of ???",
+" For tRNA genes, batch mode output is in the form:\n",
+" Sequence name",
+" N genes found",
+" 1 tRNA-<species> [Locus 1] <Apos> (nnn)",
+" i(<intron position>,<intron length>)",
+" . ",
+" . ",
+" N tRNA-<species> [Locus N] <Apos> (nnn)",
+" i(<intron position>,<intron length>)\n",
+" N is the number of genes found",
+" <species> is the tRNA iso-acceptor species",
+" <Apos> is the tRNA anticodon relative position",
+" (nnn) is the tRNA anticodon base triplet",
+" i means the tRNA gene has a C-loop intron\n",
+" For tmRNA genes, output is in the form:\n",
+" n tmRNA(p) [Locus n] <tag offset>,<tag end offset>",
+" <tag peptide>\n",
+" p means the tmRNA gene is permuted",
+" -wunix Get around problem with some Windows gcc compilers",
+" (found so far in Strawberry Perl and ActivePerl)",
+" when reading Unix files.",
+" Execution speed may be slower for large files.",
+" Execution speed will be a lot slower for files",
+" with many small sequences."
+};
+
+
+
+/* tmRNA TAG PEPTIDE DATABASE */
+
+tmrna_tag_entry tagdatabase[NTAGMAX] =
+ { { "Acaryochloris marina","ANNIVSFARQRTATAVA"},
+ { "Accumulibacter phosphatis","ANDERFALAA"},
+ { "Acetobacter pasteurianus","ANDNTEVLAVAA"},
+ { "Acetobacterium woodii","AKTEKSYGLALAA"},
+ { "Acetohalobium arabaticum","ANDNSYALAAA"},
+ { "Achromobacter xylosoxidans","ANDERFALAA"},
+ { "Acidaminococcus fermentans","ADDSYALAA"},
+ { "Acidaminococcus sp. D21","AEDSYALAA"},
+ { "Acidimicrobium ferrooxidans","AEPELALAA"},
+ { "Acidiphilium cryptum","ANDNFEALAVAA"},
+ { "Acidithiobacillus caldus","ANDSNYALAA"},
+ { "Acidithiobacillus ferrivorans","ANDSNYALAA"},
+ { "Acidithiobacillus ferrooxidans","ANDSNYALAA"},
+ { "Acidobacterium capsulatum","ANNNLALAA"},
+ { "Acidobacterium Ellin6076","ANTQFAYAA"},
+ { "Acidothermus cellulolyticus","ANSSRADFALAA"},
+ { "Acidovorax avenae","ANDERFALAA"},
+ { "Acidovorax citrulli","ANDERFALAA"},
+ { "Acidovorax sp. JS42","ANDERFALAA"},
+ { "Acidovorax sp. KKS102","ANDERFALAA"},
+ { "Acinetobacter ADP1","ANDETYALAA"},
+ { "Acinetobacter baumannii","ANDETYALAA"},
+ { "Acinetobacter oleivorans","ANDETYALAA"},
+ { "Acinetobacter sp. ADP1","ANDETYALAA"},
+ { "Acinetobacter sp. SH024","ANDETYALAA"},
+ { "Actinobacillus actinomycetemcomitans","ANDEQYALAA"},
+ { "Actinobacillus pleuropneumoniae","ANDEQYALAA"},
+ { "Actinobacillus succinogenes","ANDEQYALAA"},
+ { "Actinobacillus suis","ANDEQYALAA"},
+ { "Actinomyces naeslundii","ADNTRTDFALAA"},
+ { "Actinoplanes missouriensis","AKDNSRADFALAA"},
+ { "Actinoplanes sp. SE50/110","ANSKFDADQYALAA"},
+ { "Actinosynnema mirum","AKSNDQRAFALAA"},
+ { "Advenella kashmirensis","ANDESYALAA"},
+ { "Aequorivita sublithincola","GENNYALAA"},
+ { "Aerococcus urinae","DKNESQSLAFAA"},
+ { "Aeromonas hydrophila 1","ANDENYALAA"},
+ { "Aeromonas hydrophila 2","ANDENYALAA"},
+ { "Aeromonas salmonicida","ANDENYALAA"},
+ { "Aeromonas veronii","ANDENYALAA"},
+ { "Aggregatibacter actinomycetemcomitans","ANDEQYALAA"},
+ { "Aggregatibacter aphrophilus","ANDEQYALAA"},
+ { "Agrobacterium fabrum","ANDNNAKEYALAA"},
+ { "Agrobacterium radiobacter","ANDNYAEARLAA"},
+ { "Agrobacterium sp. H13-3","ANDNNAKEYALAA"},
+ { "Agrobacterium tumefaciens 1","ANDNNAKEYALAA"},
+ { "Agrobacterium tumefaciens 2","ANDNNAKECALAA"},
+ { "Agrobacterium vitis","ANDNNAQGYAVAA"},
+ { "Akkermansia muciniphila","AESNDLALAA"},
+ { "Alcaligenes faecalis","ANDERFALAA"},
+ { "Alcaligenes viscolactis","ANDERFALAA"},
+ { "Alcanivorax borkumensis","ANDDSYALAA"},
+ { "Alcanivorax dieselolei","ANDDTYALAA"},
+ { "Alicycliphilus denitrificans","ANDERFALAA"},
+ { "Alicyclobacillus acidocaldarius","GKANRFTTQNKLALAA"},
+ { "Aliivibrio salmonicida","ANDENYALAA"},
+ { "Alistipes finegoldii","GNNSYALAA"},
+ { "Alkalilimnicola ehrlichii","ANDENYALAA"},
+ { "Alkaliphilus metalliredigenes","ANDNYSLAAA"},
+ { "Alkaliphilus metalliredigens","ANDNYSLAAA"},
+ { "Alkaliphilus oremlandii","ANDNYALAA"},
+ { "Allochromatium vinosum","ANDDNYALAA"},
+ { "alpha proteobacterium","ANESYALAA"},
+ { "Alphaproteobacteria SAR-1","ANDELALAA"},
+ { "Alteromonas macleodii","ANDETYALAA"},
+ { "Alteromonas sp. SN2","ANDENYALAA"},
+ { "Aminobacterium colombiense","VNNNNYALAA"},
+ { "Ammonifex degensii","ANNERVALAA"},
+ { "Amoebophilus asiaticus","GNNQVALAA"},
+ { "Amphibacillus xylanus","GKTNNYSLAAA"},
+ { "Amycolatopsis mediterranei","ADSSQREFALAA"},
+ { "Amycolicicoccus subflavus","ADNAQRSQSDFALAA"},
+ { "Anabaena variabilis","ANNIVKFARKDALVAA"},
+ { "Anaerobaculum mobile","ANENYALAA"},
+ { "Anaerococcus prevotii","ANNNSEANFALAA"},
+ { "Anaerolinea thermophila","VRKSGCRSGRSRTERKRAFGP"},
+ { "Anaeromyxobacter dehalogenans","ANEPMALAA"},
+ { "Anaeromyxobacter sp. Fw109-5","ANEPMALAA"},
+ { "Anaeromyxobacter sp. K","ANEPMALAA"},
+ { "Anaplasma centrale","ANDDFVAANDNMETAFVAAA"},
+ { "Anaplasma marginale","ANDDFVAANDNMETAFVAAA"},
+ { "Anaplasma phagocytophilum","ANDDFVAANDNVETAFVAAA"},
+ { "Anoxybacillus flavithermus","GKENYALAA"},
+ { "Aquifex aeolicus","APEAELALAA"},
+ { "Arcanobacterium haemolyticum","ANKQKSDFALAA"},
+ { "Arcobacter butzleri","ANNTNYAPAYAKAA"},
+ { "Arcobacter nitrofigilis","ANNTNYAPAYAKVA"},
+ { "Arcobacter sp. L","ANNTNYAPAYAKAA"},
+ { "Aromatoleum aromaticum","ANDERFAVAA"},
+ { "Arthrobacter arilaitensis","AESKRTDFALAA"},
+ { "Arthrobacter aurescens","AESKRTDFALAA"},
+ { "Arthrobacter chlorophenolicus","AESKRTDFALAA"},
+ { "Arthrobacter FB24","AKQTRTDFALAA"},
+ { "Arthrobacter phenanthrenivorans","AESKRTDFALAA"},
+ { "Arthrobacter sp. FB24","AKQTRTDFALAA"},
+ { "Arthrobacter sp. Rue61a","AESKRTDFALAA"},
+ { "Arthromitus sp. SFB-mouse-Japan","DKNYSLQAA"},
+ { "Arthromitus sp. SFB-rat-Yit","DKNYSLQAA"},
+ { "Azoarcus BH72","ANDERFALAA"},
+ { "Azoarcus EbN1","ANDERFAVAA"},
+ { "Azoarcus sp. BH72","ANDERFALAA"},
+ { "Azobacteroides pseudotrichonymphae","GENFYALAA"},
+ { "Azorhizobium caulinodans","ANDNYAPVAVAA"},
+ { "Azospira oryzae","ANDERFAIAA"},
+ { "Azospirillum brasilense","ANDNVAPVAVAA"},
+ { "Azospirillum lipoferum","ANDNVAQARLAA"},
+ { "Azospirillum sp. B510","ANDNVAQARLAA"},
+ { "Azotobacter vinelandii","ANDDNYALAA"},
+ { "Bacillus amyloliquefaciens","GKTKSFNQNLALAA"},
+ { "Bacillus anthracis","GKQNNLSLAA"},
+ { "Bacillus atrophaeus","GKTKSFNQNLALAA"},
+ { "Bacillus cellulosilyticus","GKQEDNFAFAA"},
+ { "Bacillus cereus","GKQNNLSLAA"},
+ { "Bacillus clausii","GKENNNFALAA"},
+ { "Bacillus coagulans","GKSNTKLALAA"},
+ { "Bacillus cytotoxicus","GKQQNNFALAA"},
+ { "Bacillus halodurans","GKENNNFALAA"},
+ { "Bacillus licheniformis","GKSNQNLALAA"},
+ { "Bacillus megaterium","GKSNNNFALAA"},
+ { "Bacillus phage","AKLNITNNELQVA"},
+ { "Bacillus pumilus","GKTKSFNQNLALAA"},
+ { "Bacillus selenitireducens","GKQDNDFALAAA"},
+ { "Bacillus stearothermophilus","GKQNYALAA"},
+ { "Bacillus subtilis","GKTNSFNQNVALAA"},
+ { "Bacillus thuringiensis","GKQNNLSLAA"},
+ { "Bacillus weihenstephanensis","GKQNNLSLAA"},
+ { "Bacillusphage G","AKLNITNNELQVA"},
+ { "Bacteriovorax marinus","AESNFAPAMAA"},
+ { "Bacteroides fragilis","GETNYALAA"},
+ { "Bacteroides helcogenes","GENNYALAA"},
+ { "Bacteroides salanitronis","GNENYALAA"},
+ { "Bacteroides thetaiotaomicron","GETNYALAA"},
+ { "Bacteroides vulgatus","GNENYALAA"},
+ { "Bartonella bacilliformis","ANDNYAEARLAA"},
+ { "Bartonella clarridgeiae","ANDNYAEARLIAA"},
+ { "Bartonella grahamii","ANDNYAEARLAA"},
+ { "Bartonella henselae","ANDNYAEARLAA"},
+ { "Bartonella quintana","ANDNYAEARLAA"},
+ { "Bartonella tribocorum","ANDNYAEARLAA"},
+ { "Baumannia cicadellinicola","ANNSQYESVALAA"},
+ { "Bdellovibrio bacteriovorus","GNDYALAA"},
+ { "Beijerinckia indica","ANDNYAPVAVAA"},
+ { "Belliella baltica","GESNYAMAA"},
+ { "Beutenbergia cavernae","ADSKRTDFALAA"},
+ { "Bifidobacterium adolescentis","AKSNRTEFALAA"},
+ { "Bifidobacterium animalis","AKSNRTEFALAA"},
+ { "Bifidobacterium asteroides","AKSNRTEFALAA"},
+ { "Bifidobacterium bifidum","AKSNRTEFALAA"},
+ { "Bifidobacterium breve","AKSNRTEFALAA"},
+ { "Bifidobacterium dentium","AKSNRTEFALAA"},
+ { "Bifidobacterium longum","AKSNRTEFALAA"},
+ { "Blastococcus saxobsidens","ADSNRADYALAA"},
+ { "Blattabacterium sp. (Blaberus giganteus)","GEKEYAFAA"},
+ { "Blattabacterium sp. (Blattella germanica) Bge","GEQQYAFAA"},
+ { "Blattabacterium sp. (Cryptocercus punctulatus)","GEKQYAFAA"},
+ { "Blattabacterium sp. (Mastotermes darwiniensis)","GEKQYAFAA"},
+ { "Blattabacterium sp. (Periplaneta americana)","GEKQYAFAA"},
+ { "Blochmannia floridanus","AKNKYNEPVALAA"},
+ { "Blochmannia pennsylvanicus","ANNTTYRESVALAA"},
+ { "Blochmannia vafer","ANYNYNESAALAA"},
+ { "Bolidomonas pacifica chloroplast","ANNILAFNRKSLSFA"},
+ { "Bordetella avium","ANDERFALAA"},
+ { "Bordetella bronchiseptica","ANDERFALAA"},
+ { "Bordetella parapertussis","ANDERFALAA"},
+ { "Bordetella pertussis","ANDERFALAA"},
+ { "Bordetella petrii","ANDERFALAA"},
+ { "Borrelia afzelii","AKNNNFTSSNLVMAA"},
+ { "Borrelia bissettii","AKNNNFTSSNLVMAA"},
+ { "Borrelia burgdorferi","AKNNNFTSSNLVMAA"},
+ { "Borrelia crocidurae","AKNNNFTSSDLVMAA"},
+ { "Borrelia duttonii","AKNNNFTSSDLVMAA"},
+ { "Borrelia garinii","AKNNNFTSSNLVMAA"},
+ { "Borrelia hermsii","ARNNNFTSSNLVMAA"},
+ { "Borrelia recurrentis","AKNNNFTSSDLVMAA"},
+ { "Borrelia turicatae","AKNNNFTSSNLVMAA"},
+ { "Brachybacterium faecium","AEPKRTDFALAA"},
+ { "Brachyspira hyodysenteriae","ADEYALAA"},
+ { "Brachyspira intermedia","ADEYALAA"},
+ { "Brachyspira murdochii","ADEYALAA"},
+ { "Brachyspira pilosicoli","ADEYALAA"},
+ { "Bradyrhizobium japonicum","ANDNFAPVAQAA"},
+ { "Bradyrhizobium sp. BTAi1","ANDNFAPVAQAA"},
+ { "Bradyrhizobium sp. ORS 278","ANDNFAPVAQAA"},
+ { "Bradyrhizobium sp. S23321","ANDNFAPVAQAA"},
+ { "Brevibacillus brevis","GNKQLSLAA"},
+ { "Brevibacterium linens","AKSNNRTDFALAA"},
+ { "Brucella abortus","ANDNNAQGYALAA"},
+ { "Brucella canis","ANDNNAQGYALAA"},
+ { "Brucella ceti","ANDNNAQGYALAA"},
+ { "Brucella melitensis","ANDNNAQGYALAA"},
+ { "Brucella ovis","ANDNNAQGYALAA"},
+ { "Brucella suis","ANDNNAQGYALAA"},
+ { "Buchnera aphidicola 1","ANNKQNYALAA"},
+ { "Buchnera aphidicola 2","ANNKQNYALAA"},
+ { "Buchnera aphidicola 3","AKQNQYALAA"},
+ { "Burkholderia ambifaria","ANDDTFALAA"},
+ { "Burkholderia cenocepacia","ANDDTFALAA"},
+ { "Burkholderia cepacia","ANDDTFALAA"},
+ { "Burkholderia fungorum","ANDDTFALAA"},
+ { "Burkholderia gladioli","ANDETFALAA"},
+ { "Burkholderia glumae","ANDDTFALAA"},
+ { "Burkholderia graminis","ANDDTFALAA"},
+ { "Burkholderia mallei","ANDDTFALAA"},
+ { "Burkholderia multivorans","ANDDTFALAA"},
+ { "Burkholderia phenoliruptrix","ANDDTFALAA"},
+ { "Burkholderia phymatum","ANDDTFALAA"},
+ { "Burkholderia phytofirmans","ANDETFALAA"},
+ { "Burkholderia pseudomallei","ANDDTFALAA"},
+ { "Burkholderia rhizoxinica","ANDETYALAA"},
+ { "Burkholderia sp. 383","ANDDTFALAA"},
+ { "Burkholderia sp. CCGE1001","ANDDTFALAA"},
+ { "Burkholderia sp. CCGE1002","ANDDTFALAA"},
+ { "Burkholderia sp. YI23","ANDDTFALAA"},
+ { "Burkholderia thailandensis","ANDDTFALAA"},
+ { "Burkholderia vietnamiensis","ANDDTFALAA"},
+ { "Burkholderia xenovorans","ANDDTFALAA"},
+ { "Butyrivibrio proteoclasticus","ANDNLALAA"},
+ { "Caldicellulosiruptor bescii","ADKAELALAA"},
+ { "Caldicellulosiruptor hydrothermalis","ADRTELALAA"},
+ { "Caldicellulosiruptor kristjanssonii","ADKAELALAA"},
+ { "Caldicellulosiruptor kronotskyensis","ADKAELALAA"},
+ { "Caldicellulosiruptor lactoaceticus","ADKAELALAA"},
+ { "Caldicellulosiruptor obsidiansis","AEKPQLALAA"},
+ { "Caldicellulosiruptor owensensis","AEKPQLALAA"},
+ { "Caldicellulosiruptor saccharolyticus","ADKAELALAA"},
+ { "Caldilinea aerophila","AKNTGKAFAFGTPATSVALAA"},
+ { "Caldisericum exile","ADYSYALAA"},
+ { "Calditerrivibrio nitroreducens","ANDEYALAAA"},
+ { "Campylobacter coli","ANNVKFAPAYAKAA"},
+ { "Campylobacter concisus","ANNVNFAPAYAKAA"},
+ { "Campylobacter curvus","ANNVKFAPAYAKAA"},
+ { "Campylobacter fetus 2","ANNVKFAPAYAKAA"},
+ { "Campylobacter hominis","ANNAKFAPAYAKIA"},
+ { "Campylobacter jejuni","ANNVKFAPAYAKAA"},
+ { "Campylobacter lari","ANNVKFAPAYAKAA"},
+ { "Campylobacter upsaliensis","ANNAKFAPAYAKVA"},
+ { "Candidatus atelocyanobacterium thalassa","ANNIVSFKRVAVAA"},
+ { "Capnocytophaga canimorsus","GENNYALAA"},
+ { "Capnocytophaga ochracea","GENNYALAA"},
+ { "Carboxydothermus hydrogenoformans","ANENYALAA"},
+ { "Cardinium endosymbiont","VINNSRRCKFVALRKEEEEDDELRMAA"},
+ { "Carnobacterium maltaromaticum","AKNNNNSYALAA"},
+ { "Carnobacterium sp. 17-4","DKNNNNSYALAA"},
+ { "Catenulispora acidiphila","ANKTQLKSQTAYGLAA"},
+ { "Catera virion","ATDTDATVTDAEIEAFFAEEAAALV"},
+ { "Caulobacter crescentus","ANDNFAEEFAVAA"},
+ { "Caulobacter segnis","ANDNFAEEFAVAA"},
+ { "Caulobacter sp. K31","ANDNFAEEFAIAA"},
+ { "Cellulomonas fimi","ADNKRTDFALAA"},
+ { "Cellulomonas flavigena","ADSKRTDFALAA"},
+ { "Cellulophaga algicola","GENNYALAA"},
+ { "Cellulophaga lytica","GENNYALAA"},
+ { "Cellvibrio gilvus","ADSKRTDFALAA"},
+ { "Cellvibrio japonicus","ANDDSYALAA"},
+ { "Chelativorans sp. BNC1","ANDNYAEARLAA"},
+ { "Chitinophaga pinensis","GESNYAMAA"},
+ { "Chlamydia muridarum","AEPKAECEIISFADLNDLRVAA"},
+ { "Chlamydia psittaci","AEPKAECEIISFSELSEQRLAA"},
+ { "Chlamydia trachomatis","AEPKAECEIISFADLEDLRVAA"},
+ { "Chlamydophila abortus","AEPKAKCEIISFSELSEQRLAA"},
+ { "Chlamydophila caviae","AEPKAECEIISFSDLTEERLAA"},
+ { "Chlamydophila felis","AEPKAECEIISFSDLTQERLAA"},
+ { "Chlamydophila pecorum","AEPKAECEIISFSDLLVEERVAA"},
+ { "Chlamydophila pneumoniae","AEPKAECEIISLFDSVEERLAA"},
+ { "Chlamydophila psittaci","AEPKAECEIISFSELSEQRLAA"},
+ { "Chloracidobacterium thermophilum","AETQELALAA"},
+ { "Chlorobaculum parvum","ADDYSYAMAA"},
+ { "Chlorobium chlorochromatii","ADDYSYAMAA"},
+ { "Chlorobium limicola","ADDYSYAMAA"},
+ { "Chlorobium luteolum","ADDYSYAMAA"},
+ { "Chlorobium phaeobacteroides","ADDYSYAMAA"},
+ { "Chlorobium phaeovibrioides","ADDYSYAMAA"},
+ { "Chlorobium tepidum","ADDYSYAMAA"},
+ { "Chloroflexus aggregans","ANNNARVQPRLALAA"},
+ { "Chloroflexus aurantiacus","ANTNTRAQARLALAA"},
+ { "Chloroherpeton thalassium","ADDYSYAMAA"},
+ { "Chromobacterium violaceum","ANDETYALAA"},
+ { "Chromohalobacter salexigens","ANDDNYAQGALAA"},
+ { "Chroococcidiopsis PCC6712","ANNIVKFERQAVFA"},
+ { "Citrobacter koseri","ANDENYALAA"},
+ { "Citrobacter rodentium","ANDENYALAA"},
+ { "Clavibacter michiganensis","ANNKQSSFVLAA"},
+ { "Cloacamonas acidaminovorans","ANNNYALAA"},
+ { "Clostridiales genomosp.","ANKNYSYAAA"},
+ { "Clostridium acetobutylicum","DNENNLALAA"},
+ { "Clostridium acidurici","ANDNYALAA"},
+ { "Clostridium beijerinckii","AEDNFALAA"},
+ { "Clostridium botulinum","ANDNFALAA"},
+ { "Clostridium cellulolyticum","AKNDNFALAAA"},
+ { "Clostridium cellulovorans","DENYLLAA"},
+ { "Clostridium clariflavum","AENDNYALAAA"},
+ { "Clostridium difficile","ADDNFAIAA"},
+ { "Clostridium kluyveri","ENDNLALAA"},
+ { "Clostridium lentocellum","AEDNLAIAA"},
+ { "Clostridium ljungdahlii","ENNNENLALAA"},
+ { "Clostridium perfringens","AEDNFALAA"},
+ { "Clostridium phytofermentans","ANDNLAYAA"},
+ { "Clostridium saccharolyticum","ANNNELALAA"},
+ { "Clostridium sp. BNL1100","AKNDNFALAAA"},
+ { "Clostridium sp. SY8519","AKEDNFELAMAA"},
+ { "Clostridium sticklandii","ANENYALAA"},
+ { "Clostridium tetani","ADDNFVLAA"},
+ { "Clostridium thermocellum","ANEDNYALAAA"},
+ { "Collimonas fungivorans","ANDNSYALAA"},
+ { "Colwellia psychrerythraea","ANDDTFALAA"},
+ { "Colwellia sp","ANDDTFALAA"},
+ { "Comamonas testosteroni","ANDERFALAA"},
+ { "Conexibacter woesei","ADSHEYALAA"},
+ { "Coprothermobacter proteolyticus","AEPEFALAA"},
+ { "Coraliomargarita akajimensis","GEEQFALAA"},
+ { "Corallococcus coralloides","ANDNVELALAA"},
+ { "Coriobacterium glomerans","GMAQTKIEPTRNPRARRRAQGNRISTGD"},
+ { "Corynebacterium aurimucosum","AEKNSQRDYALAA"},
+ { "Corynebacterium diphtheriae","AENTQRDYALAA"},
+ { "Corynebacterium efficiens","AEKTQRDYALAA"},
+ { "Corynebacterium glutamicum","AEKSQRDYALAA"},
+ { "Corynebacterium jeikeium","AENTQRDYALAA"},
+ { "Corynebacterium kroppenstedtii","AENTQRDYALAA"},
+ { "Corynebacterium pseudotuberculosis","AEKTQRDYALAA"},
+ { "Corynebacterium resistens","AENTQRDYALAA"},
+ { "Corynebacterium ulcerans","AEKTQRDYALAA"},
+ { "Corynebacterium urealyticum","AENTQRDYALAA"},
+ { "Corynebacterium variabile","AENTQRDYALAA"},
+ { "Coxiella burnetii","ANDSNYLQEAYA"},
+ { "Croceibacter atlanticus","GENNYALAA"},
+ { "Crocosphaera watsonii","ANNIVSFKRVAVAA"},
+ { "Cronobacter sakazakii","ANDENYALAA"},
+ { "Cronobacter turicensis","ANDENYALAA"},
+ { "Cryptobacterium curtum","DNNKSFGRQYALAA"},
+ { "Cupriavidus metallidurans","ANDERYALAA"},
+ { "Cupriavidus necator","ANDERYALAA"},
+ { "Cupriavidus taiwanensis","ANDERYALAA"},
+ { "Cyanidioschyzon merolae Chloroplast","ANQILPFSIPVKHLAV"},
+ { "Cyanidium caldarium chloroplast","ANNIIEISNIRKPALVV"},
+ { "Cyanobium gracile","ANNIVRFSRQAAPVAA"},
+ { "Cyanobium sp. PCC 6904","ANNIVRFSRQAAPVAA"},
+ { "Cyanobium sp. PCC 7009","ANNIVRFSRQAAPVAA"},
+ { "Cyanophora paradoxa chloroplast","ATNIVRFNRKAAFAV"},
+ { "Cyanothece sp. ATCC 51142","ANNIVSFKRVAVAA"},
+ { "Cyanothece sp. PCC 7424","ANNIVPFARKAAPVAA"},
+ { "Cyanothece sp. PCC 7425","ANNIVPFARKAVAVA"},
+ { "Cyanothece sp. PCC 7822","ANNIVPFARKSALVAA"},
+ { "Cyanothece sp. PCC 8801","ANNIVSFKRVAVAA"},
+ { "Cyclobacterium marinum","GESNYAMAA"},
+ { "Cycloclasticus sp. P1","ANDDNYAIAA"},
+ { "Cytophaga hutchinsonii","GEESYAMAA"},
+ { "Dechloromonas agitata","ANDEQFAIAA"},
+ { "Dechloromonas aromatica","ANDEQFAIAA"},
+ { "Dechlorosoma suillum","ANDERFAIAA"},
+ { "Deferribacter desulfuricans","ANDELALAA"},
+ { "Dehalococcoides ethenogenes","GERELVLAG"},
+ { "Dehalococcoides sp. CBDB1","GERELVLAG"},
+ { "Dehalococcoides sp. VS","GERELVLAG"},
+ { "Dehalogenimonas lykanthroporepellens","DAKEISAGLERFRRLKLEGREQKAG"},
+ { "Deinococcus deserti","GNQNYALAA"},
+ { "Deinococcus geothermalis","GNQNYALAA"},
+ { "Deinococcus gobiensis","GNQNYALAA"},
+ { "Deinococcus maricopensis","GNNNSTTFALAA"},
+ { "Deinococcus proteolyticus","GENNYALAA"},
+ { "Deinococcus radiodurans","GNQNYALAA"},
+ { "Delftia acidovorans","ANDERFALAA"},
+ { "Delftia sp. Cs1-4","ANDERFALAA"},
+ { "Denitrovibrio acetiphilus","ANNEHTLAAA"},
+ { "Desulfarculus baarsii","ADDYNYAVAA"},
+ { "Desulfatibacillum alkenivorans","ADDYNYAMAA"},
+ { "Desulfitobacterium hafniense","ANDDNYALAA"},
+ { "Desulfobacca acetoxidans","ADNYGYALAA"},
+ { "Desulfobacterium autotrophicum","ADDYNYAVAA"},
+ { "Desulfobacula toluolica","ADDYNYAVAA"},
+ { "Desulfobulbus propionicus","ADDYNYALAA"},
+ { "Desulfococcus oleovorans","ADDYNYAVAA"},
+ { "Desulfohalobium retbaense","ANDYDYALAA"},
+ { "Desulfomicrobium baculatum","ANDNYDYAMAA"},
+ { "Desulfomonile tiedjei","ANDYEYALAA"},
+ { "Desulforudis audaxviator","AKNETYALAA"},
+ { "Desulfotalea psychrophila","ADDYNYAVAA"},
+ { "Desulfotomaculum acetoxidans","ANNDYALAA"},
+ { "Desulfotomaculum carboxydivorans","ANEEYALAA"},
+ { "Desulfotomaculum kuznetsovii","ANEEYALAA"},
+ { "Desulfotomaculum reducens","ANEEYALAA"},
+ { "Desulfotomaculum ruminis","ANEEYALAA"},
+ { "Desulfovibrio aespoeensis","ANNDYDYAIAA"},
+ { "Desulfovibrio africanus","ANDYNYSLAA"},
+ { "Desulfovibrio alaskensis","ANNDYEYAMAA"},
+ { "Desulfovibrio desulfuricans","ANNDYDYAYAA"},
+ { "Desulfovibrio desulfuricans 2 (G20)","ANNDYEYAMAA"},
+ { "Desulfovibrio magneticus","ANDYDYALAA"},
+ { "Desulfovibrio salexigens","ANDNYDYAMAA"},
+ { "Desulfovibrio vulgaris","ANNYDYALAA"},
+ { "Desulfovibrio yellowstonii","ANNELALAA"},
+ { "Desulfurispirillum indicum","ANDENVLAAA"},
+ { "Desulfurivibrio alkaliphilus","ADDYAYAAAA"},
+ { "Desulfurobacterium thermolithotrophum","ANEELALAA"},
+ { "Desulfuromonas acetoxidans","ADTDVSYALAA"},
+ { "Dichelobacter nodosus","ANDDNYALAA"},
+ { "Dickeya dadantii","ANDENFAPAALAA"},
+ { "Dickeya zeae","ANDENFAPAALAA"},
+ { "Dictyoglomus thermophilum","ANTNLALAA"},
+ { "Dictyoglomus turgidum","ANTNLALAA"},
+ { "Dinoroseobacter shibae","ANDNRAPVAVAA"},
+ { "Dyadobacter fermentans","GESTYAMAA"},
+ { "Edwardsiella tarda","ANDENYALAA"},
+ { "Eggerthella lenta","GKNNTQSAPALAMAA"},
+ { "Eggerthella sp. YY7918","GKNNTQSAPALAMAA"},
+ { "Ehrlichia canis","ANDNFVFANDNNSSVAGLVAA"},
+ { "Ehrlichia chaffeensis","ANDNFVFANDNNSSANLVAA"},
+ { "Ehrlichia ruminantium 1","ANDNFVSANDNNSTANLVAA"},
+ { "Ehrlichia ruminantium 2","ANDNFVSANDNNSTANLVAA"},
+ { "Elusimicrobium minutum","GNQTELNWATA"},
+ { "Emiliania huxleyi chloroplast","ANNILNFNSKLAIA"},
+ { "Emticicia oligotrophica","GNTSYAMAA"},
+ { "Enterobacter aerogenes","ANDENYALAA"},
+ { "Enterobacter cancerogenus","ANDENYALAA"},
+ { "Enterobacter cloacae","ANDENYALAA"},
+ { "Enterobacter lignolyticus","ANDENYALAA"},
+ { "Enterobacter sakazakii","ANDENYALAA"},
+ { "Enterobacter sp. 638","ANDENYALAA"},
+ { "Enterococcus durans","AKNENNSYALAA"},
+ { "Enterococcus faecalis","AKNENNSFALAA"},
+ { "Enterococcus faecium","AKNENNSYALAA"},
+ { "Enterococcus hirae","AKNENNSYALAA"},
+ { "Erwinia amylovora","ANDENFAPAALAA"},
+ { "Erwinia billingiae","ANDENYALAA"},
+ { "Erwinia carotovora","ANDENYALAA"},
+ { "Erwinia chrysanthemi","ANDENFAPAALAA"},
+ { "Erwinia pyrifoliae","AKLKYNESVANDGEYELIAAAA"},
+ { "Erwinia sp. Ejp617","AKLYNNIPVANDGEFITPALAA"},
+ { "Erwinia tasmaniensis","ANDENFAPAALAA"},
+ { "Erysipelothrix rhusiopathiae","GNNSLQFAA"},
+ { "Erythrobacter litoralis","ANDNEALALAA"},
+ { "Escherichia coli","ANDENYALAA"},
+ { "Ethanoligenens harbinense","AKDNVIRVNFGRSEEALAA"},
+ { "Eubacterium eligens","ANDNLAYAA"},
+ { "Eubacterium limosum","AKENRSYGMALAA"},
+ { "Eubacterium rectale","AEDNLAYAA"},
+ { "Exiguobacterium sibiricum","GKTNTQLAAA"},
+ { "Exiguobacterium sp. AT1b","GKTNTQLAAA"},
+ { "Ferrimonas balearica","ANDENYALAA"},
+ { "Fervidobacterium nodosum","ANEYVPLAA"},
+ { "Fervidobacterium pennivorans","ANEYVPLAA"},
+ { "Fibrobacter succinogenes","ADENYALAA"},
+ { "Filifactor alocis","ANENNLLAA"},
+ { "Finegoldia magna","AEDNNFALAA"},
+ { "Flavobacteriaceae bacterium","GDQEFALAA"},
+ { "Flavobacterium columnare","GENNYALAA"},
+ { "Flavobacterium indicum","GENNYALAA"},
+ { "Flavobacterium johnsoniae","GENNYALAA"},
+ { "Flexibacter litoralis","GESNYAMAA"},
+ { "Flexistipes sinusarabici","ANDEFALAAA"},
+ { "Fluviicola taffensis","DNTSYALAA"},
+ { "Francisella cf.","ANDSNFAAVAKAA"},
+ { "Francisella noatunensis","ANDSNFAAVTKAA"},
+ { "Francisella novicida","ANDSNFAAVAKAA"},
+ { "Francisella philomiragia","ANDSNFAAVAKAA"},
+ { "Francisella sp. TX077308","ANDSNFAAVAKAA"},
+ { "Francisella tularensis 1","GNKKANRVAANDSNFAAVAKAA"},
+ { "Francisella tularensis 2","ANDSNFAAVAKAA"},
+ { "Frankia alni","ANKTQPVTPLYALAA"},
+ { "Frankia sp. CcI3","ANKTQPTTPTYALAA"},
+ { "Frankia sp. EAN1pec","ATKTQPASSTFALAA"},
+ { "Frankia sp. EuI1c","ANSEQSATSAYALAA"},
+ { "Frankia symbiont","ANKSQSATPRTFALAA"},
+ { "Frateuria aurantia","ANDDNYALAA"},
+ { "Fremyella diplosiphon","ANNIVKFARKEALVAA"},
+ { "Fusobacterium nucleatum 1","GNKDYALAA"},
+ { "Fusobacterium nucleatum 2","GNKEYALAA"},
+ { "Gallibacterium anatis","ANDENYALAA"},
+ { "Gallionella capsiferriformans","ANDENYALAA"},
+ { "gamma proteobacterium","ANDESYALAA"},
+ { "Gammaproteobacteria SAR-1","ANNYNYSLAA"},
+ { "Gardnerella vaginalis","AKSNRTEFALAA"},
+ { "Gemmata obscuriglobus","AEPQYSLAA"},
+ { "Gemmatimonas aurantiaca","ANNNLALAA"},
+ { "Geobacillus kaustophilus","GKQNYALAA"},
+ { "Geobacillus sp. WCH70","GKENYALAA"},
+ { "Geobacillus sp. Y4.1MC1","GKENYALAA"},
+ { "Geobacillus stearothermophilus","GKQNYALAA"},
+ { "Geobacillus thermodenitrificans","GKENYALAA"},
+ { "Geobacter bemidjiensis","ADNYDYALAA"},
+ { "Geobacter daltonii","ADNYDYALAA"},
+ { "Geobacter lovleyi","ADNYNTQPVALAA"},
+ { "Geobacter metallireducens","ADNYDYAVAA"},
+ { "Geobacter sp. M18","ADNYDYALAA"},
+ { "Geobacter sp. M21","ADNYDYALAA"},
+ { "Geobacter sulfurreducens","ADNYDYAVAA"},
+ { "Geobacter uraniireducens","ADNYNYALAA"},
+ { "Geodermatophilus obscurus","ADSSQREFALAA"},
+ { "Glaciecola nitratireducens","ANDENYALAA"},
+ { "Glaciecola sp. 4H-3-7+YE-5","ANDENYALAA"},
+ { "Gloeobacter violaceus","ATNNVVPFARARATVAA"},
+ { "Gluconacetobacter diazotrophicus","ANDNSEVLAVAA"},
+ { "Gluconacetobacter xylinus","ANDNSEVLAVAA"},
+ { "Gluconobacter oxydans","ANDNSEVLAVAA"},
+ { "Gordonia bronchialis","ADSNQRDYALAA"},
+ { "Gordonia polyisoprenivorans","ADKNQRDYALAA"},
+ { "Gordonia rubripertincta","ADSNQRDYALAA"},
+ { "Gordonia sp. KTR9","ADSNQRDYALAA"},
+ { "Gracilaria tenuistipitata chloroplast","AKNNILTLSRRLIYA"},
+ { "Gramella forsetii","GENNYALAA"},
+ { "Granulibacter bethesdensis","ANDNHEALAVAA"},
+ { "Granulicella mallensis","AEPQFALAA"},
+ { "Granulicella tundricola","AEPQFALAA"},
+ { "Guillardia theta chloroplast","ASNIVSFSSKRLVSFA"},
+ { "Haemophilus ducreyi","ANDEQYALAA"},
+ { "Haemophilus influenzae","ANDEQYALAA"},
+ { "Haemophilus parainfluenzae","ANDEQYALAA"},
+ { "Haemophilus parasuis","ANDEQYALAA"},
+ { "Haemophilus somnus","ANDEQYALAA"},
+ { "Hahella chejuensis","ANDETYALAA"},
+ { "Halanaerobium hydrogeniformans","ANDNSYALAAA"},
+ { "Halanaerobium praevalens","ANDNNYTLAAA"},
+ { "Haliangium ochraceum","ANDNAVALAA"},
+ { "Haliscomenobacter hydrossis","GESNYAMAA"},
+ { "Halobacillus halophilus","GESNDNLAVAA"},
+ { "Halomonas elongata","ANDDNYAQGALAA"},
+ { "Halorhodospira halophila","ANDDNYALAA"},
+ { "Halothermothrix orenii","ADNNNYALAAA"},
+ { "Halothiobacillus neapolitanus","ANDDNYALAA"},
+ { "Hamiltonella defensa","AKINKNRPAANGYMPVAALAA"},
+ { "Helicobacter acinonychis","VNNTDYAPAYAKVA"},
+ { "Helicobacter bizzozeronii","VNNPNYAPNYAKAA"},
+ { "Helicobacter cetorum","VNNTNYAPAYAKVA"},
+ { "Helicobacter cinaedi","ANNTNYAPVYAKVA"},
+ { "Helicobacter felis","VNNPNYAPNYAKAA"},
+ { "Helicobacter hepaticus","ANNANYAPAYAKVA"},
+ { "Helicobacter mustelae","ANNKNYAPAYAKVA"},
+ { "Helicobacter pylori 1","VNNTDYAPAYAKAA"},
+ { "Helicobacter pylori 2","VNNTDYAPAYAKAA"},
+ { "Helicobacter pylori 3","VNNADYAPAYAKAA"},
+ { "Heliobacillus mobilis","AEDNYALAA"},
+ { "Heliobacterium modesticaldum","AEENYALAA"},
+ { "Herbaspirillum seropedicae","ANDESYALAA"},
+ { "Herminiimonas arsenicoxydans","DNSYALAA"},
+ { "Herpetosiphon aurantiacus","GKNTFRAPVALAA"},
+ { "Hippea maritima","ADTEYALAA"},
+ { "Hirschia baltica","ANDNFAEGELLAA"},
+ { "Hydrogenophaga palleronii","ANDERFALAA"},
+ { "Hyphomicrobium denitrificans","ANDNYAEAALAA"},
+ { "Hyphomicrobium sp. MC1","ANDNYAEAALAA"},
+ { "Hyphomonas neptunium","ANDNFAEGELLAA"},
+ { "Idiomarina loihiensis","ANDDNYALAA"},
+ { "Ignavibacterium album","GEYNYALAA"},
+ { "Ilyobacter polytropus","ENNNYALAA"},
+ { "Intrasporangium calvum","ANSKRTDFALAA"},
+ { "Isoptericola variabilis","ADNKRTDFTLAA"},
+ { "Jannaschia sp. CCS1","ANDNRAPAMALAA"},
+ { "Janthinobacterium sp. Marseille","ANDNSYALAA"},
+ { "Jonesia denitrificans","ADTKRTDFALAA"},
+ { "Kangiella koreensis","ANEDNYALAA"},
+ { "Ketogulonicigenium vulgare","ANNNRAPAMALAA"},
+ { "Kineococcus radiotolerans","ADSKRTEFALAA"},
+ { "Kitasatospora setae","ANSKRDSQQFALAA"},
+ { "Klebsiella oxytoca","ANDENYALAA"},
+ { "Klebsiella pneumoniae","ANDENYALAA"},
+ { "Kocuria rhizophila","AKSKRTDFALAA"},
+ { "Koribacter versatilis","ANTQMAYAA"},
+ { "Kosmotoga olearia","ANTEFALAA"},
+ { "Kribbella flavida","ADSKRSSFALAA"},
+ { "Krokinobacter sp. 4H-3-7-5","GENNYALAA"},
+ { "Kyrpidia tusciae","ANKQELALAA"},
+ { "Kytococcus sedentarius","ANSKRTDFALAA"},
+ { "Lacinutrix sp. 5H-3-7-4","GENNYALAA"},
+ { "Lactobacillus acidophilus","ANNKNSYALAA"},
+ { "Lactobacillus amylovorus","ANNKNSYALAA"},
+ { "Lactobacillus brevis","AKNNNNSYALAA"},
+ { "Lactobacillus buchneri","AKNNNNSYALAA"},
+ { "Lactobacillus casei","AKNENSYALAA"},
+ { "Lactobacillus crispatus","ANNKNSYALAA"},
+ { "Lactobacillus delbrueckii 1","AKNENNSYALAA"},
+ { "Lactobacillus delbrueckii 2","ANENSYAVAA"},
+ { "Lactobacillus fermentum","ANNNSQSYAYAA"},
+ { "Lactobacillus gallinarum","ANNKNSYALAA"},
+ { "Lactobacillus gasseri","ANNENSYAVAA"},
+ { "Lactobacillus helveticus","ANNKNSYALAA"},
+ { "Lactobacillus johnsonii","ANNENSYAVAA"},
+ { "Lactobacillus kefiranofaciens","ANNKNSYALAA"},
+ { "Lactobacillus plantarum","AKNNNNSYALAA"},
+ { "Lactobacillus reuteri","ANNNSNSYAYAA"},
+ { "Lactobacillus rhamnosus","AKNENSYALAA"},
+ { "Lactobacillus ruminis","AKNNNYSYALAA"},
+ { "Lactobacillus sakei","ANNNNSYAVAA"},
+ { "Lactobacillus salivarius","AKNNNNSYALAA"},
+ { "Lactobacillus sanfranciscensis","AKNNNNSYALAA"},
+ { "Lactococcus garvieae","AKNNTSYALAA"},
+ { "Lactococcus lactis","AKNNTQTYAMAA"},
+ { "Lactococcus plantarum","AKNTQTYALAA"},
+ { "Lactococcus raffinolactis","AKNTQTYAVAA"},
+ { "Laribacter hongkongensis","ANDDTYALAA"},
+ { "Lawsonia intracellularis","ANNNYDYALAA"},
+ { "Leadbetterella byssophila","GNTSYAMAA"},
+ { "Legionella longbeachae","ANDENFAGGEAIAA"},
+ { "Legionella pneumophila","ANDENFAGGEAIAA"},
+ { "Leifsonia xyli","ANSKSTVSAKADFALAA"},
+ { "Leptolyngbya boryana","ANNIVPFARKTAPVAA"},
+ { "Leptospira biflexa","ANNEFALAA"},
+ { "Leptospira borgpetersenii","ANNELALAA"},
+ { "Leptospira interrogans","ANNELALAA"},
+ { "Leptospirillum ferriphilum","ANEELALAA"},
+ { "Leptospirillum ferrooxidans","ANNEMALAA"},
+ { "Leptospirillum groupII","ANEELALAA"},
+ { "Leptospirillum groupIII","ANEELALAA"},
+ { "Leptospirillum sp. Group II '5-way CG'","ANEELALAA"},
+ { "Leptospirillum sp. Group III","ANEELALAA"},
+ { "Leptothrix cholodnii","ANDSTYALAA"},
+ { "Leptotrichia buccalis","GNDNYALAA"},
+ { "Leuconostoc carnosum","AKNENTFAVAA"},
+ { "Leuconostoc citreum","AKNENSFAIAA"},
+ { "Leuconostoc gasicomitatum","AKNENSFAIAA"},
+ { "Leuconostoc gelidum","AKNENSFAIAA"},
+ { "Leuconostoc lactis","AKNENSFAIAA"},
+ { "Leuconostoc mesenteroides","AKNENSFAIAA"},
+ { "Leuconostoc pseudomesenteroides","AKNENSYAIAA"},
+ { "Leuconostoc sp. C2","AKNENSFAIAA"},
+ { "Liberibacter asiaticus","ANDNSAREVLAA"},
+ { "Liberibacter solanacearum","ANDNFAGETRLAA"},
+ { "Listeria grayi 1","GKEKQNLAFAA"},
+ { "Listeria grayi 2","GKQNNNLAFAA"},
+ { "Listeria innocua","GKEKQNLAFAA"},
+ { "Listeria ivanovii","GKEKQNLAFAA"},
+ { "Listeria monocytogenes","GKEKQNLAFAA"},
+ { "Listeria seeligeri","GKEKQNLAFAA"},
+ { "Listeria welshimeri","GKEKQNLAFAA"},
+ { "Lysinibacillus sphaericus","GKQQNLAFAA"},
+ { "Macrococcus caseolyticus","GKTNNFAVAA"},
+ { "Magnetococcus marinus","ANDEHYAPAFAAA"},
+ { "Magnetococcus sp.","ANDEHYAPAFAAA"},
+ { "Magnetospirillum magneticum","ANDNVELAAAA"},
+ { "Magnetospirillum magnetotacticum 1","ANDNFAPVAVAA"},
+ { "Magnetospirillum magnetotacticum 2","ANDNVELAAAA"},
+ { "Mahella australiensis","ADNNAELALAA"},
+ { "Mannheimia haemolytica","ANDEQYALAA"},
+ { "Mannheimia succiniciproducens","ANDEQYALAA"},
+ { "Maribacter sp. HTCC2170","GDNNYALAA"},
+ { "Maricaulis maris","ANDNFAEEVALAA"},
+ { "Marinithermus hydrothermalis","GNNRYALAA"},
+ { "Marinitoga piezophila","AEENYALAA"},
+ { "Marinobacter adhaerens","ANDENYALAA"},
+ { "Marinobacter aquaeolei","ANDENYALAA"},
+ { "Marinobacter hydrocarbonoclasticus","ANDENYALAA"},
+ { "Marinobacter sp. BSs20148","ANDENYSLAA"},
+ { "Marinomonas mediterranea","ANDENYALAA"},
+ { "Marinomonas posidonica","ANDENYALAA"},
+ { "Marinomonas sp. MWYL1","ANDENYALAA"},
+ { "Marivirga tractuosa","GESNYAMAA"},
+ { "Megasphaera elsdenii","AKENNFALAA"},
+ { "Meiothermus ruber","GNVRSNSYALAA"},
+ { "Meiothermus silvanus","GNTQRSYALAA"},
+ { "Melioribacter roseus","GEYNYALAA"},
+ { "Melissococcus plutonius","AKKQNYSYAVAA"},
+ { "Mesoplasma florum","ANKNEENTNEVPTFMLNAGQANYAFA"},
+ { "Mesorhizobium ciceri","ANDNYAEARLAA"},
+ { "Mesorhizobium loti","ANDNYAEARLAA"},
+ { "Mesorhizobium opportunistum","ANDNYAEARLAA"},
+ { "Mesorhizobium sp.","ANDNYAEARLAA"},
+ { "Mesostigma viride chloroplast","ANNILPFNRKTAVAV"},
+ { "Mesotoga prima","ANNEFALAA"},
+ { "Methylacidiphilum infernorum","ANEELALAA"},
+ { "Methylibium petroleiphilum","ANDERFALAA"},
+ { "Methylobacillus flagellatus","ANDETYALAA"},
+ { "Methylobacillus glycogenes","ANDETYALAA"},
+ { "Methylobacterium extorquens","ANDNFAPVAVAA"},
+ { "Methylobacterium nodulans","ANDNYAPVAVAA"},
+ { "Methylobacterium populi","ANDNFAPVAVAA"},
+ { "Methylobacterium radiotolerans","ANDNFAPVAVAA"},
+ { "Methylobacterium sp. 4-46","ANDNYAPVAVAA"},
+ { "Methylocella silvestris","ANDNYAPVAVAA"},
+ { "Methylococcus capsulatus","ANDDVYALAA"},
+ { "Methylocystis sp. SC2","ANDNYAPVAVAA"},
+ { "Methylomicrobium alcaliphilum","ANDENYSMALAA"},
+ { "Methylomirabilis oxyfera","ANHELALAA"},
+ { "Methylomonas methanica","ANDENYSVALAA"},
+ { "Methylophaga sp. JAM1","ANDNNYALAA"},
+ { "Methylophaga sp. JAM7","ANDNNYALAA"},
+ { "Methylotenera mobilis","ANDETYSLAA"},
+ { "Methylotenera versatilis","ANDETYSLAA"},
+ { "Methylovorus glucosetrophus","ANDETYALAA"},
+ { "Micavibrio aeruginosavorus","ANDNFVVANDNSREAAVAIAA"},
+ { "Microbacterium testaceum","ADAKRTDFALAA"},
+ { "Microbulbifer degradans","ANDDNYGAQLAA"},
+ { "Micrococcus luteus","AESKRTDFALAA"},
+ { "Microcystis aeruginosa","ANNIVPFARKAAPVAA"},
+ { "Microlunatus phosphovorus","AKSEQRTDFALAA"},
+ { "Micromonospora aurantiaca","AKNNRADFALAA"},
+ { "Midichloria mitochondrii","ANNKFVPANSDFVPALQAA"},
+ { "Mobiluncus curtisii","AERNSTESFALAA"},
+ { "Modestobacter marinus","ADSSQRDFALAA"},
+ { "Moorella thermoacetica","ADDNLALAA"},
+ { "Moranella endobia","ANDSQYESVALAA"},
+ { "Moraxella catarrhalis","ANDETYALAA"},
+ { "Muricauda ruestringensis","GENNYALAA"},
+ { "Mycobacteriophage Bxz1 virion","ATDTDATVTDAEIEAFFAEEAAALV"},
+ { "Mycobacterium abscessus","ADSHQRDYALAA"},
+ { "Mycobacterium africanum","ADSHQRDYALAA"},
+ { "Mycobacterium austroafricanum","ADSNQRDYALAA"},
+ { "Mycobacterium avium","ADSHQRDYALAA"},
+ { "Mycobacterium bovis","ADSHQRDYALAA"},
+ { "Mycobacterium chubuense","ADSNQRDYALAA"},
+ { "Mycobacterium gilvum","ADSNQRDYALAA"},
+ { "Mycobacterium indicus","ADSHQRDYALAA"},
+ { "Mycobacterium intracellulare","ADSHQRDYALAA"},
+ { "Mycobacterium leprae","ADSYQRDYALAA"},
+ { "Mycobacterium marinum","ADSHQRDYALAA"},
+ { "Mycobacterium microti","ADSHQRDYALAA"},
+ { "Mycobacterium phage","ATDTDATVTDAEIEAFFAEEAAALV"},
+ { "Mycobacterium rhodesiae","ADSNQRDFALAA"},
+ { "Mycobacterium smegmatis","ADSNQRDYALAA"},
+ { "Mycobacterium sp. MCS","ADTNQRDYALAA"},
+ { "Mycobacterium tuberculosis","ADSHQRDYALAA"},
+ { "Mycoplasma agalactiae","ANDKKSEEVRVELPAFAIANANANLAFA"},
+ { "Mycoplasma arthritidis","GNLETSEDKKLDLQFVMNSQTQQNLLFA"},
+ { "Mycoplasma bovis","ANDKKSEEVRLELPAFAIANANANLAFA"},
+ { "Mycoplasma capricolum","ANKNEETFEMPAFMMNNASAGANFMFA"},
+ { "Mycoplasma conjunctivae","ANKKEDKAVDVNLLASQSFNSNLAFA"},
+ { "Mycoplasma crocodyli","GKSKKAENEFSFSNPAFAGNLNLAFA"},
+ { "Mycoplasma fermentans","AEDKKAEEVNISSLMIAQKMQSQSNLAFA"},
+ { "Mycoplasma gallisepticum","DKTSKELADENFVLNQLASNNYALNF"},
+ { "Mycoplasma genitalium 1","DKENNEVLVEPNLIINQQASVNFAFA"},
+ { "Mycoplasma genitalium 2","DKENNEVLVDPNLIINQQASVNFAFA"},
+ { "Mycoplasma haemofelis","ANKQERESSVVNLLMSQPQDLASLSF"},
+ { "Mycoplasma hominis","AEEKQNKQSFVLNQMMSSNPVFAY"},
+ { "Mycoplasma hyorhinis","GKENKKEDYSLLMNASTQSNLAFAF"},
+ { "Mycoplasma leachii","ANKNEETFEMPAFMMNNASAGANFMFA"},
+ { "Mycoplasma mobile","GKEKQLEVSPLLMSSSQSNLVFA"},
+ { "Mycoplasma mycoides","ADKNEENFEMPAFMINNASAGANYMFA"},
+ { "Mycoplasma penetrans","AKNNKNEAVEVELNDFEINALSQNANLALYA"},
+ { "Mycoplasma pneumoniae","DKNNDEVLVDPMLIANQQASINYAFA"},
+ { "Mycoplasma pulmonis","GTKKQENDYQDLMISQNLNQNLAFASV"},
+ { "Mycoplasma putrefaciens","ANKKTEEFEMPAFMINNASAGANLMFA"},
+ { "Mycoplasma synoviae","GNKQSQVEEVTREFSPSLYTFNSNLAYA"},
+ { "Myxococcus fulvus","ANDNVELALAA"},
+ { "Myxococcus xanthus","ANDNVELALAA"},
+ { "Nakamurella multipartita","ADSKRTEFALAA"},
+ { "Natranaerobius thermophilus","ADEDYALAAA"},
+ { "Nautilia profundicola","AANNTNYSPAVARAAA"},
+ { "Neisseria gonorrhoeae","ANDETYALAA"},
+ { "Neisseria lactamica","ANDETYALAA"},
+ { "Neisseria meningitidis","ANDETYALAA"},
+ { "Nephroselmis olivacea chloroplast","TTYHSCLEGHLS"},
+ { "Niastella koreensis","GNTQFAMAA"},
+ { "Nitratifractor salsuginis","ANNTDYRPAYAHAA"},
+ { "Nitratiruptor sp. SB155-2","ANNTDYRPAYAVAA"},
+ { "Nitrobacter hamburgensis","ANDNYAPVAQAA"},
+ { "Nitrobacter Nb-311A","ANDNYAPVAQAA"},
+ { "Nitrobacter winogradskyi","ANDNYAPVAQAA"},
+ { "Nitrosococcus halophilus","ANDDNYALAA"},
+ { "Nitrosococcus oceani","ANDDNYALAA"},
+ { "Nitrosococcus watsonii","ANDDNYALAA"},
+ { "Nitrosomonas cryotolerans","ANDENYALAA"},
+ { "Nitrosomonas europaea","ANDENYALAA"},
+ { "Nitrosomonas eutropha","ANDENYALAA"},
+ { "Nitrosomonas sp. AL212","ANDENYALAA"},
+ { "Nitrosomonas sp. Is79A3","ANDENYALAA"},
+ { "Nitrosospira multiformis","ANDENYALAA"},
+ { "Nitrospira defluvii","ANQELALAA"},
+ { "Nocardia brasiliensis","ADSNQREYALAA"},
+ { "Nocardia cyriacigeorgica","ADSHQREYALAA"},
+ { "Nocardia farcinica","ADSHQREYALAA"},
+ { "Nocardioides sp. JS614","ANTNRSSFALAA"},
+ { "Nocardiopsis alba","ANSKRTEFALAA"},
+ { "Nocardiopsis dassonvillei","ANSKRTEFALAA"},
+ { "Nostoc azollae","ANNIVKFARREALVAA"},
+ { "Nostoc PCC7120","ANNIVKFARKDALVAA"},
+ { "Nostoc punctiforme","ANNIVNFARKDALVAA"},
+ { "Nostoc sp. PCC 7120","ANNIVKFARKDALVAA"},
+ { "Novosphingobium aromaticivorans","ANDNEALALAA"},
+ { "Novosphingobium sp. PP1Y","ANDNEALALAA"},
+ { "Oceanimonas sp. GK1","ANDENYALAA"},
+ { "Oceanithermus profundus","GNDNYALAA"},
+ { "Oceanobacillus iheyensis","GKETNQPVLAAA"},
+ { "Ochrobactrum anthropi","ANDNKAQGYALAA"},
+ { "Odontella sinensis chloroplast","ANNLISSVFKSLSTKQNSLNLSFAV"},
+ { "Odoribacter splanchnicus","GENNYALAA"},
+ { "Oenococcus oeni","AKNNEPSYALAA"},
+ { "Oligotropha carboxidovorans","ANDNYAPVAQAA"},
+ { "Olsenella uli","DNDSYQGSYALAA"},
+ { "Ornithobacterium rhinotracheale","GNNEYALAA"},
+ { "Oscillatoria 6304","ANNIVPFARKAAPVAA"},
+ { "Oscillatoria acuminata","ANNIVPFARKAAPVAA"},
+ { "Owenweeksia hongkongensis","GENNFALAA"},
+ { "Paenibacillus larvae","GKQQNNYALAA"},
+ { "Paenibacillus mucilaginosus","GNQKQQLAFAA"},
+ { "Paenibacillus polymyxa","GKQQNNYAFAA"},
+ { "Paenibacillus sp. JDR-2","GKQQQTYAFAA"},
+ { "Paenibacillus sp. Y412MC10","GKQQNNYAFAA"},
+ { "Paenibacillus terrae","GKQQNNYAFAA"},
+ { "Paludibacter propionicigenes","GENNYALAA"},
+ { "Pantoea ananatis","ANDENYALAA"},
+ { "Pantoea sp. At-9b","ANDNYYDAPAALAA"},
+ { "Pantoea stewartii","ANDENYALAA"},
+ { "Pantoea vagans","ANDENYALAA"},
+ { "Parabacteroides distasonis","GENNYALAA"},
+ { "Parachlamydia acanthamoebae","ADSVSYAAAA"},
+ { "Parachlamydia UWE25","ANNSNKIAKVDFQEGTFARAA"},
+ { "Paracoccus denitrificans","ANDNRAPVALAA"},
+ { "Parvibaculum lavamentivorans","ANDNYAEARLAA"},
+ { "Parvularcula bermudensis","ANDNSSEGFALAA"},
+ { "Pasteurella multocida","ANDEQYALAA"},
+ { "Pavlova lutheri chloroplast","ANNILSFNRVAVA"},
+ { "Pectobacterium atrosepticum","ANDENYALAA"},
+ { "Pectobacterium carotovora","ANDENYALAA"},
+ { "Pectobacterium carotovorum","ANDENYALAA"},
+ { "Pectobacterium wasabiae","ANDENYALAA"},
+ { "Pediococcus claussenii","AKNNNNSYALAA"},
+ { "Pediococcus pentosaceus","AKNNNNSYALAA"},
+ { "Pedobacter heparinus","GENNYALAA"},
+ { "Pedobacter saltans","ENNYALAA"},
+ { "Pelagibacter sp. IMCC9063","ANESYAIAA"},
+ { "Pelagibacter ubique","ADESYALAA"},
+ { "Pelagibacterium halotolerans","ANDNNKAPVALAA"},
+ { "Pelobacter carbinolicus","ADTDVSYALAA"},
+ { "Pelobacter propionicus","ADNYNTPVALAA"},
+ { "Pelodictyon phaeoclathratiforme","ADDYSYAMAA"},
+ { "Pelotomaculum thermopropionicum","AKENYALAA"},
+ { "Petrotoga mobilis","GGSSLPKFSWNLA"},
+ { "Phaeobacter gallaeciensis","ANDNRAPAMAVAA"},
+ { "Photobacterium phosphoreum","ANDENYALAA"},
+ { "Photobacterium profundum","ANDENFALAA"},
+ { "Photorhabdus asymbiotica","ANDNEYALVA"},
+ { "Photorhabdus luminescens","ANDEKYALAA"},
+ { "Phycisphaera mikurensis","ANDENTIAGRIGFGNDALRLAA"},
+ { "Phytoplasma australiense","GKQTNSASEGDQIYNWVPSQSSQNLQQLAFA"},
+ { "Pirellula sp.","AEENFALAA"},
+ { "Pirellula staleyi","AESNLALAA"},
+ { "Planctomyces brasiliensis","ANKQYAMVA"},
+ { "Planctomyces limnophilus","ANTGNYALAA"},
+ { "Plectonema boryanum","ANNIVPFARKTAPVAA"},
+ { "Polaromonas JS666","ANDERFALAA"},
+ { "Polaromonas naphthalenivorans","ANDERFALAA"},
+ { "Polaromonas sp. JS666","ANDERFALAA"},
+ { "Polymorphum gilvum","ANDNYASDVALAA"},
+ { "Polynucleobacter necessarius","ANDERFALAA"},
+ { "Porphyra purpurea chloroplast","AENNIIAFSRKLAVA"},
+ { "Porphyromonas asaccharolytica","AETRHHPGGRCSEAL"},
+ { "Porphyromonas gingivalis","GENNYALAA"},
+ { "Prevotella denticola","GENNYALAA"},
+ { "Prevotella intermedia","GENNYALAA"},
+ { "Prevotella melaninogenica","GENNYALAA"},
+ { "Prevotella ruminicola","GNNEYALAA"},
+ { "Prochlorococcus marinus 1","ANKIVSFSRQTAPVAA"},
+ { "Prochlorococcus marinus 2","ANNIVRFSRQPALVAA"},
+ { "Prochlorococcus marinus 3","ANKIVSFSRQTAPVAA"},
+ { "Prochlorococcus marinus","ANNIVSFSRQTAPVAA"},
+ { "Propionibacterium acidipropionici","ADNKRTDFALAA"},
+ { "Propionibacterium acnes 1","AENTRTDFALAA"},
+ { "Propionibacterium acnes 2","AENTRTDFALAA"},
+ { "Propionibacterium freudenreichii","ADTNRTDFALAA"},
+ { "Propionibacterium propionicum","ANNSRTDFALAA"},
+ { "Prosthecochloris aestuarii","ADDYSYAMAA"},
+ { "Proteobacteria SAR-1, version 1","GENADYALAA"},
+ { "Proteobacteria SAR-1, version 2","ANNYNYSLAA"},
+ { "Proteobacteria SAR-1, version 3","ADNGYMAAA"},
+ { "Proteus mirabilis","ANDNQYKALAA"},
+ { "Protochlamydia amoebophila","ANNSNKIAKVDFQEGTFARAA"},
+ { "Providencia rettgeri","ANDENYALAA"},
+ { "Providencia stuartii","ANDENYALAA"},
+ { "Pseudoalteromonas atlantica","ANDENYALAA"},
+ { "Pseudoalteromonas haloplanktis","ANDDNYSLAA"},
+ { "Pseudoalteromonas sp. SM9913","ANDDNYSLAA"},
+ { "Pseudogulbenkiania sp. NH8B","ANDETYALAA"},
+ { "Pseudomonas aeruginosa","ANDDNYALAA"},
+ { "Pseudomonas brassicacearum","ANDENYGQEFAIAA"},
+ { "Pseudomonas chlororaphis","ANDETYGEYALAA"},
+ { "Pseudomonas entomophila","ANDENYEGYALAA"},
+ { "Pseudomonas fluorescens 1","ANDDQYGAALAA"},
+ { "Pseudomonas fluorescens 2","ANDENYGQEFALAA"},
+ { "Pseudomonas fluorescens 3 (Pf-5)","ANDETYGDYALAA"},
+ { "Pseudomonas fulva","ANDENYEGYALAA"},
+ { "Pseudomonas mendocina","ANDDNYALAA"},
+ { "Pseudomonas protegens","ANDETYGDYALAA"},
+ { "Pseudomonas putida 1","ANDENYGAEYKLAA"},
+ { "Pseudomonas stutzeri","ANDDNYEGYALAA"},
+ { "Pseudomonas syringae 1","ANDENYGAQLAA"},
+ { "Pseudomonas syringae 2","ANDETYGEYALAA"},
+ { "Pseudomonas syringae 3","ANDENYGAQLAA"},
+ { "Pseudonocardia dioxanivorans","ADKSQRAYALAA"},
+ { "Pseudovibrio sp. JE062","ANDNYAMDNAVAA"},
+ { "Pseudoxanthomonas spadix","ANDDNYGSDFALAA"},
+ { "Pseudoxanthomonas suwonensis","ANDDNYALAA"},
+ { "Psychrobacter 2734","ANDENYALAA"},
+ { "Psychrobacter arcticus","ANDENYALAA"},
+ { "Psychrobacter cryohalolentis","ANDENYALAA"},
+ { "Psychrobacter sp. PRwf-1","ANDETYALAA"},
+ { "Psychroflexus torquis","GEDNYALAA"},
+ { "Psychromonas ingrahamii","ANDSNYSLAA"},
+ { "Pusillimonas sp. T7-7","ANDERFALAA"},
+ { "Rahnella aquatilis","ANDENYALAA"},
+ { "Rahnella sp. Y9602","ANDENYALAA"},
+ { "Ralstonia eutropha","ANDERYALAA"},
+ { "Ralstonia metallidurans","ANDERYALAA"},
+ { "Ralstonia pickettii","ANDERYALAA"},
+ { "Ralstonia solanacearum","ANDNRYQLAA"},
+ { "Ramlibacter tataouinensis","ANDERFALAA"},
+ { "Renibacterium salmoninarum","ANSKRTDFALAA"},
+ { "Rhizobium etli","ANDNYAEARLAA"},
+ { "Rhizobium leguminosarum","ANDNYAEARLAA"},
+ { "Rhodobacter capsulatus","ANDNRAPVALAA"},
+ { "Rhodobacter sphaeroides","ANDNRAPVALAA"},
+ { "Rhodococcus equi","AESTQREYALAA"},
+ { "Rhodococcus erythropolis","ADSNQRDYALAA"},
+ { "Rhodococcus jostii","ADSNQRDYALAA"},
+ { "Rhodococcus opacus","ADSNQRDYALAA"},
+ { "Rhodoferax ferrireducens","ANDERFALAA"},
+ { "Rhodomicrobium vannielii","ANDNYAGARPVAIAA"},
+ { "Rhodomonas salina","ANNIVPFSRKVALV"},
+ { "Rhodopirellula baltica","AEENFALAA"},
+ { "Rhodopseudomonas palustris","ANDNYAPVAQAA"},
+ { "Rhodopseudomonas palustris 4","ANDNVRMNEVRLAA"},
+ { "Rhodospirillum centenum","ANDNTAPALRMAA"},
+ { "Rhodospirillum photometricum","ANDNVELAAAA"},
+ { "Rhodospirillum rubrum","ANDNVELAAAA"},
+ { "Rhodothermus marinus","ANDYSYAMAA"},
+ { "Rickettsia africae","ANDNNRSVGHLALAA"},
+ { "Rickettsia amblyommii","ANDNNRSVGRLALAA"},
+ { "Rickettsia australis","ANDNNRSVDLALAA"},
+ { "Rickettsia bellii","ANDNYRSAGTPALAVA"},
+ { "Rickettsia conorii","ANDNNRSVGHLALAA"},
+ { "Rickettsia heilongjiangensis","ANDNNRSVGRLALAA"},
+ { "Rickettsia massiliae","ANDNNRSVGRLALAA"},
+ { "Rickettsia montanensis","ANDNNRSVGRLALAA"},
+ { "Rickettsia parkeri","ANDNNRSVGHLALAA"},
+ { "Rickettsia peacockii","ANDNNRSVGRLALAA"},
+ { "Rickettsia philipii","ANDNNRSVGRLALAA"},
+ { "Rickettsia prowazekii","ANDNRYVGVPALAAA"},
+ { "Rickettsia rhipicephali","ANDNNRSVGRLALAA"},
+ { "Rickettsia rickettsii","ANDNNRSVGRLALAA"},
+ { "Rickettsia sibirica","ANDNNRSVGHLALAA"},
+ { "Rickettsia slovaca","ANDNNRSVGRLALAA"},
+ { "Rickettsia typhi","ANDNKRYVGVAALAAA"},
+ { "Riemerella anatipestifer","GNEEFALAA"},
+ { "Riesia pediculicola","AKTKNYAYAQAA"},
+ { "Robiginitalea biformata","GDNNYALAA"},
+ { "Roseburia hominis","AEDNLAYAA"},
+ { "Roseiflexus castenholzii","ANNNKVVAFKPAMALAA"},
+ { "Roseiflexus sp. RS-1","ANTNKVVAFKPAMALAA"},
+ { "Roseobacter denitrificans","ANDNRAPVAMAA"},
+ { "Roseobacter litoralis","ANDNRAPVAMAA"},
+ { "Rothia dentocariosa","AKSKRTDFALAA"},
+ { "Rothia mucilaginosa","AESKRTDFALAA"},
+ { "Rubrivivax gelatinosus","ANDERFALAA"},
+ { "Rubrobacter xylanophilus","ANDREMALAA"},
+ { "Ruegeria pomeroyi","ANDNRAPVALAA"},
+ { "Ruegeria sp. TM1040","ANDNRAPVALAA"},
+ { "Ruminococcus albus","GHGYFAKAS"},
+ { "Ruminococcus albus","DNDNFAMAA"},
+ { "Runella slithyformis","GEYSYAMAA"},
+ { "Ruthia magnifica","ANENNYALAA"},
+ { "Saccharomonospora viridis","AKTNSQRDFALAA"},
+ { "Saccharophagus degradans","ANDDNYGAQLAA"},
+ { "Saccharopolyspora erythraea","ADKSQREFALAA"},
+ { "Salinibacter ruber","ADDYSYAMAA"},
+ { "Salinispora arenicola","AKQNRADFALAA"},
+ { "Salinispora tropica","AKQNRADFALAA"},
+ { "Salmonella bongori","ANDENYALAA"},
+ { "Salmonella enterica 1","ANDETYALAA"},
+ { "Salmonella enterica 2","ANDENYALAA"},
+ { "Salmonella enterica 3","ANDETYALAA"},
+ { "Salmonella enterica 5","ANDETYALAA"},
+ { "Salmonella enterica 6","ANDENYALAA"},
+ { "Salmonella paratyphi","ANDENYALAA"},
+ { "Salmonella typhimurium","ANDETYALAA"},
+ { "Salmonella typhi","ANDETYALAA"},
+ { "Sanguibacter keddieii","ADSKRTDFALAA"},
+ { "Saprospira grandis","GNTNYALAA"},
+ { "Sebaldella termitidis","GNDNYALAA"},
+ { "secondary endosymbiont","ANDSQFESKTALAA"},
+ { "Segniliparus rotundus","ADTTQRDYALAA"},
+ { "Selenomonas ruminantium","DEFDYAYAA"},
+ { "Selenomonas sputigena","ANEDYALAA"},
+ { "Serratia marcescens","ANDENYALAA"},
+ { "Serratia plymuthica","ANDSQFESAALAA"},
+ { "Serratia proteamaculans","ANDSQFESAALAA"},
+ { "Serratia symbiotica","ANDENYALAA"},
+ { "Shewanella amazonensis","ANDDNYALAA"},
+ { "Shewanella ANA-3","ANDDNYALAA"},
+ { "Shewanella baltica","ANDSNYSLAA"},
+ { "Shewanella denitrificans","ANDSNYSLAA"},
+ { "Shewanella frigidimarina","ANDSNYSLAA"},
+ { "Shewanella halifaxensis","ANDSNYSLAA"},
+ { "Shewanella loihica","ANDDNYALAA"},
+ { "Shewanella oneidensis","ANDDNYALAA"},
+ { "Shewanella pealeana","ANDSNYSLAA"},
+ { "Shewanella piezotolerans","ANDDNYSLAA"},
+ { "Shewanella putrefaciens","ANDDNYALAA"},
+ { "Shewanella PV-4","ANDDNYALAA"},
+ { "Shewanella SAR-1","ANDDNYALAA"},
+ { "Shewanella SAR-1, version 2","ANNDNYALAA"},
+ { "Shewanella SAR-2, version 2","ADYGYMAAA"},
+ { "Shewanella sediminis","ANDSNYSLAA"},
+ { "Shewanella sp. ANA-3","ANDDNYALAA"},
+ { "Shewanella sp. MR-4","ANDDNYALAA"},
+ { "Shewanella sp. MR-7","ANDDNYALAA"},
+ { "Shewanella sp. W3-18-1","ANDDNYALAA"},
+ { "Shewanella violacea","ANDSNYSLAA"},
+ { "Shewanella woodyi","ANDDNYALAA"},
+ { "Shigella boydii","ANDENYALAA"},
+ { "Shigella dysenteriae 1","ANDENYALAA"},
+ { "Shigella dysenteriae 2","ANDENYALAA"},
+ { "Shigella flexneri","ANDENYALAA"},
+ { "Shigella sonnei","ANDENYALAA"},
+ { "Shimwellia blattae","ANDENYALAA"},
+ { "Sideroxydans lithotrophicus","ANDEKYALAA"},
+ { "Silicibacter pomeroyi","ANDNRAPVALAA"},
+ { "Silicibacter TM1040","ANDNRAPVALAA"},
+ { "Simiduia agarivorans","ANDDNYGAQLAA"},
+ { "Simkania negevensis","VDTTEDFYLEAA"},
+ { "Sinorhizobium fredii","ANDNYAEARLAA"},
+ { "Sinorhizobium medicae","ANDNYAEARLAA"},
+ { "Sinorhizobium meliloti","ANDNYAEARLAA"},
+ { "Slackia heliotrinireducens","GKSYNTGRMALAA"},
+ { "Sodalis glossinidius","ANDSQFESNAALAA"},
+ { "Solibacillus silvestris","GKQQNFAFAA"},
+ { "Solibacter usitatus","ANTQFAYAA"},
+ { "Solitalea canadensis","GENNYALAA"},
+ { "Sorangium cellulosum","ANDNAYAVAA"},
+ { "Sphaerobacter thermophilus","GNESYALAA"},
+ { "Sphaerochaeta coccoides","AKKEDENVSYDAEYAFAA"},
+ { "Sphaerochaeta globosa","AKKEDEVSFNAEYAFAA"},
+ { "Sphaerochaeta pleomorpha","AKKEDEVSFNAEYALAA"},
+ { "Sphingobacterium sp. 21","GENNYALAA"},
+ { "Sphingobium chlorophenolicum","ANDNEALALAA"},
+ { "Sphingobium japonicum","ANDNEALALAA"},
+ { "Sphingobium sp. SYK-6","ANDNEALALAA"},
+ { "Sphingomonas elodea","ANDNEALAIAA"},
+ { "Sphingomonas wittichii","ANDNEALAIAA"},
+ { "Sphingopyxis alaskensis","ANDNEALALAA"},
+ { "Spirochaeta africana","AKNEDNVVEVAFGNDDTMLAAA"},
+ { "Spirochaeta smaragdinae","ANDADYALAA"},
+ { "Spirochaeta thermophila","ANDELALAA"},
+ { "Spiroplasma kunkelii","ASKKQKEDKIEMPAFMMNNQLAVSMLAA"},
+ { "Spirosoma linguale","GEYNYAMAA"},
+ { "Stackebrandtia nassauensis","AKTESRSSFALAA"},
+ { "Staphylococcus aureus","GKSNNNFAVAA"},
+ { "Staphylococcus carnosus","GKTNNNLAVAA"},
+ { "Staphylococcus epidermidis","DKSNNNFAVAA"},
+ { "Staphylococcus haemolyticus","DKSNNNFAVAA"},
+ { "Staphylococcus lugdunensis","GKSNNNFAVAA"},
+ { "Staphylococcus pseudintermedius","GKTNNNFAVAA"},
+ { "Staphylococcus saprophyticus","GKENNNFAVAA"},
+ { "Staphylococcus xylosus","GKENNNFAVAA"},
+ { "Starkeya novella","ANDNYAPVAQAA"},
+ { "Stenotrophomonas maltophilia","ANDDNYALAA"},
+ { "Stigmatella aurantiaca","DGKDTKANDNVELALAA"},
+ { "Streptobacillus moniliformis","GKNNFALAA"},
+ { "Streptococcus agalactiae","AKNTNSYALAA"},
+ { "Streptococcus bovis","AKNTNSYAVAA"},
+ { "Streptococcus constellatus","AKNNNSYALAA"},
+ { "Streptococcus criceti","AKNTNSYAVAA"},
+ { "Streptococcus dysgalactiae","AKNTNSYALAA"},
+ { "Streptococcus equi","AKNNTTYALAA"},
+ { "Streptococcus gallolyticus","AKNTNSYAVAA"},
+ { "Streptococcus gordonii","AKNNTSYALAA"},
+ { "Streptococcus macedonicus","AKNTNSYAVAA"},
+ { "Streptococcus mitis","AKNNTSYALAA"},
+ { "Streptococcus mutans","AKNTNSYAVAA"},
+ { "Streptococcus oralis","AKNNTSYALAA"},
+ { "Streptococcus parasanguinis","AKNNNSYALAA"},
+ { "Streptococcus parauberis","AKNTNTYALAA"},
+ { "Streptococcus pneumoniae","AKNNTSYALAA"},
+ { "Streptococcus pseudopneumoniae","AKNNTSYALAA"},
+ { "Streptococcus pyogenes","AKNTNSYALAA"},
+ { "Streptococcus salivarius","AQLNITAKNTNSYAVAA"},
+ { "Streptococcus sanguinis","AKNNNSYALAA"},
+ { "Streptococcus sobrinus","AKNTNSYAVAA"},
+ { "Streptococcus suis","AKNTNTYALAA"},
+ { "Streptococcus thermophilus","AKNTNSYAVAA"},
+ { "Streptococcus uberis","AKNTNSYALAA"},
+ { "Streptococcus zooepidemicus","AKNNTTYALAA"},
+ { "Streptomyces aureofaciens","ANSKRDSQQFALAA"},
+ { "Streptomyces avermitilis","ANTKSDSQSFALAA"},
+ { "Streptomyces avermitilus","ANTKSDSQSFALAA"},
+ { "Streptomyces bingchenggensis","ANTKRDSFALAA"},
+ { "Streptomyces cattleya","ANNKRDSFALAA"},
+ { "Streptomyces coelicolor","ANTKRDSSQQAFALAA"},
+ { "Streptomyces collinus","ANTKRDSSSFALAA"},
+ { "Streptomyces flavogriseus","ANSKRDSSAFALAA"},
+ { "Streptomyces griseus","ANSKRDSSAFALAA"},
+ { "Streptomyces hygroscopicus","ANTKRDSFALAA"},
+ { "Streptomyces lividans","ANTKRDSSQQAFALAA"},
+ { "Streptomyces scabiei","ANSKSDSPQQQFSLAA"},
+ { "Streptomyces sp. SirexAA-E","ANTKRDSSAFALAA"},
+ { "Streptomyces thermophilus","AKNTNSYAVAA"},
+ { "Streptomyces venezuelae","ANSKSDNSRFALAA"},
+ { "Streptomyces violaceusniger","ANTKRDSFALAA"},
+ { "Streptosporangium roseum","ANKTHSEVSQGNLALAA"},
+ { "Sulcia muelleri","GKKNYALAA"},
+ { "Sulfuricurvum kujiense","ANNTNYRPAYAVA"},
+ { "Sulfurimonas autotrophica","ANNTNYRPALAVA"},
+ { "Sulfurimonas denitrificans","ANNTNYRPAYAVA"},
+ { "Sulfurospirillum barnesii","ANNSNYRPAYAVA"},
+ { "Sulfurospirillum deleyianum","ANNSNYRPAYALAA"},
+ { "Sulfurovum sp. NBC37-1","ANNTDYRPAYAVA"},
+ { "Synechococcus elongatus","ANNIVPFARKAAPVAA"},
+ { "Synechococcus sp. CC9311","ANNIVRFSRQAAPVAA"},
+ { "Synechococcus sp. CC9605","ANNIVRFSRQAAPVAA"},
+ { "Synechococcus sp. CC9902","ANNIVRFSRQAAPVAA"},
+ { "Synechococcus sp. JA-2-3B'a(2-13)","ANNVVPFARKAAALAA"},
+ { "Synechococcus sp. JA-3-3Ab (version 1)","ANNVVPFARKAAALAA"},
+ { "Synechococcus sp. JA-3-3Ab (version 2)","ANNVVPFARKAAALAA"},
+ { "Synechococcus sp. PCC 6301","ANNIVPFARKAAPVAA"},
+ { "Synechococcus sp. PCC 6307","ANNIVRFSRQAAPVAA"},
+ { "Synechocystis sp. PCC 6803","ANNIVSFKRVAIAA"},
+ { "Synechococcus sp. PCC 6904","ANNIVRFSRQAAPVAA"},
+ { "Synechococcus sp. PCC 7002","ANNIVPFARKAAAVA"},
+ { "Synechococcus sp. PCC 7009","ANNIVRFSRQAAPVAA"},
+ { "Synechococcus sp. RCC307","ANNIVRFSRQAAPVAA"},
+ { "Synechococcus sp. WH 7803","ANNIVRFSRQAAPVAA"},
+ { "Synechococcus sp. WH 8102","ANNIVRFSRHAAPVAA"},
+ { "Syntrophobacter fumaroxidans","ADDYAYAVAA"},
+ { "Syntrophomonas wolfei","AEDNFALAA"},
+ { "Syntrophothermus lipocalidus","ANNELALAA"},
+ { "Syntrophus aciditrophicus","ANDYEYALAA"},
+ { "Tannerella forsythensis","GENNYALAA"},
+ { "Tannerella forsythia","GENNYALAA"},
+ { "Taylorella asinigenitalis","ANDDKFALAA"},
+ { "Taylorella equigenitalis","ANDENFALAA"},
+ { "Tepidanaerobacter acetatoxydans","ANNDLAYAA"},
+ { "Teredinibacter turnerae","ANDDNYGAQLAA"},
+ { "Terriglobus roseus","AEPQFALAA"},
+ { "Terriglobus saanensis","AEPQFALAA"},
+ { "Tetragenococcus halophilus","AKNNNNSYALAA"},
+ { "Thalassiosira pseudonana chloroplast","ANNIMPFMFNVVKTNRSLTTLNFAV"},
+ { "Thalassiosira weissflogii chloroplast","ANNIIPFIFKAVKTKKEAMALNFAV"},
+ { "Thauera sp. MZ1T","ANDERFALAA"},
+ { "Thermacetogenium phaeum","ANNEYALAA"},
+ { "Thermaerobacter marianensis","ANEELALAA"},
+ { "Thermanaerovibrio acidaminovorans","ANDNYALAA"},
+ { "Thermincola potens","AEENYALAA"},
+ { "Thermoanaerobacter italicus","ADRELAYAA"},
+ { "Thermoanaerobacter mathranii","ADRELAYAA"},
+ { "Thermoanaerobacter pseudethanolicus","ADRELAYAA"},
+ { "Thermoanaerobacter sp. X514","ADRELAYAA"},
+ { "Thermoanaerobacter tengcongensis","ADRELAYAA"},
+ { "Thermoanaerobacter wiegelii","ADRELAYAA"},
+ { "Thermoanaerobacterium saccharolyticum","ANDNLAYAA"},
+ { "Thermoanaerobacterium thermosaccharolyticum","ANNDNLAYAA"},
+ { "Thermoanaerobacterium xylanolyticum","ANDNLAYAA"},
+ { "Thermobaculum terrenum","ANTEYALAA"},
+ { "Thermobifida fusca","ANSKRTEFALAA"},
+ { "Thermobispora bispora","ANKKHAEVSQASLALAA"},
+ { "Thermodesulfatator indicus","ADEYNYAMAA"},
+ { "Thermodesulfobacterium commune","ANEYAYALAA"},
+ { "Thermodesulfobacterium geofontis","ADEYSYALAA"},
+ { "Thermodesulfobium narugense","ANNNSLALAA"},
+ { "Thermodesulfovibrio yellowstonii","ANNELALAA"},
+ { "Thermomicrobium roseum","GERELALAA"},
+ { "Thermomonospora curvata","ANKKQSEFALAA"},
+ { "Thermosediminibacter oceani","ANEELALAA"},
+ { "Thermosipho africanus","ANEELALAA"},
+ { "Thermosipho melanesiensis","ANEEIALAA"},
+ { "Thermosynechococcus elongatus","ANNIVPFARKAAAVA"},
+ { "Thermotoga lettingae","ANNELALAA"},
+ { "Thermotoga maritima ","ANEPVAVAA"},
+ { "Thermotoga neapolitana","ANEPVAVAA"},
+ { "Thermotoga petrophila","ANEPVAVAA"},
+ { "Thermotoga sp. RQ2","ANEPVAVAA"},
+ { "Thermotoga thermarum","ANEELALAA"},
+ { "Thermovibrio ammonificans","ADETLALAA"},
+ { "Thermovirga lienii","ANENYALAA"},
+ { "Thermus oshimai","ANKPAYALAA"},
+ { "Thermus scotoductus","ANKPAYALAA"},
+ { "Thermus sp. CCB_US3_UF1","ANKPAYALAA"},
+ { "Thermus thermophilus","ANTNYALAA"},
+ { "Thioalkalimicrobium cyclicum","ANDDNYALAA"},
+ { "Thioalkalivibrio sp. K90mix","ANDDNYALAA"},
+ { "Thiobacillus denitrificans","AKSKAARRNPACSAGVMELKA"},
+ { "Thiocystis violascens","ANDDNYALAA"},
+ { "Thiomicrospira crunogena","ANDDNYALAA"},
+ { "Thiomonas intermedia","ANDSSYALAA"},
+ { "Thiomonas sp. 3As","ANDSSYALAA"},
+ { "Tistrella mobilis","ANDNRVALAA"},
+ { "Tolumonas auensis","ANDETYALAA"},
+ { "Tremblaya princeps 1 (Dysmicoccus)","APSNRFTIVANDCIDALVRRAVV"},
+ { "Treponema azotonutricium","ADNDNYNYALAA"},
+ { "Treponema brennaborense","AEDNRQFALAA"},
+ { "Treponema caldaria","ADNDSYALAA"},
+ { "Treponema denticola","AENNDSFDYALAA"},
+ { "Treponema pallidum","ANSDSFDYALAA"},
+ { "Treponema primitia","ANNDSYAFAA"},
+ { "Treponema succinifaciens","AKRREDEQSENEQFALAA"},
+ { "Trichodesmium erythraeum","ANNIVPFARKQVAALA"},
+ { "Tropheryma whipplei","ANLKRTDLSLAA"},
+ { "Truepera radiovictrix","GNSNSYALAA"},
+ { "Tsukamurella paurometabola","ADSNQRDFALAA"},
+ { "Turneriella parva","AENETYALAA"},
+ { "uncultured bacterium","ANDNFAPVAVAA"},
+ { "Uncultured ciona","ANDEFFDARLRA"},
+ { "Uncultured FS1","ANDETYALAA"},
+ { "Uncultured FS2","ANDENYALAA"},
+ { "Uncultured LEM1","ANDETYALAA"},
+ { "Uncultured LEM2","ANDETHALAA"},
+ { "Uncultured marineEBAC20E09","ANNDNYALAA"},
+ { "Uncultured phakopsora","ANDNSYALAA"},
+ { "Uncultured QL1","ANVENYALAA"},
+ { "Uncultured RCA1","ANDENYALAA"},
+ { "Uncultured RCA2","SNDENYALAA"},
+ { "Uncultured RCA4","ANDETYALAA"},
+ { "Uncultured remanei","ANDESYALAA"},
+ { "Uncultured stronglyoides1","ANDERFALAA"},
+ { "Uncultured U01a","ANDSNYALAA"},
+ { "Uncultured U02","ANDEQFALAA"},
+ { "Uncultured U04","ANDETYALAA"},
+ { "Uncultured VLS13","ANDENYALAA"},
+ { "Uncultured VLS1","ANDENYALAA"},
+ { "Uncultured VLS5","ANDETYALAA"},
+ { "Uncultured VLS6","ANDENYALAA"},
+ { "Uncultured VLS7","ANDENYALAA"},
+ { "Uncultured VLS9","ANDENYALAA"},
+ { "Uncultured VLW1","ANDENYALAA"},
+ { "Uncultured VLW2","ANDENYALAA"},
+ { "Uncultured VLW3","ANDENYALAA"},
+ { "Uncultured VLW5","ANDENYALAA"},
+ { "Uncultured WW10","ANDENYALAV"},
+ { "Uncultured WW11","ANDDNYALAA"},
+ { "Uncultured WW1","ANDENYALAA"},
+ { "Uncultured WW2","ANDENYALAA"},
+ { "Uncultured WW4","ANDGNYALAA"},
+ { "Uncultured WW5","ANDENYALAA"},
+ { "Uncultured WW7","ANDENCALAA"},
+ { "Uncultured WW8","ANDENYALAA"},
+ { "Uncultured WW9","ANDENYALAA"},
+ { "Ureaplasma parvum","AENKKSSEVELNPAFMASATNANYAFAY"},
+ { "Ureaplasma urealyticum","AENKKSSEVELNPAFMASATNANYAFAY"},
+ { "Variovorax paradoxus","ANDERFALAA"},
+ { "Veillonella parvula","AEENFALAA"},
+ { "Verminephrobacter eiseniae","ANDERFALAA"},
+ { "Verrucomicrobium spinosum","ANSNELALAA"},
+ { "Verrucosispora maris","AKHNRADFALAA"},
+ { "Vesicomyosocius okutanii","ENENNYALAA"},
+ { "Vibrio anguillarum","ANDENYALAA"},
+ { "Vibrio campbellii","ANDENYALAA"},
+ { "Vibrio cholerae","ANDENYALAA"},
+ { "Vibrio Ex25","ANDENYALAA"},
+ { "Vibrio fischeri","ANDENYALAA"},
+ { "Vibrio furnissii","ANDENYALAA"},
+ { "Vibrio parahaemolyticus","ANDENYALAA"},
+ { "Vibrio parahemolyticus","ANDENYALAA"},
+ { "Vibrio sp. EJY3","ANDENYALAA"},
+ { "Vibrio sp. Ex25","ANDENYALAA"},
+ { "Vibrio splendidus","ANDENYALAA"},
+ { "Vibrio vulnificus","ANDENYALAA"},
+ { "Waddlia chondrophila","ADLDLATAAVAA"},
+ { "Weeksella virosa","GNEEYALAA"},
+ { "Weissella koreensis","AKNSNNLAFAA"},
+ { "Wigglesworthia brevipalpis","AKHKYNEPALLAA"},
+ { "Wigglesworthia glossinidia","AKHKYNEPALLAA"},
+ { "Wolbachi.sp","ANDNFAAEDNVDAIAA"},
+ { "Wolbachia endosymbiont","ANDNFAAEEYRVAA"},
+ { "Wolbachia sp. 2 (Brugi)","ANDNFAAEGDVAVAA"},
+ { "Wolbachia sp. 3 (Culex)","ANDNFAAEDNVALAA"},
+ { "Wolbachia sp. 4 (Dros.)","ANDNFAAEEYRVAA"},
+ { "Wolinella succinogenes","ALSSHPKRGKRLGLPITSALGA"},
+ { "Xanthobacter autotrophicus","ANDNYAPVAQAA"},
+ { "Xanthomonas albilineans","ANDDNYALAA"},
+ { "Xanthomonas axonopodis","ANDDNYGSDFAIAA"},
+ { "Xanthomonas campestris 1","ANDDNYGSDFAIAA"},
+ { "Xanthomonas campestris 2","ANDDNYGSDSAIAA"},
+ { "Xanthomonas oryzae","ANDDNYGSDFAIAA"},
+ { "Xenorhabdus bovienii","ANDENYALAA"},
+ { "Xenorhabdus nematophila","ANDENYALAA"},
+ { "Xylanimonas cellulosilytica","ADNTRNDFALAA"},
+ { "Xylella fastidiosa 1","ANEDNFAVAA"},
+ { "Xylella fastidiosa 2","ANEDNFALAA"},
+ { "Xylella fastidiosa 3","ANEDNFAIAA"},
+ { "Xylella fastidiosa 4","ANEDNFALAA"},
+ { "Yersinia bercovieri","ANDSQYESAALAA"},
+ { "Yersinia enterocolitica","ANDSQYESAALAA"},
+ { "Yersinia frederiksenii","ANDENYALAA"},
+ { "Yersinia intermedia","ANDSQYESAALAA"},
+ { "Yersinia mollaretii","ANDSQYESAALAA"},
+ { "Yersinia pestis","ANDENYALAA"},
+ { "Yersinia pseudotuberculosis","ANDENYALAA"},
+ { "Zobellia galactanivorans","GENNYALAA"},
+ { "Zunongwangia profunda","GENNYALAA"} };
+
/* TOOLS */
@@ -1668,12 +3056,18 @@ typedef struct { FILE *f;
char upcasec(char c)
{ return((c >= 'a')?c-32:c); }
-
int length(char *s)
-{ int i=0;
+{ int i = 0;
while (*s++) i++;
return(i); }
+char *softmatch(char *s, char *key)
+{ while (upcasec(*key) == upcasec(*s))
+ { if (!*key++) return(s);
+ s++; }
+ if (*key) return(NULL);
+ return(s); }
+
char *strpos(char *s, char *k)
{ char c,d;
int i;
@@ -1698,6 +3092,17 @@ char *softstrpos(char *s, char *k)
s++; }
return(NULL); }
+char *wildstrpos(char *s, char *k)
+{ char c,d;
+ int i;
+ d = upcasec(*k);
+ while (c = *s)
+ { if ((upcasec(c) == d) || (d == '*'))
+ { i = 0;
+ do if (!k[++i]) return(s);
+ while ((upcasec(s[i]) == upcasec(k[i])) || (k[i] == '*')); }
+ s++; }
+ return(NULL); }
char *marginstring(char *s, char *k, int margin)
{ char c,d;
@@ -1726,8 +3131,29 @@ int margindetect(char *line, int margin)
if (c == '\r') return(0);
if (c == '\0') return(0);
return(1); }
-
+
+char *backword(char *line, char *s, int n)
+{
+int spzone;
+if (space(*s))
+ { spzone = 1; }
+else
+ { spzone = 0;
+ n++; }
+while (s > line)
+ { if (space(*s))
+ { if (spzone == 0)
+ { spzone = 1;
+ if (--n <= 0)
+ return(++s); }}
+ else spzone = 0;
+ s--; }
+if (!space(*s))
+ if (n <= 1) return(s);
+return(NULL);
+}
+
char *dconvert(char *s, double *r)
{ static char zero='0',nine='9';
int shift,expshift,sgn,expsgn,exponent;
@@ -1802,6 +3228,7 @@ char *lconvert(char *s, long *r)
char *getlong(char *line, long *l)
{ static char zero='0',nine='9';
char c1,c2,*s;
+ if (!line) return(NULL);
s = line;
while (c1 = *s)
{ if (c1 >= zero)
@@ -1822,6 +3249,21 @@ char *copy(char *from, char *to)
return(--to); }
+char *copy2sp(char *from1, char *from2, char *to, int n)
+{
+char *s;
+s = to;
+while (from1 < from2)
+ { *s++ = *from1++;
+ if (--n <= 0)
+ { do if (--s <= to) break;
+ while (!space(*s));
+ break; }}
+*s = '\0';
+return(s);
+}
+
+
char *copy3cr(char *from, char *to, int n)
{ while (*to = *from++)
{ if (*to == DLIM)
@@ -1838,60 +3280,151 @@ char *quotestring(char *line, char *a, int n)
while (ch = *line++)
if (ch == '"')
{ while (ch = *line++)
- if (ch != '"')
- { *a++ = ch;
- if (--n <= 0) break; }
- else break;
+ { if (ch == '"') break;
+ if (ch == ';') break;
+ if (ch == '\n') break;
+ if (ch == '\r') break;
+ *a++ = ch;
+ if (--n <= 0) break; }
break; }
*a = '\0';
return(a); }
+
/* LIBRARY */
+
+int fseekd(data_set *d, long fpos, long foffset)
+{
+if (d->bugmode)
+ { fpos += foffset;
+ if (fpos < 0L) fpos = 0L;
+ if (fseek(d->f,0L,SEEK_SET)) return(EOF);
+ d->filepointer = -1L;
+ while (++d->filepointer < fpos)
+ if (getc(d->f) == EOF) return(EOF);
+ return(0); }
+if (fseek(d->f,fpos,SEEK_SET)) return(EOF);
+d->filepointer = fpos;
+if (foffset != 0L)
+ { if ((fpos + foffset) < 0L) foffset = -fpos;
+ if (fseek(d->f,foffset,SEEK_CUR)) return(EOF);
+ d->filepointer += foffset; }
+return(0);
+}
+
+
+long ftelld(data_set *d)
+{
+if (d->bugmode) return(d->filepointer);
+else return(ftell(d->f));
+}
+
+
+char fgetcd(data_set *d)
+{
+int ic;
+if ((ic = getc(d->f)) == EOF) return(NOCHAR);
+d->filepointer++;
+return((char)ic);
+}
+
+
+char *fgetsd(data_set *d, char line[], int len)
+{
+int i,ic;
+i = 0;
+while (i < len)
+ { if ((ic = getc(d->f)) == EOF) break;
+ d->filepointer++;
+ if (ic == '\r') continue;
+ if (ic == '\n')
+ { line[i++] = DLIM;
+ break; }
+ line[i++] = (char)ic; }
+if (i < 1) return(NULL);
+line[i] = '\0';
+return(line);
+}
+
+int agene_position_check(data_set *d, int nagene, annotated_gene *agene)
+{
+int a;
+long l,swap;
+if ((agene->stop - agene->start) > MAXAGENELEN)
+ { swap = agene->stop;
+ agene->stop = agene->start;
+ agene->start = swap;
+ agene->stop += d->aseqlen; }
+if (agene->start > agene->stop) agene->stop += d->aseqlen;
+l = agene->stop - agene->start;
+if ((l < 1) || (l > MAXAGENELEN)) return(0);
+if (agene->stop == d->aseqlen)
+ { for (a = 0; a < nagene; a++)
+ if (d->gene[a].start == agene->start)
+ if (d->gene[a].genetype == agene->genetype)
+ if (softmatch(d->gene[a].species,agene->species))
+ return(0); }
+return(1);
+}
+
+
+
long process_sequence_heading(data_set *d, csw *sw)
{ int i,ic,nagene;
long l,realstart;
- char line[STRLEN],c,*s,*sq;
- annotated_gene *agene;
- FILE *f;
- f = d->f;
+ char line[STRLEN],c,*s,*sq,*sd;
+ annotated_gene *agene,tmpagene;
d->datatype = FASTA;
- fseek(f,d->seqstart,SEEK_SET);
- do { if ((ic = getc(f)) == EOF) return(-1L);
- c = (char)ic; }
+ fseekd(d,d->seqstart,d->seqstartoff);
+ HEADING:
+ do if ((c = fgetcd(d)) == NOCHAR) return(-1L);
while (space(c));
- if (!fgets(d->seqname,STRLENM1,f)) return(-1L);
+ if (c == '#')
+ { if (!fgetsd(d,line,STRLENM1)) return(-1L);
+ goto HEADING; }
+ if (!fgetsd(d,d->seqname,STRLENM1)) return(-1L);
if (c != '>')
- { if (upcasec(c) != 'L') goto FNSN;
- if (!(s = softstrpos(d->seqname,"OCUS"))) goto FNSN;
+ { s = d->seqname;
+ if (upcasec(c) != 'L')
+ { do if (!(c = *s++)) goto FNSN;
+ while (upcasec(c) != 'L'); }
+ if (!(s = softmatch(s,"OCUS"))) goto FNSN;
+ if (sd = softstrpos(d->seqname,"BP"))
+ { sd = backword(d->seqname,sd,1);
+ if (sd = getlong(sd,&l)) d->aseqlen = l; }
s += 4;
while (space(*s)) s++;
sq = d->seqname;
while (!space(*s)) *sq++ = *s++;
- *sq++ = ' ';
- if (!fgets(line,STRLENM1,f)) return(-1L);
- if (!(s = softstrpos(line,"DEFINITION"))) return(-1L);
- s += 10;
- while (space(*s)) s++;
- copy(s,sq);
- if (!fgets(line,STRLENM1,f)) return(-1L);
+ d->aseqlen = 0L;
+ if (!fgetsd(d,line,STRLENM1)) return(-2L);
+ if (sd = softstrpos(line,"DEFINITION"))
+ { sd += 10;
+ while (space(*sd)) sd++;
+ *sq++ = ' ';
+ copy(sd,sq);
+ if (!fgetsd(d,line,STRLENM1)) return(-2L); }
+ else copy(s,sq);
for (i = 0; i < NS; i++) d->nagene[i] = 0;
nagene = 0;
while (!marginstring(line,"ORIGIN",10))
{ if (nagene >= NGFT) goto GBNL;
- if (!(s = marginstring(line,"tRNA",10))) goto CDSEQ;
agene = &(d->gene[nagene]);
+ agene->comp = 0;
+ agene->start = -1L;
+ agene->stop = -1L;
+ agene->antistart = -1L;
+ agene->antistop = -1L;
+ agene->permuted = 0;
+ agene->pseudogene = 0;
+ if (!(s = marginstring(line,"tRNA",10))) goto TMRNASEQ;
agene->genetype = tRNA;
if (softstrpos(s,"complement")) agene->comp = 1;
- else agene->comp = 0;
- if (!(s = getlong(s,&l))) l = -1L;
- agene->start = l;
- if (!(s = getlong(s,&l))) l = -1L;
- agene->stop = l;
+ if (s = getlong(s,&l)) agene->start = l;
+ if (s = getlong(s,&l)) agene->stop = l;
copy("tRNA-???",agene->species);
- agene->antistart = -1L;
- agene->antistop = -1L;
- if (!fgets(line,STRLENM1,f)) return(-1L);
+ if (!fgetsd(d,line,STRLENM1)) return(-2L);
while (!margindetect(line,10))
{ if (s = softstrpos(line,"product="))
if (s = softstrpos(s,"tRNA-"))
@@ -1904,24 +3437,76 @@ long process_sequence_heading(data_set *d, csw *sw)
agene->antistart = l;
if (!(s = getlong(s,&l))) l = -1L;
agene->antistop = l; }
- if (!fgets(line,STRLENM1,f)) return(-1L); }
- d->nagene[tRNA]++;
+ if (softstrpos(line,"/pseudo")) agene->pseudogene = 1;
+ if (!fgetsd(d,line,STRLENM1)) return(-2L); }
+ if (agene_position_check(d,nagene,agene))
+ { d->nagene[tRNA]++;
+ nagene++; }
+ continue;
+ TMRNASEQ:
+ if (!(s = marginstring(line,"tmRNA",10))) goto CDSEQ;
+ agene->genetype = tmRNA;
+ if (softstrpos(s,"complement")) agene->comp = 1;
+ if (s = getlong(s,&l)) agene->start = l;
+ if (s = getlong(s,&l)) agene->stop = l;
+ copy("tmRNA",agene->species);
+ if (!agene_position_check(d,nagene,agene)) goto GBNL;
+ d->nagene[tmRNA]++;
nagene++;
+ if (!fgetsd(d,line,STRLENM1)) return(-2L);
+ while (!margindetect(line,10))
+ { if (softstrpos(line,"acceptor")) agene->permuted = 1;
+ if (softstrpos(line,"/pseudo")) agene->pseudogene = 1;
+ if (!fgetsd(d,line,STRLENM1)) return(-2L); }
+ if (s = marginstring(line,"tmRNA",10))
+ { tmpagene.comp = 0;
+ tmpagene.start = -1L;
+ tmpagene.stop = -1L;
+ tmpagene.antistart = -1L;
+ tmpagene.antistop = -1L;
+ tmpagene.permuted = 0;
+ tmpagene.pseudogene = 0;
+ if (softstrpos(s,"complement")) tmpagene.comp = 1;
+ if (s = getlong(s,&l)) tmpagene.start = l;
+ if (s = getlong(s,&l)) tmpagene.stop = l;
+ if (!fgetsd(d,line,STRLENM1)) return(-2L);
+ while (!margindetect(line,10))
+ { if (softstrpos(line,"coding")) tmpagene.permuted = 1;
+ if (softstrpos(line,"/pseudo")) tmpagene.pseudogene = 1;
+ if (s = softstrpos(line,"/tag_peptide"))
+ { if (s = getlong(s,&l)) tmpagene.antistart = l;
+ if (s = getlong(s,&l)) tmpagene.antistop = l; }
+ if (!fgetsd(d,line,STRLENM1)) return(-2L); }
+ if (agene->permuted && tmpagene.permuted)
+ { agene->stop = tmpagene.stop;
+ agene->antistart = tmpagene.antistart;
+ agene->antistop = tmpagene.antistop;
+ copy("tmRNA(Perm)",agene->species); }
+ else
+ { if (nagene >= NGFT) goto GBNL;
+ agene = &(d->gene[nagene]);
+ agene->comp = tmpagene.comp;
+ agene->start = tmpagene.start;
+ agene->stop = tmpagene.stop;
+ agene->antistart = -1L;
+ agene->antistop = -1L;
+ agene->permuted = 0;
+ agene->pseudogene = tmpagene.pseudogene;
+ copy("tmRNA",agene->species);
+ if (agene_position_check(d,nagene,agene))
+ { d->nagene[tmRNA]++;
+ nagene++; }}}
continue;
CDSEQ:
if (!(s = marginstring(line,"CDS",10)))
if (!(s = marginstring(line,"mRNA",10)))
goto RRNA;
- agene = &(d->gene[nagene]);
agene->genetype = CDS;
if (softstrpos(s,"complement")) agene->comp = 1;
- else agene->comp = 0;
- if (!(s = getlong(s,&l))) l = -1L;
- agene->start = l;
- if (!(s = getlong(s,&l))) l = -1L;
- agene->stop = l;
+ if (s = getlong(s,&l)) agene->start = l;
+ if (s = getlong(s,&l)) agene->stop = l;
copy("???",agene->species);
- if (!fgets(line,STRLENM1,f)) return(-1L);
+ if (!fgetsd(d,line,STRLENM1)) return(-2L);
while (!margindetect(line,10))
{ if (s = softstrpos(line,"gene="))
{ s += 5;
@@ -1929,22 +3514,20 @@ long process_sequence_heading(data_set *d, csw *sw)
else if (s = softstrpos(line,"product="))
{ s += 8;
quotestring(s,agene->species,SHORTSTRLENM1); }
- if (!fgets(line,STRLENM1,f)) return(-1L); }
- d->nagene[CDS]++;
- nagene++;
+ if (softstrpos(line,"/pseudo")) agene->pseudogene = 1;
+ if (!fgetsd(d,line,STRLENM1)) return(-2L); }
+ if (agene_position_check(d,nagene,agene))
+ { d->nagene[CDS]++;
+ nagene++; }
continue;
RRNA:
if (!(s = marginstring(line,"rRNA",10))) goto GBNL;
- agene = &(d->gene[nagene]);
agene->genetype = rRNA;
if (softstrpos(s,"complement")) agene->comp = 1;
- else agene->comp = 0;
- if (!(s = getlong(s,&l))) l = -1L;
- agene->start = l;
- if (!(s = getlong(s,&l))) l = -1L;
- agene->stop = l;
+ if (s = getlong(s,&l)) agene->start = l;
+ if (s = getlong(s,&l)) agene->stop = l;
copy("???",agene->species);
- if (!fgets(line,STRLENM1,f)) return(-1L);
+ if (!fgetsd(d,line,STRLENM1)) return(-2L);
while (!margindetect(line,10))
{ if (s = softstrpos(line,"gene="))
{ s += 5;
@@ -1952,26 +3535,27 @@ long process_sequence_heading(data_set *d, csw *sw)
else if (s = softstrpos(line,"product="))
{ s += 8;
quotestring(s,agene->species,SHORTSTRLENM1); }
- if (!fgets(line,STRLENM1,f)) return(-1L); }
- d->nagene[rRNA]++;
- nagene++;
+ if (softstrpos(line,"/pseudo")) agene->pseudogene = 1;
+ if (!fgetsd(d,line,STRLENM1)) return(-2L); }
+ if (agene_position_check(d,nagene,agene))
+ { d->nagene[rRNA]++;
+ nagene++; }
continue;
GBNL:
- if (!fgets(line,STRLENM1,f)) return(-1L); }
+ if (!fgetsd(d,line,STRLENM1)) return(-2L); }
d->datatype = GENBANK;
d->nagene[NS-1] = nagene;
sw->annotated = 1;
- realstart = ftell(f); }
+ realstart = ftelld(d); }
else
{ MH:
- realstart = ftell(f);
- do { if ((ic = getc(f)) == EOF) return(-1L);
- c = (char)ic; }
+ realstart = ftelld(d);
+ do if ((c = fgetcd(d)) == NOCHAR) return(-3L);
while (space(c));
if (c == '>')
- { if (!fgets(line,STRLENM1,f)) return(-1L);
+ { if (!fgetsd(d,line,STRLENM1)) return(-3L);
goto MH; }
- fseek(f,realstart,SEEK_SET); }
+ fseekd(d,realstart,0L); }
s = d->seqname;
i = 0;
while ((c = *s) != '\0')
@@ -1982,11 +3566,11 @@ long process_sequence_heading(data_set *d, csw *sw)
*s = '\0';
return(realstart);
FNSN:
- realstart = d->seqstart;
s = copy("Unnamed sequence ",d->seqname);
- fseek(f,realstart,SEEK_SET);
- if (fgets(line,STRLENM1,f)) copy3cr(line,s,50);
- fseek(f,realstart,SEEK_SET);
+ fseekd(d,d->seqstart,d->seqstartoff);
+ realstart = ftelld(d);
+ if (fgetsd(d,line,STRLENM1)) copy3cr(line,s,50);
+ fseekd(d,realstart,0L);
return(realstart); }
@@ -2012,10 +3596,10 @@ int move_forward(data_set *d)
-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4 };
if (d->ps >= d->psmax)
if (d->psmax > 0L)
- { fseek(d->f,d->seqstart,SEEK_SET);
+ { fseekd(d,d->seqstart,d->seqstartoff);
d->ps = 0L; }
NL:
- if ((ic = getc(d->f)) == EOF) goto FAIL;
+ if ((ic = (int)fgetcd(d)) == NOCHAR) goto FAIL;
SC:
ic = map[ic];
BS:
@@ -2023,42 +3607,63 @@ int move_forward(data_set *d)
{ d->ps++;
return(ic); }
if (ic == -2)
- { d->nextseq = ftell(d->f) - 1L;
+ { d->nextseq = ftelld(d);
+ d->nextseqoff = -1L;
return(TERM); }
if (ic == -3)
if (d->datatype == GENBANK)
- { if ((ic = getc(d->f)) == EOF) goto FAIL;
+ { if ((ic = (int)fgetcd(d)) == NOCHAR) goto FAIL;
if ((ic = map[ic]) != -3) goto BS;
- do if ((ic = getc(d->f)) == EOF) goto FAIL;
+ do if ((ic = (int)fgetcd(d)) == NOCHAR) goto FAIL;
while (space(ic));
- d->nextseq = ftell(d->f) - 1L;
+ d->nextseq = ftelld(d);
+ d->nextseqoff = -1L;
return(TERM); }
if (ic == -5)
- { nextbase = ftell(d->f);
- if ((ic = getc(d->f)) == EOF) goto FAIL;
+ { nextbase = ftelld(d);
+ if ((ic = (int)fgetcd(d)) == NOCHAR) goto FAIL;
if (upcasec(ic) == 'O')
- { if ((ic = getc(d->f)) == EOF) goto FAIL;
+ { if ((ic = (int)fgetcd(d)) == NOCHAR) goto FAIL;
if (upcasec(ic) == 'C')
- { if ((ic = getc(d->f)) == EOF) goto FAIL;
+ { if ((ic = (int)fgetcd(d)) == NOCHAR) goto FAIL;
if (upcasec(ic) == 'U')
- { if ((ic = getc(d->f)) == EOF) goto FAIL;
+ { if ((ic = (int)fgetcd(d)) == NOCHAR) goto FAIL;
if (upcasec(ic) == 'S')
- { d->nextseq = nextbase - 1L;
+ { d->nextseq = nextbase;
+ d->nextseqoff = -1L;
return(TERM); }}}}
- fseek(d->f,nextbase,SEEK_SET); }
+ fseekd(d,nextbase,0L); }
goto NL;
FAIL:
d->nextseq = -1L;
+ d->nextseqoff = 0L;
if (d->psmax > 0L)
{ d->ps = d->psmax;
return(NOBASE); }
else return(TERM); }
+
+char cbase(int c)
+{ static char base[7] = "acgt..";
+ if (c < Adenine) return('#');
+ if (c > NOBASE) return((char)c);
+ return(base[c]); }
+
+
+
+
int seq_init(data_set *d, csw *sw)
{ long ngc;
int ic;
- if ((d->seqstart = process_sequence_heading(d,sw)) < 0L) return(0);
+ d->filepointer = 0;
+ if ((d->seqstart = process_sequence_heading(d,sw)) < 0L)
+ { if (d->seqstart == -2L)
+ fprintf(stderr,"ERROR - unable to read Genbank sequence %s\n",d->seqname);
+ else if (d->seqstart == -3L)
+ fprintf(stderr,"ERROR - unable to read fasta sequence %s\n",d->seqname);
+ return(0); }
+ d->seqstartoff = 0L;
d->ps = 0L;
d->psmax = -1L;
ngc = 0L;
@@ -2068,21 +3673,13 @@ int seq_init(data_set *d, csw *sw)
ngc++;
if ((d->psmax = d->ps) <= 0L) return(0);
d->gc = (double)ngc/(double)d->psmax;
- fseek(d->f,d->seqstart,SEEK_SET);
+ fseekd(d,d->seqstart,d->seqstartoff);
d->ps = 0L;
return(1); }
-
-char cbase(int c)
-{ static char base[6] = "acgt..";
- if (c < Adenine) return('#');
- if (c > NOBASE) return((char)c);
- return(base[c]); }
-
-
char cpbase(int c)
-{ static char base[6] = "ACGT..";
+{ static char base[7] = "ACGT..";
if (c < Adenine) return('#');
if (c > NOBASE) return((char)c);
return(base[c]); }
@@ -2124,6 +3721,24 @@ char ptranslate(int *codon, csw *sw)
return(aapolarity[aamap[sw->geneticcode][((3-p3)<<4)+((3-p2)<<2)+(3-p1)]]); }
+int seqlen(gene *t)
+{
+return(t->nbase + t->nintron);
+}
+
+
+int aseqlen(data_set *d, annotated_gene *a)
+{
+int alen;
+long astart,astop;
+astart = a->start;
+astop = a->stop;
+if (astart > astop) astop += d->psmax;
+alen = (int)(astop - astart) + 1;
+return(alen);
+}
+
+
double gc_content(gene *t)
{ int *s,*se;
double ngc;
@@ -2168,8 +3783,8 @@ int find_var_hairpin(gene *t)
e = 0;
sb = t->seq + t->astem1 + t->spacer1 + 2*t->dstem + t->dloop +
t->spacer2 + 2*t->cstem + t->cloop + t->nintron;
- sc = sb + 3; /* 4 */
- se = sb + t->var - 2; /* 3 */
+ sc = sb + 3;
+ se = sb + t->var - 2;
sf = se - 2;
te[0] = A[*se];
te[1] = C[*se];
@@ -2688,7 +4303,7 @@ int *make_var(int *seq, char matrix[][MATY],
stem = varbp & 0x1f;
e = stem + ((varbp >> 5) & 0x1f);
p = var - e;
- if (p < 1) goto NBP; /* 2 */
+ if (p < 1) goto NBP;
if (p > 4) goto NBP;
pxf = px + 2*ux[orient] + 3*vx[orient];
pyf = py + 2*uy[orient] + 3*vy[orient];
@@ -2913,482 +4528,6 @@ void xcopy(char m[][MATY], int x, int y, char *s, int l)
int identify_tag(char tag[], int len, char (*thit)[50], int nt)
{ int i,n;
char *s,*st,*sb,*sd;
- static struct { char name[50]; char tag[50]; } tagdatabase[NTAG] =
- { { "Cyanidioschyzon merolae Chloroplast","ANQILPFSIPVKHLAV" },
- { "Mesostigma viride chloroplast","ANNILPFNRKTAVAV" },
- { "Nephroselmis olivacea chloroplast","TTYHSCLEGHLS" },
- { "Pirellula sp.","AEENFALAA" },
- { "Rhodopirellula baltica","AEENFALAA" },
- { "Desulfotalea psychrophila","ADDYNYAVAA" },
- { "Desulfuromonas acetoxidans","ADTDVSYALAA" },
- { "Exiguobacterium sp.","GKTNTQLAAA" },
- { "Mycoplasma gallisepticum","DKTSKELADENFVLNQLASNNYALNF" },
- { "Aquifex aeolicus","APEAELALAA" },
- { "Thermotoga maritima ","ANEPVAVAA" },
- { "Thermotoga neapolitana","ANEPVAVAA" },
- { "Chloroflexus aurantiacus","ANTNTRAQARLALAA" },
- { "Thermus thermophilus","ANTNYALAA" },
- { "Deinococcus radiodurans","GNQNYALAA" },
- { "Deinococcus geothermalis","GNQNYALAA" },
- { "Cytophaga hutchinsonii","GEESYAMAA" },
- { "Bacteroides fragilis","GETNYALAA" },
- { "Tannerella forsythensis","GENNYALAA" },
- { "Porphyromonas gingivalis","GENNYALAA" },
- { "Prevotella intermedia","GENNYALAA" },
- { "Chlorobium tepidum","ADDYSYAMAA" },
- { "Chlorobium chlorochromatii","ADDYSYAMAA" },
- { "Salinibacter ruber","ADDYSYAMAA" },
- { "Gemmata obscuriglobus","AEPQYSLAA" },
- { "Chlammydophila pneumoniae","AEPKAECEIISLFDSVEERLAA" },
- { "Chlammydophila caviae","AEPKAECEIISFSDLTEERLAA" },
- { "Chlammydophila abortus","AEPKAKCEIISFSELSEQRLAA" },
- { "Chlammydia trachomatis","AEPKAECEIISFADLEDLRVAA" },
- { "Chlammydia muridarum","AEPKAECEIISFADLNDLRVAA" },
- { "Nostoc PCC7120","ANNIVKFARKDALVAA" },
- { "Nostoc punctiforme","ANNIVNFARKDALVAA" },
- { "Fremyella diplosiphon","ANNIVKFARKEALVAA" },
- { "Plectonema boryanum","ANNIVPFARKTAPVAA" },
- { "Trichodesmium erythraeum","ANNIVPFARKQVAALA" },
- { "Oscillatoria 6304","ANNIVPFARKAAPVAA" },
- { "Chroococcidiopsis PCC6712","ANNIVKFERQAVFA" },
- { "Synechocystis PCC6803","ANNIVSFKRVAIAA" },
- { "Thermosynechococcus elongatus","ANNIVPFARKAAAVA" },
- { "Synechococcus PCC6301","ANNIVPFARKAAPVAA" },
- { "Synechococcus elongatus","ANNIVPFARKAAPVAA" },
- { "Synechococcus WH8102","ANNIVRFSRHAAPVAA" },
- { "Synechococcus PCC6307","ANNIVRFSRQAAPVAA" },
- { "Synechococcus PCC7002","ANNIVPFARKAAAVA" },
- { "Synechococcus PCC7009","ANNIVRFSRQAAPVAA" },
- { "Synechococcus PCC6904","ANNIVRFSRQAAPVAA" },
- { "Synechococcus CC9311","ANNIVRFSRQAAPVAA" },
- { "Synechococcus CC9902","ANNIVRFSRQAAPVAA" },
- { "Synechococcus CC9605","ANNIVRFSRQAAPVAA" },
- { "Prochlorococcus marinus 1","ANKIVSFSRQTAPVAA" },
- { "Prochlorococcus marinus 2","ANNIVRFSRQPALVAA" },
- { "Prochlorococcus marinus 3","ANKIVSFSRQTAPVAA" },
- { "Cyanophora paradoxa chloroplast","ATNIVRFNRKAAFAV" },
- { "Thalassiosira weissflogii chloroplast","ANNIIPFIFKAVKTKKEAMALNFAV" },
- { "Odontella sinensis chloroplast","ANNLISSVFKSLSTKQNSLNLSFAV" },
- { "Bolidomonas pacifica chloroplast","ANNILAFNRKSLSFA" },
- { "Pavlova lutheri chloroplast","ANNILSFNRVAVA" },
- { "Porphyra purpurea chloroplast","AENNIIAFSRKLAVA" },
- { "Guillardia theta chloroplast","ASNIVSFSSKRLVSFA" },
- { "Fibrobacter succinogenes","ADENYALAA" },
- { "Treponema pallidum","ANSDSFDYALAA" },
- { "Treponema denticola","AENNDSFDYALAA" },
- { "Leptospira interrogans","ANNELALAA" },
- { "Borrelia burgdorferi","AKNNNFTSSNLVMAA" },
- { "Borrelia garinii","AKNNNFTSSNLVMAA" },
- { "Caulobacter crescentus","ANDNFAEEFAVAA" },
- { "Rhodobacter sphaeroides","ANDNRAPVALAA" },
- { "Silicibacter pomeroyi","ANDNRAPVALAA" },
- { "Silicibacter TM1040","ANDNRAPVALAA" },
- { "Paracoccus denitrificans","ANDNRAPVALAA" },
- { "Nitrobacter hamburgensis","ANDNYAPVAQAA" },
- { "Nitrobacter winogradskyi","ANDNYAPVAQAA" },
- { "Nitrobacter Nb-311A","ANDNYAPVAQAA" },
- { "Rhodopseudomonas palustris","ANDNYAPVAQAA" },
- { "Rhodopseudomonas palustris 4","ANDNVRMNEVRLAA" },
- { "Bradyrhizobium japonicum","ANDNFAPVAQAA" },
- { "Agrobacterium tumefaciens 1","ANDNNAKEYALAA" },
- { "Agrobacterium tumefaciens 2","ANDNNAKECALAA" },
- { "Rhizobium leguminosarum","ANDNYAEARLAA" },
- { "Sinorhizobium meliloti","ANDNYAEARLAA" },
- { "Mesorhizobium loti","ANDNYAEARLAA" },
- { "Mesorhizobium sp.","ANDNYAEARLAA" },
- { "Bartonella henselae","ANDNYAEARLAA" },
- { "Bartonella quintana","ANDNYAEARLAA" },
- { "Brucella melitensis","ANDNNAQGYALAA" },
- { "Brucella abortus","ANDNNAQGYALAA" },
- { "Brucella suis","ANDNNAQGYALAA" },
- { "Methylobacterium extorquens","ANDNFAPVAVAA" },
- { "Magnetospirillum magnetotacticum 1","ANDNFAPVAVAA" },
- { "Magnetospirillum magnetotacticum 2","ANDNVELAAAA" },
- { "Rhodospirillum rubrum","ANDNVELAAAA" },
- { "Novosphingobium aromaticivorans","ANDNEALALAA" },
- { "Sphingopyxis alaskensis","ANDNEALALAA" },
- { "Erythrobacter litoralis","ANDNEALALAA" },
- { "Ehrlichia chaffeensis","ANDNFVFANDNNSSANLVAA" },
- { "Anaplasma phagocytophilum","ANDDFVAANDNVETAFVAAA" },
- { "Wolbachi.sp","ANDNFAAEDNVDAIAA" },
- { "Rickettsia conorii","ANDNNRSVGHLALAA" },
- { "Rickettsia sibirica","ANDNNRSVGHLALAA" },
- { "Rickettsia typhi","ANDNKRYVGVAALAAA" },
- { "Rickettsia prowazekii","ANDNRYVGVPALAAA" },
- { "Neisseria gonorrhoeae","ANDETYALAA" },
- { "Neisseria meningitidis","ANDETYALAA" },
- { "Neisseria lactamica","ANDETYALAA" },
- { "Chromobacterium violaceum","ANDETYALAA" },
- { "Uncultured U02","ANDEQFALAA" },
- { "Nitrosomonas europaea","ANDENYALAA" },
- { "Nitrosomonas cryotolerans","ANDENYALAA" },
- { "Methylobacillus glycogenes","ANDETYALAA" },
- { "Methylobacillus flagellatus","ANDETYALAA" },
- { "Moraxella catarrhalis","ANDETYALAA" },
- { "Uncultured U04","ANDETYALAA" },
- { "Ralstonia pickettii","ANDERYALAA" },
- { "Ralstonia solanacearum","ANDNRYQLAA" },
- { "Ralstonia eutropha","ANDERYALAA" },
- { "Ralstonia metallidurans","ANDERYALAA" },
- { "Alcaligenes faecalis","ANDERFALAA" },
- { "Comamonas testosteroni","ANDERFALAA" },
- { "Variovorax paradoxus","ANDERFALAA" },
- { "Hydrogenophaga palleronii","ANDERFALAA" },
- { "Burkholderia pseudomallei","ANDDTFALAA" },
- { "Burkholderia mallei","ANDDTFALAA" },
- { "Burkholderia fungorum","ANDDTFALAA" },
- { "Burkholderia cepacia","ANDDTFALAA" },
- { "Burkholderia cenocepacia","ANDDTFALAA" },
- { "Burkholderia thailandensis","ANDDTFALAA" },
- { "Burkholderia vietnamiensis","ANDDTFALAA" },
- { "Burkholderia sp. 383","ANDDTFALAA" },
- { "Bordetella avium","ANDERFALAA" },
- { "Bordetella pertussis","ANDERFALAA" },
- { "Bordetella parapertussis","ANDERFALAA" },
- { "Bordetella bronchiseptica","ANDERFALAA" },
- { "Polaromonas JS666","ANDERFALAA" },
- { "Rubrivivax gelatinosus","ANDERFALAA" },
- { "Uncultured stronglyoides1","ANDERFALAA" },
- { "Azoarcus BH72","ANDERFALAA" },
- { "Xylella fastidiosa 1","ANEDNFAVAA" },
- { "Xylella fastidiosa 2","ANEDNFALAA" },
- { "Xylella fastidiosa 3","ANEDNFAIAA" },
- { "Xylella fastidiosa 4","ANEDNFALAA" },
- { "Xanthomonas campestris 1","ANDDNYGSDFAIAA" },
- { "Xanthomonas campestris 2","ANDDNYGSDSAIAA" },
- { "Xanthomonas axonopodis","ANDDNYGSDFAIAA" },
- { "Xanthomonas oryzae","ANDDNYGSDFAIAA" },
- { "Legionella pneumophila","ANDENFAGGEAIAA" },
- { "Coxiella burnetii","ANDSNYLQEAYA" },
- { "Methylococcus capsulatus","ANDDVYALAA" },
- { "Uncultured U01a","ANDSNYALAA" },
- { "Dichelobacter nodosus","ANDDNYALAA" },
- { "Francisella tularensis 1","GNKKANRVAANDSNFAAVAKAA" },
- { "Francisella tularensis 2","ANDSNFAAVAKAA" },
- { "Acidithiobacillus ferrooxidans","ANDSNYALAA" },
- { "Acinetobacter ADP1","ANDETYALAA" },
- { "Psychrobacter 2734","ANDENYALAA" },
- { "Psychrobacter cryohalolentis","ANDENYALAA" },
- { "Psychrobacter arcticus","ANDENYALAA" },
- { "Azotobacter vinelandii","ANDDNYALAA" },
- { "Pseudomonas aeruginosa","ANDDNYALAA" },
- { "Pseudomonas syringae 1","ANDENYGAQLAA" },
- { "Pseudomonas syringae 2","ANDETYGEYALAA" },
- { "Pseudomonas syringae 3","ANDENYGAQLAA" },
- { "Pseudomonas fluorescens 1","ANDDQYGAALAA" },
- { "Pseudomonas fluorescens 2","ANDENYGQEFALAA" },
- { "Pseudomonas putida 1","ANDENYGAEYKLAA" },
- { "Marinobacter hydrocarbonoclasticus","ANDENYALAA" },
- { "Marinobacter aquaeolei","ANDENYALAA" },
- { "Pseudoalteromonas haloplanktis","ANDDNYSLAA" },
- { "Pseudoalteromonas atlantica","ANDENYALAA" },
- { "Uncultured WW11","ANDDNYALAA" },
- { "Shewanella oneidensis","ANDDNYALAA" },
- { "Shewanella putrefaciens","ANDDNYALAA" },
- { "Shewanella PV-4","ANDDNYALAA" },
- { "Shewanella amazonensis","ANDDNYALAA" },
- { "Shewanella SAR-1","ANDDNYALAA" },
- { "Shewanella ANA-3","ANDDNYALAA" },
- { "Idiomarina loihiensis","ANDDNYALAA" },
- { "Photorhabdus asymbiotica","ANDNEYALVA" },
- { "Microbulbifer degradans","ANDDNYGAQLAA" },
- { "Saccharophagus degradans","ANDDNYGAQLAA" },
- { "Colwellia sp","ANDDTFALAA" },
- { "Colwellia psychrerythraea","ANDDTFALAA" },
- { "Photobacterium phosphoreum","ANDENYALAA" },
- { "Vibrio cholerae","ANDENYALAA" },
- { "Vibrio vulnificus","ANDENYALAA" },
- { "Vibrio Ex25","ANDENYALAA" },
- { "Vibrio parahemolyticus","ANDENYALAA" },
- { "Aeromonas salmonicida","ANDENYALAA" },
- { "Aeromonas hydrophila 1","ANDENYALAA" },
- { "Aeromonas hydrophila 2","ANDENYALAA" },
- { "Uncultured VLW3","ANDENYALAA" },
- { "Uncultured VLS13","ANDENYALAA" },
- { "Uncultured WW9","ANDENYALAA" },
- { "Uncultured WW10","ANDENYALAV" },
- { "Uncultured VLW5","ANDENYALAA" },
- { "Uncultured RCA4","ANDETYALAA" },
- { "Uncultured LEM1","ANDETYALAA" },
- { "Uncultured LEM2","ANDETHALAA" },
- { "Wigglesworthia brevipalpis","AKHKYNEPALLAA" },
- { "Wigglesworthia glossinidia","AKHKYNEPALLAA" },
- { "Buchnera aphidicola 1","ANNKQNYALAA" },
- { "Buchnera aphidicola 2","ANNKQNYALAA" },
- { "Buchnera aphidicola 3","AKQNQYALAA" },
- { "Shigella dysenteriae 1","ANDENYALAA" },
- { "Shigella dysenteriae 2","ANDENYALAA" },
- { "Shigella flexneri","ANDENYALAA" },
- { "Shigella boydii","ANDENYALAA" },
- { "Shigella sonnei","ANDENYALAA" },
- { "Escherichia coli","ANDENYALAA" },
- { "Providencia rettgeri","ANDENYALAA" },
- { "Serratia marcescens","ANDENYALAA" },
- { "Klebsiella pneumoniae","ANDENYALAA" },
- { "Pectobacterium carotovora","ANDENYALAA" },
- { "Erwinia chrysanthemi","ANDENFAPAALAA" },
- { "Erwinia amylovora","ANDENFAPAALAA" },
- { "Erwinia carotovora","ANDENYALAA" },
- { "Salmonella bongori","ANDENYALAA" },
- { "Salmonella typhimurium","ANDETYALAA" },
- { "Salmonella typhi","ANDETYALAA" },
- { "Salmonella paratyphi","ANDENYALAA" },
- { "Salmonella enterica 1","ANDETYALAA" },
- { "Salmonella enterica 2","ANDENYALAA" },
- { "Salmonella enterica 3","ANDETYALAA" },
- { "Salmonella enterica 5","ANDETYALAA" },
- { "Salmonella enterica 6","ANDENYALAA" },
- { "Uncultured RCA1","ANDENYALAA" },
- { "Uncultured VLS1","ANDENYALAA" },
- { "Uncultured WW1","ANDENYALAA" },
- { "Uncultured RCA2","SNDENYALAA" },
- { "Uncultured WW2","ANDENYALAA" },
- { "Uncultured QL1","ANVENYALAA" },
- { "Uncultured WW4","ANDGNYALAA" },
- { "Uncultured VLS5","ANDETYALAA" },
- { "Uncultured FS1","ANDETYALAA" },
- { "Uncultured VLS6","ANDENYALAA" },
- { "Uncultured FS2","ANDENYALAA" },
- { "Uncultured WW5","ANDENYALAA" },
- { "Uncultured VLW1","ANDENYALAA" },
- { "Uncultured VLS7","ANDENYALAA" },
- { "Uncultured VLS9","ANDENYALAA" },
- { "Uncultured VLW2","ANDENYALAA" },
- { "Uncultured WW7","ANDENCALAA" },
- { "Uncultured WW8","ANDENYALAA" },
- { "Yersinia enterocolitica","ANDSQYESAALAA" },
- { "Yersinia intermedia","ANDSQYESAALAA" },
- { "Yersinia mollaretii","ANDSQYESAALAA" },
- { "Yersinia bercovieri","ANDSQYESAALAA" },
- { "Yersinia pestis","ANDENYALAA" },
- { "Yersinia frederiksenii","ANDENYALAA" },
- { "Yersinia pseudotuberculosis","ANDENYALAA" },
- { "Mannheimia haemolytica","ANDEQYALAA" },
- { "Mannheimia succiniciproducens","ANDEQYALAA" },
- { "Haemophilus ducreyi","ANDEQYALAA" },
- { "Haemophilus influenzae","ANDEQYALAA" },
- { "Haemophilus somnus","ANDEQYALAA" },
- { "Pasteurella multocida","ANDEQYALAA" },
- { "Actinobacillus actinomycetemcomitans","ANDEQYALAA" },
- { "Actinobacillus pleuropneumoniae","ANDEQYALAA" },
- { "Lawsonia intracellularis","ANNNYDYALAA" },
- { "Desulfovibrio desulfuricans","ANNDYDYAYAA" },
- { "Desulfovibrio vulgaris","ANNYDYALAA" },
- { "Desulfovibrio yellowstonii","ANNELALAA" },
- { "Geobacter sulfurreducens","ADNYDYAVAA" },
- { "Geobacter metallireducens","ADNYDYAVAA" },
- { "Helicobacter pylori 1","VNNTDYAPAYAKAA" },
- { "Helicobacter pylori 2","VNNTDYAPAYAKAA" },
- { "Helicobacter pylori 3","VNNADYAPAYAKAA" },
- { "Campylobacter jejuni","ANNVKFAPAYAKAA" },
- { "Campylobacter lari","ANNVKFAPAYAKAA" },
- { "Campylobacter fetus 2","ANNVKFAPAYAKAA" },
- { "Campylobacter coli","ANNVKFAPAYAKAA" },
- { "Fusobacterium nucleatum 1","GNKDYALAA" },
- { "Fusobacterium nucleatum 2","GNKEYALAA" },
- { "Dehalococcoides ethenogenes","GERELVLAG" },
- { "Mycobacterium leprae","ADSYQRDYALAA" },
- { "Mycobacterium avium","ADSHQRDYALAA" },
- { "Mycobacterium bovis","ADSHQRDYALAA" },
- { "Mycobacterium tuberculosis","ADSHQRDYALAA" },
- { "Mycobacterium marinum","ADSHQRDYALAA" },
- { "Mycobacterium microti","ADSHQRDYALAA" },
- { "Mycobacterium africanum","ADSHQRDYALAA" },
- { "Mycobacterium smegmatis","ADSNQRDYALAA" },
- { "Corynebacterium diphtheriae","AENTQRDYALAA" },
- { "Corynebacterium glutamicum","AEKSQRDYALAA" },
- { "Thermobifida fusca","ANSKRTEFALAA" },
- { "Streptomyces coelicolor","ANTKRDSSQQAFALAA" },
- { "Streptomyces lividans","ANTKRDSSQQAFALAA" },
- { "Tropheryma whipplei","ANLKRTDLSLAA" },
- { "Clavibacter michiganensis","ANNKQSSFVLAA" },
- { "Bifidobacterium longum","AKSNRTEFALAA" },
- { "Bifidobacterium longum","AKSNRTEFALAA" },
- { "Bacillus anthracis","GKQNNLSLAA" },
- { "Bacillus thuringiensis","GKQNNLSLAA" },
- { "Bacillus cereus","GKQNNLSLAA" },
- { "Bacillus megaterium","GKSNNNFALAA" },
- { "Bacillus halodurans","GKENNNFALAA" },
- { "Bacillus clausii","GKENNNFALAA" },
- { "Bacillus subtilis","GKTNSFNQNVALAA" },
- { "Bacillus stearothermophilus","GKQNYALAA" },
- { "Geobacillus kaustophilus","GKQNYALAA" },
- { "Staphylococcus aureus","GKSNNNFAVAA" },
- { "Staphylococcus saprophyticus","GKENNNFAVAA" },
- { "Staphylococcus xylosus","GKENNNFAVAA" },
- { "Staphylococcus epidermidis","DKSNNNFAVAA" },
- { "Oceanobacillus iheyensis","GKETNQPVLAAA" },
- { "Listeria monocytogenes","GKEKQNLAFAA" },
- { "Listeria innocua","GKEKQNLAFAA" },
- { "Listeria welshimeri","GKEKQNLAFAA" },
- { "Listeria seeligeri","GKEKQNLAFAA" },
- { "Listeria grayi 1","GKEKQNLAFAA" },
- { "Listeria grayi 2","GKQNNNLAFAA" },
- { "Listeria ivanovii","GKEKQNLAFAA" },
- { "Lactobacillus gasseri","ANNENSYAVAA" },
- { "Lactobacillus johnsonii","ANNENSYAVAA" },
- { "Lactobacillus sakei","ANNNNSYAVAA" },
- { "Lactobacillus helveticus","ANNKNSYALAA" },
- { "Lactobacillus gallinarum","ANNKNSYALAA" },
- { "Lactobacillus acidophilus","ANNKNSYALAA" },
- { "Lactobacillus plantarum","AKNNNNSYALAA" },
- { "Pediococcus pentosaceus","AKNNNNSYALAA" },
- { "Leuconostoc mesenteroides","AKNENSFAIAA" },
- { "Leuconostoc lactis","AKNENSFAIAA" },
- { "Leuconostoc pseudomesenteroides","AKNENSYAIAA" },
- { "Enterococcus durans","AKNENNSYALAA" },
- { "Oenococcus oeni","AKNNEPSYALAA" },
- { "Enterococcus faecium","AKNENNSYALAA" },
- { "Enterococcus faecalis","AKNENNSFALAA" },
- { "Streptococcus equi","AKNNTTYALAA" },
- { "Streptococcus zooepidemicus","AKNNTTYALAA" },
- { "Streptococcus suis","AKNTNTYALAA" },
- { "Streptococcus uberis","AKNTNSYALAA" },
- { "Streptococcus pyogenes","AKNTNSYALAA" },
- { "Streptococcus agalactiae","AKNTNSYALAA" },
- { "Streptococcus mutans","AKNTNSYAVAA" },
- { "Streptococcus sobrinus","AKNTNSYAVAA" },
- { "Streptococcus gordonii","AKNNTSYALAA" },
- { "Streptococcus pneumoniae","AKNNTSYALAA" },
- { "Streptococcus mitis","AKNNTSYALAA" },
- { "Streptococcus thermophilus","AKNTNSYAVAA" },
- { "Lactococcus raffinolactis","AKNTQTYAVAA" },
- { "Lactococcus plantarum","AKNTQTYALAA" },
- { "Lactococcus garvieae","AKNNTSYALAA" },
- { "Lactococcus lactis","AKNNTQTYAMAA" },
- { "Mycoplasma capricolum","ANKNEETFEMPAFMMNNASAGANFMFA" },
- { "Mesoplasma florum","ANKNEENTNEVPTFMLNAGQANYAFA" },
- { "Spiroplasma kunkelii","ASKKQKEDKIEMPAFMMNNQLAVSMLAA" },
- { "Ureaplasma urealyticum","AENKKSSEVELNPAFMASATNANYAFAY" },
- { "Ureaplasma parvum","AENKKSSEVELNPAFMASATNANYAFAY" },
- { "Mycoplasma pulmonis","GTKKQENDYQDLMISQNLNQNLAFASV" },
- { "Mycoplasma penetrans","AKNNKNEAVEVELNDFEINALSQNANLALYA" },
- { "Mycoplasma genitalium 1","DKENNEVLVEPNLIINQQASVNFAFA" },
- { "Mycoplasma genitalium 2","DKENNEVLVDPNLIINQQASVNFAFA" },
- { "Mycoplasma pneumoniae","DKNNDEVLVDPMLIANQQASINYAFA" },
- { "Thermoanaerobacter tengcongensis","ADRELAYAA" },
- { "Heliobacillus mobilis","AEDNYALAA" },
- { "Desulfitobacterium hafniense","ANDDNYALAA" },
- { "Nitrosococcus oceani","ANDDNYALAA" },
- { "Thiomicrospira crunogena","ANDDNYALAA" },
- { "Stenotrophomonas maltophilia","ANDDNYALAA" },
- { "Carboxydothermus hydrogenoformans","ANENYALAA" },
- { "Ruminococcus albus","GHGYFAKAS" },
- { "Clostridium acetobutylicum","DNENNLALAA" },
- { "Clostridium perfringens","AEDNFALAA" },
- { "Clostridium thermocellum","ANEDNYALAAA" },
- { "Clostridium botulinum","ANDNFALAA" },
- { "Clostridium tetani","ADDNFVLAA" },
- { "Clostridium difficile","ADDNFAIAA" },
- { "Hyphomonas neptunium","ANDNFAEGELLAA" },
- { "Vibrio fischeri","ANDENYALAA" },
- { "Corynebacterium efficiens","AEKTQRDYALAA" },
- { "Streptomyces avermitilus","ANTKSDSQSFALAA" },
- { "Brevibacterium linens","AKSNNRTDFALAA" },
- { "Lactobacillus delbrueckii 1","AKNENNSYALAA" },
- { "Lactobacillus delbrueckii 2","ANENSYAVAA" },
- { "Lactobacillus casei","AKNENSYALAA" },
- { "Lactobacillus brevis","AKNNNNSYALAA" },
- { "Streptomyces thermophilus","AKNTNSYAVAA" },
- { "Bacillusphage G","AKLNITNNELQVA" },
- { "Thermodesulfobacterium commune","ANEYAYALAA" },
- { "Thermomicrobium roseum","GERELALAA" },
- { "Leptospirillum groupII","ANEELALAA" },
- { "Leptospirillum groupIII","ANEELALAA" },
- { "Gloeobacter violaceus","ATNNVVPFARARATVAA" },
- { "Crocosphaera watsonii","ANNIVSFKRVAVAA" },
- { "Thalassiosira pseudonana chloroplast","ANNIMPFMFNVVKTNRSLTTLNFAV" },
- { "Emiliania huxleyi chloroplast","ANNILNFNSKLAIA" },
- { "Cyanidium caldarium chloroplast","ANNIIEISNIRKPALVV" },
- { "Gracilaria tenuistipitata chloroplast","AKNNILTLSRRLIYA" },
- { "Prevotella ruminicola","GNNEYALAA" },
- { "Jannaschia sp. CCS1","ANDNRAPAMALAA" },
- { "Agrobacterium vitis","ANDNNAQGYAVAA" },
- { "Alphaproteobacteria SAR-1","ANDELALAA" },
- { "Gluconobacter oxydans","ANDNSEVLAVAA" },
- { "Sphingomonas elodea","ANDNEALAIAA" },
- { "Ehrlichia ruminantium 1","ANDNFVSANDNNSTANLVAA" },
- { "Ehrlichia ruminantium 2","ANDNFVSANDNNSTANLVAA" },
- { "Ehrlichia canis","ANDNFVFANDNNSSVAGLVAA" },
- { "Anaplasma marginale","ANDDFVAANDNMETAFVAAA" },
- { "Wolbachia sp. 2 (Brugi)","ANDNFAAEGDVAVAA" },
- { "Wolbachia sp. 3 (Culex)","ANDNFAAEDNVALAA" },
- { "Wolbachia sp. 4 (Dros.)","ANDNFAAEEYRVAA" },
- { "Rickettsia rickettsii","ANDNNRSVGRLALAA" },
- { "Tremblaya princeps 1 (Dysmicoccus)",
- "APSNRFTIVANDCIDALVRRAVV" },
- { "Azoarcus EbN1","ANDERFAVAA" },
- { "Dechloromonas aromatica","ANDEQFAIAA" },
- { "Dechloromonas agitata","ANDEQFAIAA" },
- { "Thiobacillus denitrificans","AKSKAARRNPACSAGVMELKA" },
- { "Shewanella SAR-2, version 2","ADYGYMAAA" },
- { "Shewanella SAR-1, version 2","ANNDNYALAA" },
- { "Uncultured marineEBAC20E09","ANNDNYALAA" },
- { "Pseudomonas fluorescens 3 (Pf-5)","ANDETYGDYALAA" },
- { "Uncultured remanei","ANDESYALAA" },
- { "Chromohalobacter salexigens","ANDDNYAQGALAA" },
- { "Gammaproteobacteria SAR-1","ANNYNYSLAA" },
- { "Shewanella denitrificans","ANDSNYSLAA" },
- { "Shewanella frigidimarina","ANDSNYSLAA" },
- { "Shewanella baltica","ANDSNYSLAA" },
- { "Photobacterium profundum","ANDENFALAA" },
- { "Blochmannia floridanus","AKNKYNEPVALAA" },
- { "Blochmannia pennsylvanicus","ANNTTYRESVALAA" },
- { "Photorhabdus luminescens","ANDEKYALAA" },
- { "Proteus mirabilis","ANDNQYKALAA" },
- { "Magnetococcus sp.","ANDEHYAPAFAAA" },
- { "Proteobacteria SAR-1, version 1","GENADYALAA" },
- { "Proteobacteria SAR-1, version 2","ANNYNYSLAA" },
- { "Proteobacteria SAR-1, version 3","ADNGYMAAA" },
- { "Desulfovibrio desulfuricans 2 (G20)","ANNDYEYAMAA" },
- { "Uncultured ciona","ANDEFFDARLRA" },
- { "Bacteriovorax marinus","AESNFAPAMAA" },
- { "Bdellovibrio bacteriovorus","GNDYALAA" },
- { "Myxococcus xanthus","ANDNVELALAA" },
- { "Wolinella succinogenes","ALSSHPKRGKRLGLPITSALGA" },
- { "Campylobacter upsaliensis","ANNAKFAPAYAKVA" },
- { "Helicobacter mustelae","ANNKNYAPAYAKVA" },
- { "Helicobacter hepaticus","ANNANYAPAYAKVA" },
- { "Ruminococcus albus","DNDNFAMAA" },
- { "Coprothermobacter proteolyticus","AEPEFALAA" },
- { "Moorella thermoacetica","ADDNLALAA" },
- { "Mycoplasma mycoides","ADKNEENFEMPAFMINNASAGANYMFA" },
- { "Mycoplasma mobile","GKEKQLEVSPLLMSSSQSNLVFA" },
- { "Mycoplasma arthritidis","GNLETSEDKKLDLQFVMNSQTQQNLLFA" },
- { "Paenibacillus larvae","GKQQNNYALAA" },
- { "Bacillus licheniformis","GKSNQNLALAA" },
- { "Actinomyces naeslundii","ADNTRTDFALAA" },
- { "Arthrobacter FB24","AKQTRTDFALAA" },
- { "Leifsonia xyli","ANSKSTVSAKADFALAA" },
- { "Nocardia farcinica","ADSHQREYALAA" },
- { "Propionibacterium acnes 1","AENTRTDFALAA" },
- { "Propionibacterium acnes 2","AENTRTDFALAA" },
- { "Streptomyces collinus","ANTKRDSSSFALAA" },
- { "Streptomyces aureofaciens","ANSKRDSQQFALAA" },
- { "Kineococcus radiotolerans","ADSKRTEFALAA" },
- { "Frankia sp. CcI3","ANKTQPTTPTYALAA" },
- { "Frankia sp. EAN1pec","ATKTQPASSTFALAA" },
- { "Rubrobacter xylanophilus","ANDREMALAA" },
- { "Parachlamydia UWE25","ANNSNKIAKVDFQEGTFARAA" },
- { "Verrucomicrobium spinosum","ANSNELALAA" },
- { "Acidobacterium capsulatum","ANNNLALAA" },
- { "Acidobacterium Ellin6076","ANTQFAYAA" },
- { "Solibacter usitatus","ANTQFAYAA" },
- { "Dictyoglomus thermophilum","ANTNLALAA" },
- { "Mycobacteriophage Bxz1 virion","ATDTDATVTDAEIEAFFAEEAAALV" },
- { "Catera virion","ATDTDATVTDAEIEAFFAEEAAALV" },
- { "Cyanobium gracile","ANNIVRFSRQAAPVAA" },
- { "Anabaena variabilis","ANNIVKFARKDALVAA" },
- { "Nitrosospira multiformis","ANDENYALAA" },
- { "Enterobacter sakazakii","ANDENYALAA" },
- { "Pantoea stewartii","ANDENYALAA" },
- { "Citrobacter rodentium","ANDENYALAA" },
- { "Prochlorococcus marinus","ANNIVSFSRQTAPVAA" },
- { "Azospira oryzae","ANDERFAIAA" },
- { "Uncultured phakopsora","ANDNSYALAA" },
- { "Syntrophus aciditrophicus","ANDYEYALAA" },
- { "Alkaliphilus metalliredigenes","ANDNYSLAAA" },
- { "Caldicellulosiruptor saccharolyticus","ADKAELALAA" } };
n = 0;
st = tag + len;
while (*--st == '*');
@@ -3416,16 +4555,109 @@ int identify_tag(char tag[], int len, char (*thit)[50], int nt)
return(-1); }
+
+int peptide_tag(char tag[], int maxlen, gene *t, csw *sw)
+{
+int i,lx,*se;
+se = t->eseq + t->tps;
+lx = (t->tpe - t->tps + 1);
+if (ltranslate(se+lx,t,sw) == '*')
+ { lx += 3;
+ if (ltranslate(se+lx,t,sw) == '*') lx += 3; }
+lx /= 3;
+if (lx > maxlen) lx = maxlen;
+for (i = 0; i < lx; i++)
+ { tag[i] = ltranslate(se,t,sw);
+ se += 3; }
+tag[i] = '\0';
+return(lx);
+}
+
+
+void update_tmrna_tag_database(gene ts[], int nt, csw *sw)
+{
+int nn,i,k,c,lx;
+char *sp,*se,*s;
+char species[STRLEN],tag[100];
+gene *t;
+if (sw->tagend >= NTAGMAX) return;
+for (i = 0; i < nt; i++)
+ { t = ts + i;
+ if (t->genetype != tmRNA) continue;
+ s = t->name;
+ se = NULL;
+ while (*s)
+ { if (*s == '|') se = s;
+ s++; }
+ if (!*se) continue;
+ while (++se) if (space(*se)) break;
+ if (!*se) continue;
+ while (++se) if (!space(*se)) break;
+ if (!*se) continue;
+ if (softstrpos(se," sp. "))
+ { if (!(sp = softstrpos(se,"two-piece")))
+ if (!(sp = softstrpos(se,"tmRNA")))
+ continue;
+ while (space(sp[-1])) sp--;
+ copy2sp(se,sp,species,49); }
+ else
+ { s = species;
+ c = 2;
+ while (*se)
+ { if (space(*se))
+ if (--c <= 0) break;
+ *s++ = *se++; }
+ *s = '\0'; }
+ for (k = 0; k < sw->tagend; k++)
+ if (softstrpos(tagdatabase[k].name,species)) break;
+ if (k < sw->tagend) continue;
+ copy(species,tagdatabase[sw->tagend].name);
+ s = tag;
+ lx = peptide_tag(s,50,t,sw);
+ s += (lx - 1);
+ while (*s == '*') s--;
+ *++s = '\0';
+ copy(tag,tagdatabase[sw->tagend].tag);
+ if (++sw->tagend >= NTAGMAX) break; }
+}
+
+int string_compare(char *s1, char *s2)
+{
+int r;
+char c1,c2;
+r = 0;
+while (c1 = *s1++)
+ { if (!(c2 = *s2++)) break;
+ r = (int)upcasec(c1) - (int)upcasec(c2);
+ if (r != 0) break; }
+return(r);
+}
+
+void report_new_tmrna_tags(csw *sw)
+{
+int k,n,sort[NTAGMAX];
+for (n = 0; n < sw->tagend; n++)
+ { k = n;
+ while (--k >= 0)
+ { if (string_compare(tagdatabase[n].name,tagdatabase[sort[k]].name) >= 0) break;
+ sort[k+1] = sort[k]; }
+ sort[++k] = n; }
+fprintf(sw->f,"\ntmRNA tag database update:\n");
+for (k = 0; k < sw->tagend; k++)
+ { n = sort[k];
+ fprintf(sw->f," { \"%s\",\"%s\"},\n",
+ tagdatabase[n].name,tagdatabase[n].tag); }
+fprintf(sw->f,"\n%d tmRNA peptide tags\n",sw->tagend);
+fprintf(sw->f,"%d new tmRNA peptide tags\n\n",sw->tagend - NTAG);
+}
+
+
void disp_peptide_tag(FILE *f, gene *t, csw *sw)
{ int i,lx,nm,nmh,c1,c2,c3,*s,*se;
- char tag[50],thit[21][50];
- fprintf(f,"Tag peptide (at %d)\nTag sequence: ",t->tps+1);
+ char tag[52],thit[21][50];
+ fprintf(f,"Tag peptide at [%d,%d]\nTag sequence: ",t->tps+1,t->tpe+1);
+ lx = peptide_tag(tag,50,t,sw);
se = t->eseq + t->tps;
- lx = (t->tpe - t->tps + 1);
- if (ltranslate(se+lx,t,sw) == '*')
- { lx += 3;
- if (ltranslate(se+lx,t,sw) == '*') lx += 3; }
- lx /= 3;
s = se;
for (i = 0; i < lx; i++)
{ if (i > 0) fputc('-',f);
@@ -3441,13 +4673,7 @@ void disp_peptide_tag(FILE *f, gene *t, csw *sw)
{ fprintf(f,"%s",translate(s,sw));
s += 3;
if (i < (lx-1)) fputc('-',f); }
- s = se;
- fprintf(f,"\nTag peptide: ");
- for (i = 0; i < lx; i++)
- { tag[i] = ltranslate(s,t,sw);
- fprintf(f,"%c",tag[i]);
- s += 3; }
- tag[lx] = '\0';
+ fprintf(f,"\nTag peptide: %s",tag);
if (sw->energydisp)
{ s = se;
fprintf(f,"\nTag Polarity: ");
@@ -3526,7 +4752,6 @@ void disp_location(gene *t, csw *sw, char *m)
fprintf(sw->f,"%s %s\n",m,position(sp,t,sw)); }
-
char *name(gene *t, char *si, int proc, csw *sw)
{ int s[5],*ss,*sin,*sm,*s0,*s1,*s2,*s3,nintron;
char *sb,*st;
@@ -3539,10 +4764,14 @@ char *name(gene *t, char *si, int proc, csw *sw)
sprintf(si,"srpRNA");
break;
case tmRNA:
- if (t->asst > 0)
- sprintf(si,"tmRNA (Permuted)");
+ if (sw->dispmatch)
+ { if (t->asst > 0)
+ sprintf(si,"tmRNA(Perm) ");
+ else sprintf(si,"tmRNA "); }
else
- sprintf(si,"tmRNA");
+ { if (t->asst > 0)
+ sprintf(si,"tmRNA (Permuted)");
+ else sprintf(si,"tmRNA"); }
break;
case tRNA:
ss = (proc?t->seq:t->ps);
@@ -3794,8 +5023,11 @@ void disp_tmrna_seq(FILE *f, gene *t, csw *sw)
{ fputc('\n',f);
i = 0; }}
if (i > 0) fputc('\n',f);
- fputc('\n',f);
- fprintf(f,"Resume consensus sequence (at %d): ",t->tps - 6);
+ fprintf(f,"\n5' tRNA domain at [%d,%d]\n",
+ 1,t->intron);
+ fprintf(f,"3' tRNA domain at [%d,%d]\n",
+ t->intron+t->nintron+1,t->nbase+t->nintron);
+ fprintf(f,"Resume consensus sequence at [%d,%d]: ",t->tps - 6,t->tps + 11);
s = t->eseq + t->tps - 7;
for (i = 0; i < 18; i++) fputc(cbase(*s++),f);
fputc('\n',f);
@@ -3858,7 +5090,11 @@ void disp_tmrna_perm_seq(FILE *f, gene *t, csw *sw)
{ fputc('\n',f);
i = 0; }}
if (i > 0) fputc('\n',f);
- fprintf(f,"\nResume consensus sequence (at %d): ",t->tps - 6);
+ fprintf(f,"\n5' tRNA domain at [%d,%d]\n",
+ t->asst+1,t->asst+t->astem1+t->dloop+t->cstem);
+ fprintf(f,"3' tRNA domain at [%d,%d]\n",
+ 55,t->intron);
+ fprintf(f,"Resume consensus sequence at [%d,%d]: ",t->tps - 6,t->tps + 11);
s = t->eseq + t->tps - 7;
for (i = 0; i < 18; i++) fputc(cbase(*s++),f);
fputc('\n',f);
@@ -3896,9 +5132,9 @@ void disp_cds(FILE *f, gene *t, csw *sw)
fputc('\n',f); }
-int pseudogene(gene *t)
+int pseudogene(gene *t, csw *sw)
{
-if (t->energy < 100.0) return(1);
+if (t->energy < sw->pseudogenethresh) return(1);
if (t->genetype == tRNA)
if (t->cloop != 7)
return(1);
@@ -3925,7 +5161,7 @@ void disp_gene(gene *t, char m[][MATY], csw *sw)
sprintf(stat,"%d bases, %%GC = %2.1f",t->nbase,100.0*gc);
xcopy(m,4,2,stat,length(stat));
if (sw->reportpseudogenes)
- if (pseudogene(t))
+ if (pseudogene(t,sw))
xcopy(m,4,4,"Possible Pseudogene",19);
if (sw->energydisp)
{ sprintf(stat,"Score = %g\n",t->energy);
@@ -3937,21 +5173,32 @@ void disp_batch_trna(FILE *f, gene *t, csw *sw)
char pos[50],species[50];
static char type[2][6] = { "tRNA","mtRNA" };
static char asterisk[2] = { ' ','*'};
- anticodon = 1 + t->anticodon;
- if (t->nintron > 0)
- if (t->intron <= t->anticodon)
- anticodon += t->nintron;
s = t->seq + t->anticodon;
- ps = sw->reportpseudogenes?(pseudogene(t)?1:0):0;
- switch(t->cloop)
- { case 6:
- case 8:
- sprintf(species,"%s-???%c",type[sw->mtrna],asterisk[ps]);
- break;
- case 7:
- default:
- sprintf(species,"%s-%s%c",type[sw->mtrna],aa(s,sw),asterisk[ps]);
- break; }
+ ps = sw->reportpseudogenes?(pseudogene(t,sw)?1:0):0;
+ if (sw->batchfullspecies)
+ { switch(t->cloop)
+ { case 6:
+ sprintf(species,"%s-?(%s|%s)%c",
+ type[sw->mtrna],aa(s-1,sw),aa(s,sw),asterisk[ps]);
+ break;
+ case 8:
+ sprintf(species,"%s-?(%s|%s)%c",
+ type[sw->mtrna],aa(s,sw),aa(s+1,sw),asterisk[ps]);
+ break;
+ case 7:
+ default:
+ sprintf(species,"%s-%s%c",type[sw->mtrna],aa(s,sw),asterisk[ps]);
+ break; }}
+ else
+ { switch(t->cloop)
+ { case 6:
+ case 8:
+ sprintf(species,"%s-???%c",type[sw->mtrna],asterisk[ps]);
+ break;
+ case 7:
+ default:
+ sprintf(species,"%s-%s%c",type[sw->mtrna],aa(s,sw),asterisk[ps]);
+ break; }}
position(pos,t,sw);
ls = length(species);
if (ls <= 10) fprintf(f,"%-10s%28s",species,pos);
@@ -3959,6 +5206,10 @@ void disp_batch_trna(FILE *f, gene *t, csw *sw)
else fprintf(f,"%-25s%13s",species,pos);
if (sw->energydisp)
{ fprintf(f,"\t%5.1f",t->energy); }
+ anticodon = 1 + t->anticodon;
+ if (t->nintron > 0)
+ if (t->intron <= t->anticodon)
+ anticodon += t->nintron;
fprintf(f,"\t%-4d",anticodon);
switch(t->cloop)
{ case 6:
@@ -4247,7 +5498,7 @@ void tmrna_score(FILE *f, gene *t, csw *sw)
{ int r,j,te,*s,*sb,*se,*tpos,tarm;
double e,er,et,eal,esp,ed,ec,ea,egga,etcca,egg,eta,edgg;
double ehairpin,euhairpin;
- static int gtemplate[6] = { 0x00,0x00,0x11,0x00,0x00,0x00 };
+ static int gtem[6] = { 0x00,0x00,0x11,0x00,0x00,0x00 };
static double tagend_score[4] = { 36.0, 66.0, 62.0, 72.0 };
static int nps[126] =
{ 0,0,0,0,
@@ -4335,9 +5586,9 @@ void tmrna_score(FILE *f, gene *t, csw *sw)
s = t->eseq + t->asst + t->astem1;
sb = s + 3;
se = s + 7;
- r = gtemplate[*sb++];
+ r = gtem[*sb++];
while (sb < se)
- { r = (r >> 4) + gtemplate[*sb++];
+ { r = (r >> 4) + gtem[*sb++];
if ((r & 3) == 2)
{ edgg = 14.0;
break; }}
@@ -4386,7 +5637,7 @@ void tmrna_score(FILE *f, gene *t, csw *sw)
int find_tstems(int *s, int ls, trna_loop hit[], int nh, csw *sw)
{ int i,r,c,tstem,tloop,ithresh1;
- int *s1,*s2,*se,*ss,*si,*sb,*sc,*sf,*sl,*sx,*template;
+ int *s1,*s2,*se,*ss,*si,*sb,*sc,*sf,*sl,*sx,*tem;
double ec,energy,penalty,thresh2;
static double bem[6][6] =
{ { -2.144,-0.428,-2.144, ATBOND, 0.000, 0.000 },
@@ -4399,22 +5650,22 @@ int find_tstems(int *s, int ls, trna_loop hit[], int nh, csw *sw)
static double C[6] = { 0.0,2.0,0.0,0.0,0.0,0.0 };
static double G[6] = { 0.0,0.0,2.0,0.0,0.0,0.0 };
static double T[6] = { 0.0,0.0,0.0,2.0,0.0,0.0 };
- static int template_trna[6] =
+ static int tem_trna[6] =
{ 0x0100, 0x0002, 0x2000, 0x0220, 0x0000, 0x0000 };
- static int template_tmrna[6] =
+ static int tem_tmrna[6] =
{ 0x0100, 0x0002, 0x2220, 0x0220, 0x0000, 0x0000 };
i = 0;
- template = (sw->tmrna)?template_tmrna:template_trna;
+ tem = (sw->tmrna)?tem_tmrna:tem_trna;
ithresh1 = (int)sw->ttscanthresh;
thresh2 = sw->ttarmthresh;
ss = s + sw->loffset;
si = ss + 4 - 1;
sl = s + ls - sw->roffset + 5 + 3;
- r = template[*si++];
- r = (r >> 4) + template[*si++];
- r = (r >> 4) + template[*si++];
+ r = tem[*si++];
+ r = (r >> 4) + tem[*si++];
+ r = (r >> 4) + tem[*si++];
while (si < sl)
- { r = (r >> 4) + template[*si++];
+ { r = (r >> 4) + tem[*si++];
if ((c = (r & 0xF)) < ithresh1) continue;
sb = si - 7;
sf = sb + 13;
@@ -4460,7 +5711,7 @@ int find_astem5(int *si, int *sl, int *astem3, int n3,
int *s1,*s2,*se;
unsigned int r,tascanthresh;
double tastemthresh,energy;
- static unsigned int template[6] = { 0,0,0,0,0,0 };
+ static unsigned int tem[6] = { 0,0,0,0,0,0 };
static unsigned int A[6] = { 0,0,0,2,0,0 };
static unsigned int C[6] = { 0,0,2,0,0,0 };
static unsigned int G[6] = { 0,2,0,1,0,0 };
@@ -4477,20 +5728,20 @@ int find_astem5(int *si, int *sl, int *astem3, int n3,
i = 0;
sl += n3;
se = astem3 + n3 - 1;
- template[0] = A[*se];
- template[1] = C[*se];
- template[2] = G[*se];
- template[3] = T[*se];
+ tem[0] = A[*se];
+ tem[1] = C[*se];
+ tem[2] = G[*se];
+ tem[3] = T[*se];
while (--se >= astem3)
- { template[0] = (template[0] << 4) + A[*se];
- template[1] = (template[1] << 4) + C[*se];
- template[2] = (template[2] << 4) + G[*se];
- template[3] = (template[3] << 4) + T[*se]; }
- r = template[*si++];
+ { tem[0] = (tem[0] << 4) + A[*se];
+ tem[1] = (tem[1] << 4) + C[*se];
+ tem[2] = (tem[2] << 4) + G[*se];
+ tem[3] = (tem[3] << 4) + T[*se]; }
+ r = tem[*si++];
k = 1;
- while (++k < n3) r = (r >> 4) + template[*si++];
+ while (++k < n3) r = (r >> 4) + tem[*si++];
while (si < sl)
- { r = (r >> 4) + template[*si++];
+ { r = (r >> 4) + tem[*si++];
if ((r & 15) >= tascanthresh)
{ s1 = astem3;
s2 = si;
@@ -4524,7 +5775,6 @@ V = A or C or G
M = A or C
H = A or C or T
K = G or T
-
*/
int find_resume_seq(int *s, int ls, trna_loop hit[], int nh, csw *sw)
@@ -4546,7 +5796,7 @@ int find_resume_seq(int *s, int ls, trna_loop hit[], int nh, csw *sw)
0,0,0,0, 0,0,0,0,
0,0,0,0, 0,0,0,0,0 };
static double score[4] = { 36.0, 66.0, 62.0, 72.0 };
- static unsigned int template[6] =
+ static unsigned int tem[6] =
{ 0x10310000, 0x01000101, 0x00010030,
0x02000100, 0x00000000, 0x00000000 };
static int A[6] = { 0,1,1,1,1,1 };
@@ -4555,16 +5805,16 @@ int find_resume_seq(int *s, int ls, trna_loop hit[], int nh, csw *sw)
thresh = (unsigned int)sw->tmrthresh;
i = 0;
sl = s + ls;
- r = template[*s++];
- r = (r >> 4) + template[*s++];
- r = (r >> 4) + template[*s++];
- r = (r >> 4) + template[*s++];
- r = (r >> 4) + template[*s++];
- r = (r >> 4) + template[*s++];
- r = (r >> 4) + template[*s++];
+ r = tem[*s++];
+ r = (r >> 4) + tem[*s++];
+ r = (r >> 4) + tem[*s++];
+ r = (r >> 4) + tem[*s++];
+ r = (r >> 4) + tem[*s++];
+ r = (r >> 4) + tem[*s++];
+ r = (r >> 4) + tem[*s++];
if (sw->tmstrict)
while (s < sl)
- { r = (r >> 4) + template[*s++];
+ { r = (r >> 4) + tem[*s++];
if ((c = (r & 0xF)) < thresh) continue;
c -= (V[s[1]] + V[s[2]] + M[s[5]] + A[s[8]]);
if (c < thresh) continue;
@@ -4605,7 +5855,7 @@ int find_resume_seq(int *s, int ls, trna_loop hit[], int nh, csw *sw)
i++; }
else
while (s < sl)
- { r = (r >> 4) + template[*s++];
+ { r = (r >> 4) + tem[*s++];
if ((c = (r & 0xF)) < thresh) continue;
if (i >= nh) goto FL;
st = s - 2;
@@ -4690,7 +5940,7 @@ gene *nearest_trna_gene(data_set *d, int nt, gene *t, csw *sw)
{ ilength = e - c;
if ((2*thresh) > (5*ilength)) continue;
if ((2*ilength) > (5*thresh)) continue; }
- score = (a >= c)?((b >= e)?e-a:thresh):((b >= e)?thresh:b-c);
+ score = (a >= c)?((b >= e)?e-a:thresh):((b >= e)?e-c:b-c);
if (score >= proximity)
if (ts[i].energy < energy)
{ n = i;
@@ -4708,7 +5958,7 @@ gene *nearest_trna_gene(data_set *d, int nt, gene *t, csw *sw)
{ ilength = e - c;
if ((2*thresh) > (5*ilength)) continue;
if ((2*ilength) > (5*thresh)) continue; }
- score = (a >= c)?((b >= e)?e-a:thresh):((b >= e)?thresh:b-c);
+ score = (a >= c)?((b >= e)?e-a:thresh):((b >= e)?e-c:b-c);
if (score >= proximity)
if (ts[i].energy < energy)
{ n = i;
@@ -4730,7 +5980,7 @@ gene *nearest_trna_gene(data_set *d, int nt, gene *t, csw *sw)
{ ilength = e - c;
if ((2*thresh) > (5*ilength)) continue;
if ((2*ilength) > (5*thresh)) continue; }
- score = (a >= c)?((b >= e)?e-a:thresh):((b >= e)?thresh:b-c);
+ score = (a >= c)?((b >= e)?e-a:thresh):((b >= e)?e-c:b-c);
if (score >= proximity)
if (ts[i].energy < energy)
{ n = i;
@@ -4748,7 +5998,7 @@ gene *nearest_trna_gene(data_set *d, int nt, gene *t, csw *sw)
{ ilength = e - c;
if ((2*thresh) > (5*ilength)) continue;
if ((2*ilength) > (5*thresh)) continue; }
- score = (a >= c)?((b >= e)?e-a:thresh):((b >= e)?thresh:b-c);
+ score = (a >= c)?((b >= e)?e-a:thresh):((b >= e)?e-c:b-c);
if (score >= proximity)
if (ts[i].energy < energy)
{ n = i;
@@ -4779,7 +6029,7 @@ gene *nearest_tmrna_gene(data_set *d, int nt, gene *t)
if (b < c) goto NXTW;
if (ts[i].genetype != tmRNA) continue;
if (ts[i].comp != comp) continue;
- score = (a >= c)?((b >= e)?e-a:thresh):((b >= e)?thresh:b-c);
+ score = (a >= c)?((b >= e)?e-a:thresh):((b >= e)?e-c:b-c);
if (score >= smax)
if (score > smax)
{ n = i;
@@ -4794,7 +6044,7 @@ gene *nearest_tmrna_gene(data_set *d, int nt, gene *t)
if (b < c) continue;
if (ts[i].genetype != tmRNA) continue;
if (ts[i].comp != comp) continue;
- score = (a >= c)?((b >= e)?e-a:thresh):((b >= e)?thresh:b-c);
+ score = (a >= c)?((b >= e)?e-a:thresh):((b >= e)?e-c:b-c);
if (score >= smax)
if (score > smax)
{ n = i;
@@ -4813,7 +6063,7 @@ gene *nearest_tmrna_gene(data_set *d, int nt, gene *t)
if (b < c) goto NXTN;
if (ts[i].genetype != tmRNA) continue;
if (ts[i].comp != comp) continue;
- score = (a >= c)?((b >= e)?e-a:thresh):((b >= e)?thresh:b-c);
+ score = (a >= c)?((b >= e)?e-a:thresh):((b >= e)?e-c:b-c);
if (score >= smax)
if (score > smax)
{ n = i;
@@ -4828,7 +6078,7 @@ gene *nearest_tmrna_gene(data_set *d, int nt, gene *t)
if (b < c) continue;
if (ts[i].genetype != tmRNA) continue;
if (ts[i].comp != comp) continue;
- score = (a >= c)?((b >= e)?e-a:thresh):((b >= e)?thresh:b-c);
+ score = (a >= c)?((b >= e)?e-a:thresh):((b >= e)?e-c:b-c);
if (score >= smax)
if (score > smax)
{ n = i;
@@ -4949,7 +6199,7 @@ gene *find_slot(data_set *d, gene *t, int *nts, csw *sw)
ts = tsn;
init_gene(sw->genespace,newspace);
sw->genespace = newspace; }
- copy3cr(d->seqname,t->name,79);
+ copy3cr(d->seqname,t->name,99);
tn = ts + (*nts);
*nts = (*nts) + 1;
if (sw->verbose)
@@ -5039,7 +6289,7 @@ int find_mt_trna(data_set *d, int *seq, int lseq, int nts, csw *sw)
static int RI[6] = { 1,0,1,0,1,0 };
static int YI[6] = { 0,1,0,1,1,0 };
static int WI[6] = { 1,0,0,1,1,0 };
- static unsigned int template[6] = { 0,0,0,0,0,0 };
+ static unsigned int tem[6] = { 0,0,0,0,0,0 };
static unsigned int At[6] = { 0,0,0,1,1,0 };
static unsigned int Ct[6] = { 0,0,1,0,1,0 };
static unsigned int Gt[6] = { 0,1,0,1,1,0 };
@@ -5389,30 +6639,30 @@ int find_mt_trna(data_set *d, int *seq, int lseq, int nts, csw *sw)
sg = sc + 16;
sge = sg + 30;
slb = sg + 32;
- template[0] = At[*slm];
- template[1] = Ct[*slm];
- template[2] = Gt[*slm];
- template[3] = Tt[*slm];
+ tem[0] = At[*slm];
+ tem[1] = Ct[*slm];
+ tem[2] = Gt[*slm];
+ tem[3] = Tt[*slm];
while (--slm > sle)
- { template[0] = (template[0] << 4) | At[*slm];
- template[1] = (template[1] << 4) | Ct[*slm];
- template[2] = (template[2] << 4) | Gt[*slm];
- template[3] = (template[3] << 4) | Tt[*slm]; }
+ { tem[0] = (tem[0] << 4) | At[*slm];
+ tem[1] = (tem[1] << 4) | Ct[*slm];
+ tem[2] = (tem[2] << 4) | Gt[*slm];
+ tem[3] = (tem[3] << 4) | Tt[*slm]; }
while (slm >= sb)
- { template[0] = ((template[0] << 4) | At[*slm]) & 0xfffff;
- template[1] = ((template[1] << 4) | Ct[*slm]) & 0xfffff;
- template[2] = ((template[2] << 4) | Gt[*slm]) & 0xfffff;
- template[3] = ((template[3] << 4) | Tt[*slm]) & 0xfffff;
+ { tem[0] = ((tem[0] << 4) | At[*slm]) & 0xfffff;
+ tem[1] = ((tem[1] << 4) | Ct[*slm]) & 0xfffff;
+ tem[2] = ((tem[2] << 4) | Gt[*slm]) & 0xfffff;
+ tem[3] = ((tem[3] << 4) | Tt[*slm]) & 0xfffff;
sf = slm + 3;
if (sf > sge) sf = sge;
apos2 = slm + 5;
si = sg;
s = si + 4;
- r = template[*si];
- while (++si < s) r = (r >> 4) + template[*si];
+ r = tem[*si];
+ while (++si < s) r = (r >> 4) + tem[*si];
while (si <= sf)
{ if (si < slm)
- r = (r >> 4) + template[*si++];
+ r = (r >> 4) + tem[*si++];
else
{ si++;
r = r >> 4; }
@@ -5658,28 +6908,28 @@ int find_mt_trna(data_set *d, int *seq, int lseq, int nts, csw *sw)
sle = sc - 4;
slb = sc - 8;
slm = sc - 1;
- template[0] = dAt[*slm];
- template[1] = dCt[*slm];
- template[2] = dGt[*slm];
- template[3] = dTt[*slm];
+ tem[0] = dAt[*slm];
+ tem[1] = dCt[*slm];
+ tem[2] = dGt[*slm];
+ tem[3] = dTt[*slm];
while (--slm > sle)
- { template[0] = (template[0] << 4) | dAt[*slm];
- template[1] = (template[1] << 4) | dCt[*slm];
- template[2] = (template[2] << 4) | dGt[*slm];
- template[3] = (template[3] << 4) | dTt[*slm]; }
+ { tem[0] = (tem[0] << 4) | dAt[*slm];
+ tem[1] = (tem[1] << 4) | dCt[*slm];
+ tem[2] = (tem[2] << 4) | dGt[*slm];
+ tem[3] = (tem[3] << 4) | dTt[*slm]; }
slm1 = slm;
while (slm > slb)
- { template[0] = ((template[0] << 4) | dAt[*slm]) & 0xffff;
- template[1] = ((template[1] << 4) | dCt[*slm]) & 0xffff;
- template[2] = ((template[2] << 4) | dGt[*slm]) & 0xffff;
- template[3] = ((template[3] << 4) | dTt[*slm]) & 0xffff;
+ { tem[0] = ((tem[0] << 4) | dAt[*slm]) & 0xffff;
+ tem[1] = ((tem[1] << 4) | dCt[*slm]) & 0xffff;
+ tem[2] = ((tem[2] << 4) | dGt[*slm]) & 0xffff;
+ tem[3] = ((tem[3] << 4) | dTt[*slm]) & 0xffff;
slm--;
si = slm - 18;
s = si + 3;
- r = template[*si];
- while (++si < s) r = (r >> 4) + template[*si];
+ r = tem[*si];
+ while (++si < s) r = (r >> 4) + tem[*si];
while (si <= slm1)
- { if (si < slm) r = (r >> 4) + template[*si++];
+ { if (si < slm) r = (r >> 4) + tem[*si++];
else
{ r = r >> 4;
si++; }
@@ -5872,32 +7122,32 @@ int find_mt_trna(data_set *d, int *seq, int lseq, int nts, csw *sw)
sg = sf - 6;
sb = sc + 17;
se = s2 + 6;
- template[0] = aAt[*se];
- template[1] = aCt[*se];
- template[2] = aGt[*se];
- template[3] = aTt[*se];
+ tem[0] = aAt[*se];
+ tem[1] = aCt[*se];
+ tem[2] = aGt[*se];
+ tem[3] = aTt[*se];
while (--se > s2)
- { template[0] = (template[0] << 4) | aAt[*se];
- template[1] = (template[1] << 4) | aCt[*se];
- template[2] = (template[2] << 4) | aGt[*se];
- template[3] = (template[3] << 4) | aTt[*se]; }
+ { tem[0] = (tem[0] << 4) | aAt[*se];
+ tem[1] = (tem[1] << 4) | aCt[*se];
+ tem[2] = (tem[2] << 4) | aGt[*se];
+ tem[3] = (tem[3] << 4) | aTt[*se]; }
ti = (int)(se - sc);
while (se >= sb)
- { template[0] = ((template[0] << 4) | aAt[*se]) & 0xfffffff;
- template[1] = ((template[1] << 4) | aCt[*se]) & 0xfffffff;
- template[2] = ((template[2] << 4) | aGt[*se]) & 0xfffffff;
- template[3] = ((template[3] << 4) | aTt[*se]) & 0xfffffff;
+ { tem[0] = ((tem[0] << 4) | aAt[*se]) & 0xfffffff;
+ tem[1] = ((tem[1] << 4) | aCt[*se]) & 0xfffffff;
+ tem[2] = ((tem[2] << 4) | aGt[*se]) & 0xfffffff;
+ tem[3] = ((tem[3] << 4) | aTt[*se]) & 0xfffffff;
if (tendmap[ti])
{ nti = (tendmap[ti] < 0x2000)?1:0; }
else
{ if (se > sle) goto ANX;
nti = -1; }
si = sg;
- r = template[*si];
- while (++si < sf) r = (r >> 4) + template[*si];
+ r = tem[*si];
+ while (++si < sf) r = (r >> 4) + tem[*si];
di = (int)(sc - si);
while (si < sa)
- { r = (r >> 4) + template[*si++];
+ { r = (r >> 4) + tem[*si++];
if (dposmap[--di])
{ if (nti <= 0)
{ if (nti < 0)
@@ -6037,28 +7287,6 @@ int find_mt_trna(data_set *d, int *seq, int lseq, int nts, csw *sw)
ea -= 2.0;
break; }
-/* if (incds) continue; */
-
-
-
-/*
- s = apos1 + nbase/2;
- if (incodon(s-75,s+75) > 30.0) /@ 3.5,3.0,2.5 @/
- { incds = 1;
- ea -= 2.0; }
- else
- incds = 0;
-*/
-
-/*
- s = apos1 + nbase/2;
- if (incodon(s-150,s+150) > 0.0)
- { incds = 1;
- ea -= 2.0; }
- else
- incds = 0;
-*/
-
/* cycle through carms that fall between astem */
@@ -7050,28 +8278,28 @@ int find_mt_trna(data_set *d, int *seq, int lseq, int nts, csw *sw)
/* remember fully formed D-loop replacement mttRNA gene */
/* if threshold reached */
- if (energy < thresh) goto DN;
- te.energy = energy;
- thresh = energy;
- te.ps = apos1;
- te.spacer1 = 0;
- te.spacer2 = 0;
- te.dstem = 0;
- te.dloop = dloop;
- te.cstem = cstem;
- te.cloop = cloop;
- te.anticodon = astem + dloop + cstem + 2;
- te.nintron = 0;
- te.intron = 0;
- te.var = var;
- te.varbp = 0;
- te.tstem = tstem;
- te.tloop = tl;
- te.nbase = astem + dloop + carm + var +
+ if (energy < thresh) goto DN;
+ te.energy = energy;
+ thresh = energy;
+ te.ps = apos1;
+ te.spacer1 = 0;
+ te.spacer2 = 0;
+ te.dstem = 0;
+ te.dloop = dloop;
+ te.cstem = cstem;
+ te.cloop = cloop;
+ te.anticodon = astem + dloop + cstem + 2;
+ te.nintron = 0;
+ te.intron = 0;
+ te.var = var;
+ te.varbp = 0;
+ te.tstem = tstem;
+ te.tloop = tl;
+ te.nbase = astem + dloop + carm + var +
2*tstem + tl;
- tastem = astem;
- tastem8 = astem8;
- tastem8d = astem8d;
+ tastem = astem;
+ tastem8 = astem8;
+ tastem8d = astem8d;
/* build fully formed cloverleaf mttRNA genes */
@@ -8106,7 +9334,7 @@ int tmopt(data_set *d,
int nts,int *seq, csw *sw)
{ int r,na,nr,nrh,ibase,flag,as,aext,nbasefext;
int *s,*v,*s1,*s2,*sa,*sb,*se,*sf,*ps,*tpos,pseq[MAXETRNALEN+1];
- static int gtemplate[6] = { 0x00,0x00,0x11,0x00,0x00,0x00 };
+ static int gtem[6] = { 0x00,0x00,0x11,0x00,0x00,0x00 };
static double A[6] = { 6.0,0.0,0.0,0.0,0.0,0.0 };
static double Ar[6] = { 10.0,0.0,0.0,0.0,0.0,0.0 };
static double Cr[6] = { 0.0,10.0,0.0,0.0,0.0,0.0 };
@@ -8170,9 +9398,9 @@ int tmopt(data_set *d,
if (energy < cathresh) continue;
sb = sa + 3;
sf = sa + 7;
- r = gtemplate[*sb++];
+ r = gtem[*sb++];
while (sb < sf)
- { r = (r >> 4) + gtemplate[*sb++];
+ { r = (r >> 4) + gtem[*sb++];
if ((r & 3) == 2)
{ energy += 14.0;
break; }}
@@ -8213,7 +9441,7 @@ int tmopt_perm(data_set *d,
int nts, int *seq, csw *sw)
{ int r,na,nr,nrh,flag,as,aext;
int *s,*v,*s1,*s2,*sa,*sb,*se,*sf,*ps,*apos,*tpos;
- static int gtemplate[6] = { 0x00,0x00,0x11,0x00,0x00,0x00 };
+ static int gtem[6] = { 0x00,0x00,0x11,0x00,0x00,0x00 };
double e,energy,penergy,tenergy,aenergy,athresh,cthresh,cathresh;
static double A[6] = { 6.0,0.0,0.0,0.0,0.0,0.0 };
static double Ar[6] = { 10.0,0.0,0.0,0.0,0.0,0.0 };
@@ -8271,9 +9499,9 @@ int tmopt_perm(data_set *d,
if (energy < cathresh) continue;
sb = sa + 3;
sf = sa + 7;
- r = gtemplate[*sb++];
+ r = gtem[*sb++];
while (sb < sf)
- { r = (r >> 4) + gtemplate[*sb++];
+ { r = (r >> 4) + gtem[*sb++];
if ((r & 3) == 2)
{ energy += 14.0;
break; }}
@@ -8284,10 +9512,10 @@ int tmopt_perm(data_set *d,
{ ps = rhit[nr].pos;
t.energy = penergy + rhit[nr].energy;
if (rhit[nr].stem < 24) t.energy -= 15.0;
- if (t.energy > te.energy)
+ if (t.energy > te.energy)
{ flag = 1;
- t.tstem = th->stem;
- t.tloop = th->loop;
+ t.tstem = th->stem;
+ t.tloop = th->loop;
t.asst = (long)(apos - tpos) + t.var + t.cstem;
t.ps = tpos - t.var - t.cstem;
t.tps = (int)(ps - t.ps);
@@ -8609,7 +9837,7 @@ int tmioptimise(data_set *d, int *seq, int lseq, int nts, csw *sw)
if (*tloopfold == Guanine)
{ sb = dpos + dstem + 2;
sc = sb;
- se = sb + t.dloop - 3;
+ se = sb + dhit[nd1].loop - 3;
r = TT[*sb++];
while (sb < se)
{ r = (r >> 4) + TT[*sb++];
@@ -8636,6 +9864,8 @@ int tmioptimise(data_set *d, int *seq, int lseq, int nts, csw *sw)
{ denergy = e;
dhit[ndx].end = NULL;
ndx = nd2; }}}
+ cposmin = 0;
+ cposmax = 0;
nd1 = ndh;
while (--nd1 >= 0)
{ if (!dhit[nd1].end) continue;
@@ -8792,11 +10022,11 @@ int tmioptimise(data_set *d, int *seq, int lseq, int nts, csw *sw)
energy += 4.0; }
if (energy < ethresh) continue;
t.energy = energy;
+ t.dstem = dstem;
t.astem1 = (t.dstem < 6)?7:((t.tstem < 5)?9:8);
t.astem2 = t.astem1;
t.ps = apos + 7 - t.astem1;
t.nbase = (int)(tend - t.ps) + t.astem2;
- t.dstem = dstem;
t.dloop = dhit[ndx].loop;
t.spacer1 = (int)(dpos - apos) - 7;
t.spacer2 = (int)(cpos - dhit[ndx].end);
@@ -8816,32 +10046,31 @@ int tmioptimise(data_set *d, int *seq, int lseq, int nts, csw *sw)
void disp_ftable_entry(FILE *f, int n[], int i, int m, csw *sw)
{ if (m > 0)
- switch(sw->geneticcode)
- { case METAZOAN_MT:
- if (i < 2) fprintf(f," %-4s %-4d",aa(n,sw),m);
- else fprintf(f," %-18s %-4d",aa(n,sw),m);
- break;
- case STANDARD:
- case VERTEBRATE_MT:
- default:
- fprintf(f," %-4s %-5d",aa(n,sw),m);
- break; }
- else
- switch(sw->geneticcode)
- { case METAZOAN_MT:
- if (i < 2) fprintf(f," %-4s ",aa(n,sw));
- else fprintf(f," %-18s ",aa(n,sw));
- break;
- case STANDARD:
- case VERTEBRATE_MT:
- default:
- fprintf(f," %-4s ",aa(n,sw));
- break; }}
+ switch(sw->geneticcode)
+ { case METAZOAN_MT:
+ fprintf(f," %-18s %-4d",aa(n,sw),m);
+ break;
+ case STANDARD:
+ case VERTEBRATE_MT:
+ default:
+ fprintf(f," %-4s %-5d",aa(n,sw),m);
+ break; }
+ else
+ switch(sw->geneticcode)
+ { case METAZOAN_MT:
+ fprintf(f," %-18s ",aa(n,sw));
+ break;
+ case STANDARD:
+ case VERTEBRATE_MT:
+ default:
+ fprintf(f," %-4s ",aa(n,sw));
+ break; }}
void disp_freq_table(int nt, csw *sw)
-{ int i,j,k,m,ambig,*s,c1,c2,c3,c[3],n[3],table[4][4][4];
+{ int i,j,k,m,ambig,*s,c1,c2,c3,c[3],a[3],table[4][4][4];
static int cgflip[4] = { 0,2,1,3 };
+ static int codonorder[4] = { 3,1,0,2 };
FILE *f = sw->f;
ambig = 0;
for (i = 0; i < 4; i++)
@@ -8864,40 +10093,43 @@ void disp_freq_table(int nt, csw *sw)
else ambig++;
else ambig++; }
else ambig++;
- fprintf(f,"tRNA Anticodon Frequency\n");
- for (j = 0; j < 4; j++)
- { n[2] = cgflip[j];
- for (k = 0; k < 4; k++)
- { n[1] = cgflip[k];
- for (i = 0; i < 4; i++)
- { n[0] = cgflip[i];
- fprintf(f,"%c%c%c",cpbase(n[0]),cpbase(n[1]),cpbase(n[2]));
- m = table[n[0]][n[1]][n[2]];
- disp_ftable_entry(f,n,i,m,sw); }
- fputc('\n',f); }}
+ fprintf(f,"tRNA anticodon frequency\n");
+ for (i = 0; i < 4; i++)
+ { c[0] = codonorder[i];
+ a[2] = 3 - c[0];
+ for (j = 0; j < 4; j++)
+ { c[2] = codonorder[j];
+ a[0] = 3 - c[2];
+ for (k = 0; k < 4; k++)
+ { c[1] = codonorder[k];
+ a[1] = 3 - c[1];
+ fprintf(f,"%c%c%c",cpbase(a[0]),cpbase(a[1]),cpbase(a[2]));
+ m = table[a[0]][a[1]][a[2]];
+ disp_ftable_entry(f,a,k,m,sw); }
+ fputc('\n',f); }
+ if (i < 3) fputc('\n',f); }
if (ambig > 0) fprintf(f,"Ambiguous: %d\n",ambig);
- fprintf(f,"\ntRNA Codon Frequency\n");
+ fprintf(f,"\ntRNA codon frequency\n");
for (i = 0; i < 4; i++)
- { n[0] = 3 - cgflip[i];
+ { c[0] = codonorder[i];
+ a[2] = 3 - c[0];
for (j = 0; j < 4; j++)
- { n[1] = 3 - cgflip[j];
+ { c[2] = codonorder[j];
+ a[0] = 3 - c[2];
for (k = 0; k < 4; k++)
- { n[2] = 3 - cgflip[k];
- fprintf(f,"%c%c%c",cpbase(n[0]),cpbase(n[1]),cpbase(n[2]));
- c[0] = 3 - n[2];
- c[1] = 3 - n[1];
- c[2] = 3 - n[0];
- m = table[c[0]][c[1]][c[2]];
- disp_ftable_entry(f,c,k,m,sw); }
- fputc('\n',f); }}
+ { c[1] = codonorder[k];
+ a[1] = 3 - c[1];
+ fprintf(f,"%c%c%c",cpbase(c[0]),cpbase(c[1]),cpbase(c[2]));
+ m = table[a[0]][a[1]][a[2]];
+ disp_ftable_entry(f,a,k,m,sw); }
+ fputc('\n',f); }
+ if (i < 3) fputc('\n',f); }
if (ambig > 0) fprintf(f,"Ambiguous: %d\n",ambig);
fputc('\n',f); }
void disp_energy_stats(data_set *d, int nt, csw *sw)
{ int i,n[NS],genetype,introns,nintron,trna,mtrna,ntv,nd,nps;
double gc,gcmin[NS],gcmax[NS];
- static char genetype_name[NS][30] =
- { "tRNA genes","tmRNA genes","srpRNA genes","rRNA genes","CDS genes","Overall" };
FILE *f = sw->f;
mtrna = sw->mtrna;
trna = sw->trna | mtrna;
@@ -8919,7 +10151,7 @@ void disp_energy_stats(data_set *d, int nt, csw *sw)
{ n[NS-1]++;
genetype = ts[i].genetype;
n[genetype]++;
- if (pseudogene(ts + i)) nps++;
+ if (pseudogene(ts + i,sw)) nps++;
if (genetype == tRNA)
{ if (mtrna)
{ if (ts[i].tstem == 0) ntv++;
@@ -8944,7 +10176,7 @@ void disp_energy_stats(data_set *d, int nt, csw *sw)
fprintf(f,"Number of tRNA genes with C-loop introns = %d\n",
nintron); }
else
- fprintf(f,"Number of %s = %d\n",genetype_name[tRNA],n[tRNA]);
+ fprintf(f,"Number of %s genes = %d\n",sw->genetypename[tRNA],n[tRNA]);
if (mtrna)
{ if (sw->tvloop)
fprintf(f,"Number of TV replacement loop tRNA genes = %d\n",
@@ -8957,7 +10189,7 @@ void disp_energy_stats(data_set *d, int nt, csw *sw)
if (sw->tmrna)
{ sw->ngene[tmRNA] += n[tmRNA];
if ((n[tmRNA] > 1) || (trna && (n[tRNA] > 0)))
- fprintf(f,"Number of %s = %d\n",genetype_name[tmRNA],n[tmRNA]); }
+ fprintf(f,"Number of %s genes = %d\n",sw->genetypename[tmRNA],n[tmRNA]); }
sw->nps += nps;
if (sw->reportpseudogenes)
if (nps > 0)
@@ -8966,6 +10198,7 @@ void disp_energy_stats(data_set *d, int nt, csw *sw)
fputc('\n',f);
fputc('\n',f); }
+
void batch_energy_stats(data_set *d, int nt, csw *sw)
{ int i,n[NS],genetype,introns,nintron,trna,mtrna,ntv,nd,nps;
double gc,gcmin[NS],gcmax[NS];
@@ -9045,23 +10278,81 @@ int gene_sort(data_set *d, int nt, int sort[], csw *sw)
int iamatch(data_set *d, gene *t, csw *sw)
{ char key[5],*k,s[100];
- if (!(k = softstrpos(d->seqname,"TRNA-"))) return(-1);
- copy3cr(k+5,key,3);
+ if (k = softstrpos(d->seqname,"TRNA-")) k += 5;
+ else
+ if (k = wildstrpos(d->seqname,"|***|")) k++;
+ else return(-1);
+ copy3cr(k,key,3);
name(t,s,1,sw);
if (softstrpos(s,key)) return(1);
return(0); }
-int nearest_annotated_gene(data_set *d, gene *t, int matchgenetype)
-{ int n,i,nagene,max;
- long a,b,c,e,score,thresh,psmax,proximity;
+
+int gene_mismatch(data_set *d, annotated_gene *agene, gene *t, csw *sw)
+{
+int w,alen,dlen;
+char *s;
+w = 0;
+dlen = seqlen(t);
+alen = aseqlen(d,agene);
+switch(t->genetype)
+ { case tRNA:
+ s = aa(t->seq + t->anticodon,sw);
+ if (!softstrpos(s,agene->species+5))
+ { if (t->cloop == 8)
+ { s = aa(t->seq + t->anticodon + 1,sw);
+ if (!softstrpos(s,agene->species+5)) w += 1; }
+ else if (t->cloop == 6)
+ { s = aa(t->seq + t->anticodon - 1,sw);
+ if (!softstrpos(s,agene->species+5)) w += 1; }
+ else w += 1; }
+ if (agene->comp != t->comp) w += 2;
+ if (alen <= (dlen - sw->trnalenmisthresh)) w += 4;
+ else if (alen >= (dlen + sw->trnalenmisthresh)) w += 4;
+ break;
+ case tmRNA:
+ if (agene->comp != t->comp) w += 2;
+ if (alen <= (dlen - sw->tmrnalenmisthresh)) w += 4;
+ else if (alen >= (dlen + sw->tmrnalenmisthresh)) w += 4;
+ break; }
+return(w);
+}
+
+
+int gene_mismatch_report(data_set *d, annotated_gene *agene, gene *t, char *report, csw *sw)
+{
+int w;
+char *s;
+w = gene_mismatch(d,agene,t,sw);
+s = report;
+if (w & 1) s = copy("amino acceptor",s);
+if (w & 2)
+ { if (w & 1)
+ if (w & 4) s = copy(", ",s);
+ else s = copy(" and ",s);
+ s = copy("sense",s); }
+if (w & 4)
+ { if ((w & 3) > 0) s = copy(" and ",s);
+ s = copy("sequence length",s); }
+if (w > 0) s = copy(" mismatch",s);
+*s = '\0';
+return(w);
+}
+
+
+
+int nearest_annotated_gene(data_set *d, gene *t,
+ int list[], int score[], int nmax,
+ csw *sw)
+{ int n,i,j,k,q,w,nagene;
+ long a,b,c,e,thresh,psmax;
+ char *s;
annotated_gene *ta;
psmax = d->psmax;
nagene = d->nagene[NS-1];
ta = d->gene;
- n = -1;
- max = 0;
- proximity = matchgenetype?40:1;
+ n = 0;
a = t->start;
b = t->stop;
thresh = b-a;
@@ -9075,25 +10366,19 @@ int nearest_annotated_gene(data_set *d, gene *t, int matchgenetype)
{ e += psmax;
if (a > e) goto NXTW;
if (b < c) goto NXTW;
- if (matchgenetype)
- if (ta[i].genetype != t->genetype) continue;
- score = (a >= c)?((b >= e)?e-a:thresh):((b >= e)?thresh:b-c);
- if (score >= proximity)
- if (score > max)
- { n = i;
- max = score; }
+ if (n >= nmax) break;
+ list[n] = i;
+ score[n] = (a >= c)?((b >= e)?e-a:thresh):((b >= e)?e-c:b-c);
+ n++;
NXTW:
c -= psmax;
e -= psmax; }
if (a > e) continue;
if (b < c) continue;
- if (matchgenetype)
- if (ta[i].genetype != t->genetype) continue;
- score = (a >= c)?((b >= e)?e-a:thresh):((b >= e)?thresh:b-c);
- if (score >= proximity)
- if (score > max)
- { n = i;
- max = score; } }
+ if (n >= nmax) break;
+ list[n] = i;
+ score[n] = (a >= c)?((b >= e)?e-a:thresh):((b >= e)?e-c:b-c);
+ n++; }
a -= psmax;
b -= psmax; }
for (i = 0; i < nagene; i++)
@@ -9103,302 +10388,342 @@ int nearest_annotated_gene(data_set *d, gene *t, int matchgenetype)
{ e += psmax;
if (a > e) goto NXTN;
if (b < c) goto NXTN;
- if (matchgenetype)
- if (ta[i].genetype != t->genetype) continue;
- score = (a >= c)?((b >= e)?e-a:thresh):((b >= e)?thresh:b-c);
- if (score >= proximity)
- if (score > max)
- { n = i;
- max = score; }
+ if (n >= nmax) break;
+ list[n] = i;
+ score[n] = (a >= c)?((b >= e)?e-a:thresh):((b >= e)?e-c:b-c);
+ n++;
NXTN:
c -= psmax;
e -= psmax; }
if (a > e) continue;
if (b < c) continue;
- if (matchgenetype)
- if (ta[i].genetype != t->genetype) continue;
- score = (a >= c)?((b >= e)?e-a:thresh):((b >= e)?thresh:b-c);
- if (score >= proximity)
- if (score > max)
- { n = i;
- max = score; } }
+ if (n >= nmax) break;
+ list[n] = i;
+ score[n] = (a >= c)?((b >= e)?e-a:thresh):((b >= e)?e-c:b-c);
+ n++; }
+ for (i = 0; i < n; i++)
+ { k = list[i];
+ if (ta[k].genetype == t->genetype)
+ { score[i] += 5000;
+ w = gene_mismatch(d,ta + k,t,sw);
+ if (w & 1) score[i] -= 2;
+ if (w & 2) score[i] -= 1; }}
+ if (n > 1)
+ { for (i = 0; i < (n-1); i++)
+ for (j = i+1; j < n; j++)
+ if (score[j] > score[i])
+ { k = list[i];
+ list[i] = list[j];
+ list[j] = k;
+ k = score[i];
+ score[i] = score[j];
+ score[j] = k; }}
return(n); }
+
+
+
+int proximity_compare(data_set *d, int is,
+ long prox, long dlen, long alen,
+ annotated_gene *a,
+ csw *sw)
+{
+int w,score;
+long diff;
+char nm[200];
+gene *t;
+t = ts + is;
+w = gene_mismatch(d,a,t,sw);
+if (prox >= alen)
+ { diff = dlen - alen;
+ if (prox >= (2L*diff)) score = (int)(prox - diff);
+ else score = (int)(prox/2L); }
+else
+ if (prox >= dlen)
+ { diff = alen - dlen;
+ if (prox >= (2L*diff)) score = (int)(prox - diff);
+ else score = (int)(prox/2L); }
+else { score = (int)prox; }
+if (w & 1) score -= 10;
+if (w & 2) score -= 2;
+if (score < 0) score = 0;
+if (t->annotation >= 0)
+ if (t->annosc >= score) return(-1);
+return(score);
+}
+
-int nearest_detected_gene(data_set *d, int *sort, int ns,
- int proxtype, int *overlap,
- annotated_gene *t)
+
+
+int nearest_detected_gene(data_set *d, int sort[], int nd,
+ int *scorep,
+ annotated_gene *ag, csw *sw)
{ int n,i,is;
- long a,b,c,e,score,thresh,scoremax,psmax;
- long proximity;
- double energy;
+ long a,b,c,e,score,alen,scoremax,psmax;
+ long prox,proximity;
psmax = d->psmax;
n = -1;
- energy = -INACTIVE;
scoremax = -1;
- a = t->start;
- b = t->stop;
- thresh = b-a;
- proximity = thresh;
- if (proximity < 0) proximity = -proximity;
- proximity = 1 + proximity/2;
- if (proximity > 40) proximity = 40;
+ a = ag->start;
+ b = ag->stop;
+ alen = b - a;
+ if (b < a) alen += psmax;
+ proximity = 1 + alen/2;
+ if (proximity > 30) proximity = 30;
if (b < a)
{ b += psmax;
- thresh += psmax;
- for (i = 0; i < ns; i++)
+ for (i = 0; i < nd; i++)
{ is = sort[i];
+ if (ag->genetype != ts[is].genetype) continue;
c = ts[is].start;
e = ts[is].stop;
if (e < c)
{ e += psmax;
if (a > e) goto NXTW;
if (b < c) goto NXTW;
- if (ts[is].genetype != t->genetype) continue;
- score = (a >= c)?((b >= e)?e-a:thresh):((b >= e)?thresh:b-c);
- if (score >= proximity)
- if (proxtype)
- { if (score > scoremax)
- { n = i;
- scoremax = score; }}
- else
- if (ts[is].energy > energy)
- { n = i;
- scoremax = score;
- energy = ts[is].energy; }
+ prox = (a >= c)?((b >= e)?e-a:alen):((b >= e)?e-c:b-c);
+ if (prox >= proximity)
+ if ((score = proximity_compare(d,is,prox,e-c,alen,ag,sw)) > scoremax)
+ { n = i;
+ scoremax = score; }
NXTW:
c -= psmax;
e -= psmax; }
if (a > e) continue;
if (b < c) continue;
- if (ts[is].genetype != t->genetype) continue;
- score = (a >= c)?((b >= e)?e-a:thresh):((b >= e)?thresh:b-c);
- if (score >= proximity)
- if (proxtype)
- { if (score > scoremax)
- { n = i;
- scoremax = score; }}
- else
- if (ts[is].energy > energy)
- { n = i;
- scoremax = score;
- energy = ts[is].energy; } }
+ prox = (a >= c)?((b >= e)?e-a:alen):((b >= e)?e-c:b-c);
+ if (prox >= proximity)
+ if ((score = proximity_compare(d,is,prox,e-c,alen,ag,sw)) > scoremax)
+ { n = i;
+ scoremax = score; }}
a -= psmax;
b -= psmax; }
- for (i = 0; i < ns; i++)
+ for (i = 0; i < nd; i++)
{ is = sort[i];
+ if (ag->genetype != ts[is].genetype) continue;
c = ts[is].start;
e = ts[is].stop;
if (e < c)
{ e += psmax;
if (a > e) goto NXTN;
if (b < c) goto NXTN;
- if (ts[is].genetype != t->genetype) continue;
- score = (a >= c)?((b >= e)?e-a:thresh):((b >= e)?thresh:b-c);
- if (score >= proximity)
- if (proxtype)
- { if (score > scoremax)
- { n = i;
- scoremax = score; }}
- else
- if (ts[is].energy > energy)
- { n = i;
- scoremax = score;
- energy = ts[is].energy; }
+ prox = (a >= c)?((b >= e)?e-a:alen):((b >= e)?e-c:b-c);
+ if (prox >= proximity)
+ if ((score = proximity_compare(d,is,prox,e-c,alen,ag,sw)) > scoremax)
+ { n = is;
+ scoremax = score; }
NXTN:
c -= psmax;
e -= psmax; }
if (a > e) continue;
if (b < c) continue;
- if (ts[is].genetype != t->genetype) continue;
- score = (a >= c)?((b >= e)?e-a:thresh):((b >= e)?thresh:b-c);
- if (score >= proximity)
- if (proxtype)
- { if (score > scoremax)
- { n = i;
- scoremax = score; }}
- else
- if (ts[is].energy > energy)
- { n = i;
- scoremax = score;
- energy = ts[is].energy; } }
- *overlap = (scoremax + 1);
+ prox = (a >= c)?((b >= e)?e-a:alen):((b >= e)?e-c:b-c);
+ if (prox >= proximity)
+ if ((score = proximity_compare(d,is,prox,e-c,alen,ag,sw)) > scoremax)
+ { n = is;
+ scoremax = score; }}
+ *scorep = scoremax;
return(n); }
+
void disp_match(data_set *d, int *sort, int nd, csw *sw)
-{ int i,ld,fn,fp,fpd,fptv,w,alen,overlap,length,detect[NGFT],n[NS];
- char nm[100],anm[100],ps[100],*s;
- FILE *f = sw->f;
- gene *t;
- annotated_gene *agene;
- static char comp[3] = " c";
- for (i = 0; i < NS; i++) n[i] = 0;
- for (i = 0; i < nd; i++)
- { w = sort[i];
- if (ts[w].energy >= 0.0)
- { n[NS-1]++;
- n[ts[w].genetype]++; }}
- fprintf(f,"\n%s\n",d->seqname);
- fprintf(f,"%ld nucleotides in sequence\n",d->psmax);
- fprintf(f,"Mean G+C content = %2.1f%%\n",100.0*d->gc);
- fprintf(f,"GenBank to Aragorn Comparison\n");
- if (sw->trna | sw->mtrna)
- { fn = 0;
- fp = 0;
- fpd = 0;
- fptv = 0;
- fprintf(f,"\n%d annotated tRNA genes\n",d->nagene[tRNA]);
- fprintf(f,"%d detected tRNA genes\n\n",n[tRNA]);
- fprintf(f," GenBank\t\t\t\tAragorn\n");
- ld = 0;
- for (i = 0; i < d->nagene[NS-1]; i++)
- { agene = d->gene + i;
- if (agene->genetype != tRNA) continue;
- detect[i] = nearest_detected_gene(d,sort,nd,0,&overlap,agene);
- while (ld < nd)
- { t = ts + sort[ld];
- if (detect[i] >= 0)
- if (ld >= detect[i]) break;
- if (t->start < t->stop)
- if (t->start > agene->start) break;
- fprintf(f,"* Not annotated %s ",name(t,nm,1,sw));
- fprintf(f,"%s",position(ps,t,sw));
- if (sw->reportpseudogenes)
- if (pseudogene(t))
- fprintf(f," PS");
- fputc('\n',f);
- fp++;
- if (t->genetype == tRNA)
- { if (t->dstem == 0) fpd++;
- if (t->tstem == 0) fptv++; }
- ld++; }
- if (detect[i] >= 0)
- { ld = detect[i] + 1;
- w = 0;
- t = ts + sort[detect[i]];
- s = aa(t->seq + t->anticodon,sw);
- if (!softstrpos(s,agene->species+5)) w += 1;
- if (agene->comp != t->comp) w += 2;
- alen = agene->stop - agene->start;
- if (alen < 0) alen = -alen;
- if (alen < (t->nbase - 10)) w += 4;
- else if (alen > (t->nbase + 10)) w += 4;
- if (w > 0) fputc('*',f);
- else fputc(' ',f); }
- else
- fputc('*',f);
- sprintf(anm," %s %c(%ld,%ld)",
- agene->species,comp[agene->comp],agene->start,agene->stop);
- fprintf(f,"%-30s ",anm);
- if (detect[i] >= 0)
- { fprintf(f,"%s ",name(t,nm,1,sw));
- fprintf(f,"%s",position(ps,t,sw));
- if (sw->reportpseudogenes)
- if (pseudogene(t))
- fprintf(f," PS");
- if (w & 1) fprintf(f," AAM");
- if (w & 2) fprintf(f," SM");
- if (w & 4) fprintf(f," LM");
- fputc('\n',f); }
- else
- { fprintf(f,"Not detected\n");
- fn++; }}
- while (ld < nd)
- { fprintf(f,"* Not annotated\t\t\t%s ",name(ts + sort[ld],nm,1,sw));
- fprintf(f,"%s\n",position(ps,ts + sort[ld],sw));
- fp++;
- if (t->genetype == tRNA)
- { if (t->dstem == 0) fpd++;
- if (t->tstem == 0) fptv++; }
- ld++; }
- fprintf(f,"\nNumber of false negative genes = %d\n",fn);
- fprintf(f,"Number of false positive genes = %d\n",fp);
- fprintf(f,"Number of false positive D-replacement tRNA genes = %d\n",fpd);
- fprintf(f,"Number of false positive TV-replacement tRNA genes = %d\n",fptv);
- fprintf(f,"\n\n");
- sw->nagene[tRNA] += d->nagene[tRNA];
- sw->natfn += fn;
- sw->natfp += fp;
- sw->natfpd += fpd;
- sw->natfptv += fptv; }
- if (sw->cds)
- { fn = 0;
- fp = 0;
- fprintf(f,"\n%d annotated CDS genes\n",d->nagene[CDS]);
- fprintf(f,"%d detected CDS genes\n\n",n[CDS]);
- fprintf(f," GenBank\t\t\t\t Aragorn\n");
- ld = 0;
- for (i = 0; i < d->nagene[NS-1]; i++)
- { agene = d->gene + i;
- if (agene->genetype != CDS) continue;
- length = (int)(agene->stop - agene->start) + 1;
- sw->lacds += length;
- detect[i] = nearest_detected_gene(d,sort,nd,1,&overlap,agene);
- while (ld < nd)
- { t = ts + sort[ld];
- if (detect[i] >= 0)
- if (ld >= detect[i]) break;
- if (t->start < t->stop)
- if (t->start > agene->start) break;
- fprintf(f,"* Not annotated ");
- sprintf(anm,"%s %s",
- name(t,nm,1,sw),position(ps,t,sw));
- fprintf(f,"%-18s",anm);
- if (sw->energydisp) fprintf(f," %lg",t->energy);
- if (sw->reportpseudogenes)
- if (pseudogene(t))
- fprintf(f," PS");
- fputc('\n',f);
- fp++;
- ld++; }
- if (detect[i] >= 0)
- { ld = detect[i] + 1;
- t = ts + sort[detect[i]];
- fputc(' ',f); }
- else
- fputc('*',f);
- fprintf(f," %-33s",agene->species);
- sprintf(anm,"%c(%ld,%ld)",comp[agene->comp],agene->start,agene->stop);
- fprintf(f,"%14s ",anm);
- if (detect[i] >= 0)
- { sprintf(anm,"%s %s",name(t,nm,1,sw),position(ps,t,sw));
- fprintf(f,"%-18s",anm);
- if (sw->energydisp) fprintf(f," %lg",t->energy);
- if (sw->reportpseudogenes)
- if (pseudogene(t))
- fprintf(f," PS");
- fputc('\n',f);
- length = (int)(t->stop - t->start) + 1;
- sw->ldcds += length; }
- else
- { fprintf(f,"Not detected\n");
- fn++; }}
- while (ld < nd)
- { t = ts + sort[ld];
- fprintf(f,"* Not annotated ");
- sprintf(anm,"%s %s",name(t,nm,1,sw),position(ps,t,sw));
- fprintf(f,"%-18s",anm);
- if (sw->energydisp) fprintf(f," %lg",t->energy);
- if (sw->reportpseudogenes)
- if (pseudogene(t))
- fprintf(f," PS");
- fputc('\n',f);
- fp++;
- ld++; }
- fprintf(f,"\nNumber of false negative CDS genes = %d\n",fn);
- fprintf(f,"Number of false positive CDS genes = %d\n",fp);
- fprintf(f,"\n\n");
- sw->nagene[CDS] += d->nagene[CDS];
- sw->nacdsfn += fn;
- sw->nacdsfp += fp; }
- sw->nabase += d->psmax; }
+{
+int i,ld,fn[NS],fp[NS],fpd,fptv,w,score,detect,n[NS];
+int prevannoted,nl,k,csort[NGFT],*msort;
+long start;
+char tag[52],nm[100],anm[100],ps[100],mreport[100],*s;
+FILE *f = sw->f;
+gene *t;
+annotated_gene *agene,*a;
+static char gp[2][7] = { "genes","gene" };
+static char comp[3] = " c";
+static char aps[2][5] = { " ","PS" };
+nl = nd;
+if (sw->trna | sw->mtrna) nl += d->nagene[tRNA];
+if (sw->tmrna) nl += d->nagene[tmRNA];
+if (nl < NGFT) msort = csort;
+else
+ { msort = (int *)malloc(nl*sizeof(int));
+ if (msort == NULL)
+ { fprintf(stderr,"Not enough memory to match genes\n");
+ return; }}
+fprintf(f,"\n%s\n",d->seqname);
+fprintf(f,"%ld nucleotides in sequence\n",d->psmax);
+fprintf(f,"Mean G+C content = %2.1f%%\n",100.0*d->gc);
+fprintf(f,"\nGenBank to Aragorn comparison\n\n");
+sw->dispmatch = 1;
+for (i = 0; i < NS; i++)
+ { n[i] = 0;
+ fn[i] = 0;
+ fp[i] = 0; }
+for (i = 0; i < nd; i++)
+ { w = sort[i];
+ if (ts[w].energy >= 0.0)
+ { n[NS-1]++;
+ n[ts[w].genetype]++; }
+ ts[w].annotation = -1;
+ ts[w].annosc = -1; }
+if (sw->trna | sw->mtrna | sw->tmrna)
+ { fpd = 0;
+ fptv = 0;
+ if (sw->trna | sw->mtrna)
+ { fprintf(f,"%d annotated tRNA %s\n",d->nagene[tRNA],gp[(d->nagene[tRNA]==1)?1:0]);
+ fprintf(f,"%d detected tRNA %s\n",n[tRNA],gp[(n[tRNA]==1)?1:0]); }
+ if (sw->tmrna)
+ { fprintf(f,"%d annotated tmRNA %s\n",d->nagene[tmRNA],gp[(d->nagene[tmRNA]==1)?1:0]);
+ fprintf(f,"%d detected tmRNA %s\n",n[tmRNA],gp[(n[tmRNA]==1)?1:0]); }
+ fprintf(f,"\n GenBank Aragorn\n");
+ nl = 0;
+ for (i = 0; i < d->nagene[NS-1]; i++)
+ { agene = d->gene + i;
+ agene->detected = -1;
+ if (agene->genetype != tRNA)
+ { if (agene->genetype != tmRNA) continue;
+ else if (!sw->tmrna) continue; }
+ else if (!sw->trna) if (!sw->mtrna) continue;
+ a = agene;
+ k = i;
+ while ((a->detected = nearest_detected_gene(d,sort,nd,&score,a,sw)) >= 0)
+ { t = ts + a->detected;
+ prevannoted = t->annotation;
+ t->annotation = k;
+ t->annosc = score;
+ if (prevannoted < 0) break;
+ if (prevannoted == k) break;
+ if (prevannoted == i) break;
+ a = d->gene + prevannoted;
+ k = prevannoted; }
+ k = nl;
+ while (--k >= 0)
+ { if (agene->start >= d->gene[msort[k]].start) break;
+ msort[k+1] = msort[k]; }
+ msort[++k] = i;
+ nl++; }
+ for (i = 0; i < nd; i++)
+ { t = ts + sort[i];
+ if (t->annotation >= 0) continue;
+ if (t->genetype != tRNA)
+ { if (t->genetype != tmRNA) continue;
+ else if (!sw->tmrna) continue; }
+ else if (!sw->trna) if (!sw->mtrna) continue;
+ k = nl;
+ while (--k >= 0)
+ { if (msort[k] >= 0) start = d->gene[msort[k]].start;
+ else start = ts[-1-msort[k]].start;
+ if (t->start >= start) break;
+ msort[k+1] = msort[k]; }
+ msort[++k] = -(sort[i] + 1);
+ nl++; }
+ for (i = 0; i < nl; i++)
+ { if (msort[i] >= 0)
+ { agene = d->gene + msort[i];
+ detect = agene->detected;
+ if (detect >= 0)
+ { t = ts + detect;
+ w = gene_mismatch_report(d,agene,t,mreport,sw);
+ if (w > 0) fputc('*',f);
+ else fputc(' ',f); }
+ else fputc('*',f);
+ sprintf(anm," %-11s%c(%ld,%ld) %s",
+ agene->species,comp[agene->comp],
+ sq(agene->start),sq(agene->stop),aps[agene->pseudogene]);
+ fprintf(f,"%-45s ",anm);
+ if (detect >= 0)
+ { fprintf(f,"%s ",name(t,nm,1,sw));
+ if (t->comp == 0) fputc(' ',f);
+ fprintf(f,"%s",position(ps,t,sw));
+ if (sw->energydisp) fprintf(f," %7.3lf",t->energy);
+ if (t->genetype == tmRNA)
+ { peptide_tag(tag,50,t,sw);
+ fprintf(f," %s",tag); }
+ if (sw->reportpseudogenes)
+ if (pseudogene(t,sw))
+ fprintf(f," PS");
+ if (w > 0) fprintf(f," %s",mreport);
+ fputc('\n',f); }
+ else
+ { fprintf(f,"Not detected\n");
+ fn[agene->genetype]++; }}
+ else
+ { t = ts - (msort[i] + 1);
+ fprintf(f,"* Not annotated %s ",name(t,nm,1,sw));
+ if (t->comp == 0) fputc(' ',f);
+ fprintf(f,"%s",position(ps,t,sw));
+ if (sw->energydisp) fprintf(f," %7.3lf",t->energy);
+ if (t->genetype == tmRNA)
+ { peptide_tag(tag,50,t,sw);
+ fprintf(f," %s",tag); }
+ if (sw->reportpseudogenes)
+ if (pseudogene(t,sw))
+ fprintf(f," PS");
+ fputc('\n',f);
+ fp[t->genetype]++;
+ if (t->genetype == tRNA)
+ { if (t->dstem == 0) fpd++;
+ if (t->tstem == 0) fptv++; }}}
+ fprintf(f,"\n");
+ if (sw->trna | sw->mtrna)
+ { fprintf(f,"Number of annotated tRNA genes not detected = %d\n",fn[tRNA]);
+ fprintf(f,"Number of unannotated tRNA genes detected = %d\n",fp[tRNA]); }
+ if (sw->mtrna)
+ { fprintf(f,"Number of unannotated D-replacement tRNA genes detected = %d\n",fpd);
+ fprintf(f,"Number of unannotated TV-replacement tRNA genes detected = %d\n",fptv); }
+ if (sw->tmrna)
+ { fprintf(f,"Number of annotated tmRNA genes not detected = %d\n",fn[tmRNA]);
+ fprintf(f,"Number of unannotated tmRNA genes detected = %d\n",fp[tmRNA]); }
+ fprintf(f,"\n\n");
+ for (i = tRNA; i <= tmRNA; i++)
+ { sw->nagene[i] += d->nagene[i];
+ sw->nafn[i] += fn[i];
+ sw->nafp[i] += fp[i]; }
+ if (sw->mtrna)
+ { sw->natfpd += fpd;
+ sw->natfptv += fptv; }}
+sw->nabase += d->psmax;
+sw->dispmatch = 0;
+if (nl >= NGFT) free((void *)msort);
+}
+void annotation_overlap_check(data_set *d, gene *t, char *margin, csw *sw)
+{
+int a,m,n,w,list[20],score[20];
+char mreport[100];
+static char comp[3] = " c";
+n = nearest_annotated_gene(d,t,list,score,20,sw);
+if (n < 1) m = -1;
+else
+ { m = 0;
+ a = list[m];
+ if (d->gene[a].genetype != t->genetype) m = -1;
+ else
+ { w = gene_mismatch_report(d,d->gene+a,t,mreport,sw);
+ if (w & 1)
+ { if ((score[m] - 5000) < (3*seqlen(t)/4)) m = -1; }
+ else
+ { if ((score[m] - 5000) < (seqlen(t)/3)) m = -1; }}}
+if (m < 0)
+ fprintf(sw->f,"%sNot annotated\n",margin);
+else
+ { a = list[m];
+ fprintf(sw->f,"%sMatch with annotated %s %c(%ld,%ld)",
+ margin,d->gene[a].species,comp[d->gene[a].comp],
+ d->gene[a].start,d->gene[a].stop);
+ w = gene_mismatch_report(d,d->gene+a,t,mreport,sw);
+ if (w > 0) fprintf(sw->f," * %s",mreport);
+ fputc('\n',sw->f); }
+while (++m < n)
+ { a = list[m];
+ fprintf(sw->f,"%sOverlap with annotated %s %c(%ld,%ld)\n",
+ margin,d->gene[a].species,comp[d->gene[a].comp],
+ d->gene[a].start,d->gene[a].stop); }
+fputc('\n',sw->f);
+}
+
void disp_gene_set(data_set *d, int nt, csw *sw)
-{ int i,j,n,a,vsort[NT],*sort;
+{ int i,j,n,vsort[NT],*sort;
char m[MATX][MATY],s[20];
- static char comp[3] = " c";
gene *t;
FILE *f = sw->f;
if (nt <= NT)
@@ -9430,13 +10755,7 @@ void disp_gene_set(data_set *d, int nt, csw *sw)
{ fprintf(f," Iso-acceptor mismatch\n");
sw->iamismatch++; }
if (sw->annotated)
- if ((a = nearest_annotated_gene(d,t,1)) < 0)
- { fprintf(f," Annotation false positive\n");
- if ((a = nearest_annotated_gene(d,t,0)) >= 0)
- fprintf(f," Overlap with %s %c(%ld,%ld)\n",
- d->gene[a].species,comp[d->gene[a].comp],
- d->gene[a].start,d->gene[a].stop);
- fputc('\n',f); }
+ annotation_overlap_check(d,t," ",sw);
overlap(d,sort,n,i,sw);
if (sw->seqdisp) disp_seq(f,t,sw);
if (t->nintron > 0) disp_intron(f,t,sw);
@@ -9448,12 +10767,19 @@ void disp_gene_set(data_set *d, int nt, csw *sw)
disp_gene(t,m,sw);
sprintf(s,"%d.",j);
xcopy(m,0,32,s,length(s));
- disp_matrix(f,m,MATY); }
+ disp_matrix(f,m,MATY);
+ if (sw->annotated)
+ annotation_overlap_check(d,t," ",sw); }
else
{ fprintf(f,"\n%d.\n",j);
disp_location(t,sw,"Location");
+ if (sw->reportpseudogenes)
+ if (pseudogene(t,sw))
+ fprintf(f,"Possible Pseudogene\n");
if (sw->energydisp)
- fprintf(f,"Score = %g\n",t->energy); }
+ fprintf(f,"Score = %g\n",t->energy);
+ if (sw->annotated)
+ annotation_overlap_check(d,t,"",sw); }
overlap(d,sort,n,i,sw);
if (t->asst == 0) disp_tmrna_seq(f,t,sw);
else disp_tmrna_perm_seq(f,t,sw);
@@ -9462,6 +10788,8 @@ void disp_gene_set(data_set *d, int nt, csw *sw)
case CDS:
fprintf(f,"\n%d.\nCDS gene\n",j);
disp_location(t,sw,"Location");
+ if (sw->annotated)
+ annotation_overlap_check(d,t,"",sw);
overlap(d,sort,n,i,sw);
disp_cds(f,t,sw);
break;
@@ -9611,7 +10939,7 @@ void iopt_fastafile(data_set *d, csw *sw)
int *s,*sf,*se,*sc,*swrap;
int seq[2*LSEQ+WRAP+1],cseq[2*LSEQ+WRAP+1],wseq[2*WRAP+1];
long gap,start,rewind,drewind,psmax,tmaxlen,vstart,vstop;
- double sensitivity,sel1,sel2;
+ double sens,sel1,sel2;
char c1,c2,c3;
static char trnatypename[3][25] =
{ "Metazoan mitochondrial","Cytosolic","Mammalian mitochondrial" };
@@ -9639,7 +10967,9 @@ void iopt_fastafile(data_set *d, csw *sw)
"deleted -> standard",
"Trematode Mitochondrial",
"Scenedesmus obliquus Mitochondrial",
- "Thraustochytrium Mitochondrial" };
+ "Thraustochytrium Mitochondrial",
+ "Pterobranchia mitochondrial",
+ "Gracilibacteria" };
FILE *f = sw->f;
init_tmrna(f,sw);
aragorn = (sw->trna || sw->tmrna || sw->cds || sw->srprna);
@@ -9718,9 +11048,12 @@ void iopt_fastafile(data_set *d, csw *sw)
sw->roffset = rewind;
drewind = 2*rewind;
d->ns = 0;
+ d->nf = 0;
d->nextseq = 0L;
+ d->nextseqoff = 0L;
while (d->nextseq >= 0L)
{ d->seqstart = d->nextseq;
+ d->seqstartoff = d->nextseqoff;
if (!seq_init(d,sw)) break;
psmax = d->psmax;
if (sw->verbose)
@@ -9817,7 +11150,9 @@ void iopt_fastafile(data_set *d, csw *sw)
while (s < se) *s++ = *sf++;
start += len - drewind;
goto NX; }
+ if (nt < 1) d->nf++;
if (sw->maxintronlen > 0) remove_overlapping_trna(d,nt,sw);
+ if (sw->updatetmrnatags) update_tmrna_tag_database(ts,nt,sw);
disp_gene_set(d,nt,sw);
if (sw->verbose) fprintf(stderr,"%s\nSearch Finished\n\n",d->seqname);
d->ns++; }
@@ -9831,50 +11166,71 @@ void iopt_fastafile(data_set *d, csw *sw)
if (sw->reportpseudogenes)
if (sw->nps > 0)
fprintf(f,"Total number of possible pseudogenes = %d\n",sw->nps);
+ if (d->nf > 0)
+ { sens = 100.0*(d->ns - d->nf)/d->ns;
+ fprintf(f,"Nothing found in %d sequences (%.2lf%% sensitivity)\n",d->nf,sens); }
if (sw->annotated)
{ if (sw->trna | sw->mtrna)
{ fprintf(f,"\nTotal number of annotated tRNA genes = %d\n",
sw->nagene[tRNA]);
- fprintf(f,"Total number of annotated false negatives = %d\n",sw->natfn);
- fprintf(f,"Total number of annotated false positives = %d\n",sw->natfp);
- fprintf(f,"Total number of annotated DRL false positives = %d\n",
+ fprintf(f,"Total number of annotated tRNA genes not detected = %d\n",sw->nafn[tRNA]);
+ fprintf(f,"Total number of unannotated tRNA genes detected = %d\n",sw->nafp[tRNA]);
+ fprintf(f,"Total number of unannotated DRL tRNA genes detected = %d\n",
sw->natfpd);
- fprintf(f,"Total number of annotated TVRL false positives = %d\n",
+ fprintf(f,"Total number of unannotated TVRL tRNA genes detected = %d\n",
sw->natfptv);
fprintf(f,"Total annotated sequence length = %ld bases\n",sw->nabase);
- sensitivity = (sw->nagene[tRNA] > 0)?
- 100.0*(double)(sw->nagene[tRNA] - sw->natfn)/
+ sens = (sw->nagene[tRNA] > 0)?
+ 100.0*(double)(sw->nagene[tRNA] - sw->nafn[tRNA])/
(double)sw->nagene[tRNA]:0.0;
sel1 = (sw->nagene[tRNA] > 0)?
- 100.0*(double)(sw->natfp)/
+ 100.0*(double)(sw->nafp[tRNA])/
(double)sw->nagene[tRNA]:0.0;
sel2 = (sw->nabase > 0)?
- 1000000.0*(double)(sw->natfp)/
+ 1000000.0*(double)(sw->nafp[tRNA])/
(double)sw->nabase:0.0;
- fprintf(f,"Sensitivity = %lg%%\n",sensitivity);
+ fprintf(f,"Sensitivity = %lg%%\n",sens);
+ fprintf(f,"Selectivity = %lg%% or %lg per Megabase\n\n",sel1,sel2); }
+ if (sw->tmrna)
+ { fprintf(f,"\nTotal number of annotated tmRNA genes = %d\n",
+ sw->nagene[tmRNA]);
+ fprintf(f,"Total number of annotated tmRNA genes not detected = %d\n",sw->nafn[tmRNA]);
+ fprintf(f,"Total number of unannotated tmRNA genes detected = %d\n",sw->nafp[tmRNA]);
+ fprintf(f,"Total annotated sequence length = %ld bases\n",sw->nabase);
+ sens = (sw->nagene[tmRNA] > 0)?
+ 100.0*(double)(sw->nagene[tmRNA] - sw->nafn[tmRNA])/
+ (double)sw->nagene[tmRNA]:0.0;
+ sel1 = (sw->nagene[tmRNA] > 0)?
+ 100.0*(double)(sw->nafp[tmRNA])/
+ (double)sw->nagene[tmRNA]:0.0;
+ sel2 = (sw->nabase > 0)?
+ 1000000.0*(double)(sw->nafp[tmRNA])/
+ (double)sw->nabase:0.0;
+ fprintf(f,"Sensitivity = %lg%%\n",sens);
fprintf(f,"Selectivity = %lg%% or %lg per Megabase\n\n",sel1,sel2); }
if (sw->cds)
{ fprintf(f,"\nTotal number of annotated CDS genes = %d\n",
sw->nagene[CDS]);
- fprintf(f,"Total number of annotated false negatives = %d\n",sw->nacdsfn);
- fprintf(f,"Total number of annotated false positives = %d\n",sw->nacdsfp);
+ fprintf(f,"Total number of annotated CDS genes not detected = %d\n",sw->nafn[CDS]);
+ fprintf(f,"Total number of unannotated CDS genes detected = %d\n",sw->nafp[CDS]);
fprintf(f,"Total annotated sequence length = %ld bases\n",sw->nabase);
- sensitivity = (sw->nagene[CDS] > 0)?
- 100.0*(double)(sw->nagene[CDS] - sw->nacdsfn)/
+ sens = (sw->nagene[CDS] > 0)?
+ 100.0*(double)(sw->nagene[CDS] - sw->nafn[CDS])/
(double)sw->nagene[CDS]:0.0;
sel1 = (sw->nagene[CDS] > 0)?
- 100.0*(double)(sw->nacdsfp)/
+ 100.0*(double)(sw->nafp[CDS])/
(double)sw->nagene[CDS]:0.0;
sel2 = (sw->nabase > 0)?
- 1000000.0*(double)(sw->nacdsfp)/
+ 1000000.0*(double)(sw->nafp[CDS])/
(double)sw->nabase:0.0;
- fprintf(f,"Sensitivity = %lg%%\n",sensitivity);
+ fprintf(f,"Sensitivity = %lg%%\n",sens);
fprintf(f,"Selectivity = %lg%% or %lg per Megabase\n",sel1,sel2);
- sensitivity = (sw->lacds > 0)?
+ sens = (sw->lacds > 0)?
100.0*(double)sw->ldcds/(double)sw->lacds:0.0;
- fprintf(f,"Length sensitivity = %lg%%\n\n",sensitivity); }
+ fprintf(f,"Length sensitivity = %lg%%\n\n",sens); }
} }
- }
+ if (sw->updatetmrnatags) report_new_tmrna_tags(sw);
+}
void bopt_fastafile(data_set *d, csw *sw)
@@ -9882,6 +11238,7 @@ void bopt_fastafile(data_set *d, csw *sw)
int *s,*sf,*se,*sc,*swrap;
int seq[2*LSEQ+WRAP+1],cseq[2*LSEQ+WRAP+1],wseq[2*WRAP+1];
long gap,start,rewind,drewind,psmax,tmaxlen,vstart,vstop;
+ double sens;
FILE *f = sw->f;
rewind = MAXTAGDIST + 20;
if (sw->trna | sw->mtrna)
@@ -9896,9 +11253,12 @@ void bopt_fastafile(data_set *d, csw *sw)
sw->roffset = rewind;
drewind = 2*rewind;
d->ns = 0;
+ d->nf = 0;
d->nextseq = 0L;
+ d->nextseqoff = 0L;
while (d->nextseq >= 0L)
{ d->seqstart = d->nextseq;
+ d->seqstartoff = d->nextseqoff;
if (!seq_init(d,sw)) break;
psmax = d->psmax;
if (sw->verbose)
@@ -9982,7 +11342,9 @@ void bopt_fastafile(data_set *d, csw *sw)
while (s < se) *s++ = *sf++;
start += len - drewind;
goto NX; }
+ if (nt < 1) d->nf++;
if (sw->maxintronlen > 0) remove_overlapping_trna(d,nt,sw);
+ if (sw->updatetmrnatags) update_tmrna_tag_database(ts,nt,sw);
batch_gene_set(d,nt,sw);
if (sw->verbose) fprintf(stderr,"%s\nSearch Finished\n\n",d->seqname);
d->ns++; }
@@ -9990,168 +11352,19 @@ void bopt_fastafile(data_set *d, csw *sw)
{ fprintf(f,">end \t%d sequences",d->ns);
if (sw->trna || sw->mtrna) fprintf(f," %d tRNA genes",sw->ngene[tRNA]);
if (sw->tmrna) fprintf(f," %d tmRNA genes",sw->ngene[tmRNA]);
- fputc('\n',f); } }
+ if (d->nf > 0)
+ { sens = 100.0*(d->ns - d->nf)/d->ns;
+ fprintf(f,", nothing found in %d sequences, (%.2lf%% sensitivity)",d->nf,sens); }
+ fputc('\n',f); }
+ if (sw->updatetmrnatags) report_new_tmrna_tags(sw);
+}
void aragorn_help_menu()
-{ printf("\n");
- printf("----------------------------\n");
- printf("ARAGORN v1.2.36 Dean Laslett\n");
- printf("----------------------------\n");
- printf("\n");
- printf("Please reference the following papers if you use this\n");
- printf("program as part of any published research.\n\n");
- printf("Laslett, D. and Canback, B. (2004) ARAGORN, a\n");
- printf("program for the detection of transfer RNA and transfer-messenger\n");
- printf("RNA genes in nucleotide sequences\n");
- printf("Nucleic Acids Research, 32;11-16\n\n");
- printf("Laslett, D. and Canback, B. (2008) ARWEN: a\n");
- printf("program to detect tRNA genes in metazoan mitochondrial\n");
- printf("nucleotide sequences\n");
- printf("Bioinformatics, 24(2); 172-175.\n\n\n");
- printf("ARAGORN detects tRNA, mtRNA, and tmRNA genes.\n");
- printf("\n");
- printf("Usage:\n");
- printf("aragorn -v -s -d -c -l -a -w -j -ifro<min>,<max> -t -mt -m");
- printf(" -tv -gc -seq -br -fasta -fo -o <outfile> <filename>\n\n");
- printf("<filename> is assumed to contain one or more sequences\n");
- printf("in FASTA format. Results of the search are printed to\n");
- printf("STDOUT. All switches are optional and case-insensitive.\n");
- printf("Unless -i is specified, tRNA genes containing introns\n");
- printf("are not detected. \n");
- printf("\n");
- printf(" -m Search for tmRNA genes.\n");
- printf(" -t Search for tRNA genes.\n");
- printf(" By default, both are detected. If one of\n");
- printf(" -m or -t is specified, then the other\n");
- printf(" is not detected unless specified as well.\n");
- printf(" -mt Search for Metazoan mitochondrial tRNA\n");
- printf(" genes. -i switch ignored. Composite\n");
- printf(" Metazoan mitochondrial genetic code used.\n");
- printf(" -mtmam Search for Mammalian mitochondrial tRNA\n");
- printf(" genes. -i switch ignored. -tv switch set.\n");
- printf(" Mammalian mitochondrial genetic code used.\n");
- printf(" -mtx Same as -mt but low scoring tRNA genes are\n");
- printf(" not reported.\n");
- printf(" -gc<num> Use the GenBank transl_table = <num> genetic code.\n");
- printf(" -gcstd Use standard genetic code.\n");
- printf(" -gcmet Use composite Metazoan mitochondrial genetic code.\n");
- printf(" -gcvert Use Vertebrate mitochondrial genetic code.\n");
- printf(" -gcinvert Use Invertebrate mitochondrial genetic code.\n");
- printf(" -gcyeast Use Yeast mitochondrial genetic code.\n");
- printf(" -gcprot Use Mold/Protozoan/Coelenterate");
- printf(" mitochondrial genetic code.\n");
- printf(" -gcciliate Use Ciliate genetic code.\n");
- printf(" -gcflatworm Use Echinoderm/Flatworm mitochondrial genetic code.\n");
- printf(" -gceuplot Use Euplotid genetic code.\n");
- printf(" -gcbact Use Bacterial/Plant Chloroplast genetic code.\n");
- printf(" -gcaltyeast Use alternative Yeast genetic code.\n");
- printf(" -gcascid Use Ascidian Mitochondrial genetic code.\n");
- printf(" -gcaltflat Use alternative Flatworm Mitochondrial genetic code.\n");
- printf(" -gcblep Use Blepharisma genetic code.\n");
- printf(" -gcchloroph Use Chlorophycean Mitochondrial genetic code.\n");
- printf(" -gctrem Use Trematode Mitochondrial genetic code.\n");
- printf(" -gcscen Use Scenedesmus obliquus Mitochondrial genetic code.\n");
- printf(" -gcthraust Use Thraustochytrium Mitochondrial genetic code.\n");
- printf(" Individual modifications can be appended using\n");
- printf(" ,BBB=<aa> B = A,C,G, or T. <aa> is the three letter\n");
- printf(" code for an amino-acid. More than one modification\n");
- printf(" can be specified. eg -gcvert,aga=Trp,agg=Trp uses\n");
- printf(" the Vertebrate Mitochondrial code and the codons\n");
- printf(" AGA and AGG changed to Tryptophan.\n");
- printf(" -tv Do not search for mitochondrial ");
- printf("TV replacement\n");
- printf(" loop tRNA genes. Only relevant if -mt used. \n");
- printf(" -i Search for tRNA genes with introns in\n");
- printf(" anticodon loop with maximum length %d\n",
- MAXINTRONLEN);
- printf(" bases. Minimum intron length is 0 bases.\n");
- printf(" Ignored if -m is specified.\n");
- printf(" -i<max> Search for tRNA genes with introns in\n");
- printf(" anticodon loop with maximum length <max>\n");
- printf(" bases. Minimum intron length is 0 bases.\n");
- printf(" Ignored if -m is specified.\n");
- printf(" -i<min>,<max> Search for tRNA genes with introns in\n");
- printf(" anticodon loop with maximum length <max>\n");
- printf(" bases, and minimum length <min> bases.\n");
- printf(" Ignored if -m is specified.\n");
- printf(" -io Same as -i, but allow tRNA genes with long\n");
- printf(" introns to overlap shorter tRNA genes.\n");
- printf(" -if Same as -i, but fix intron between positions\n");
- printf(" 37 and 38 on C-loop (one base after anticodon).\n");
- printf(" -ifo Same as -if and -io combined.\n");
- printf(" -ir Same as -i, but search for tRNA genes with minimum intron\n");
- printf(" length 0 bases, and only report tRNA genes with minimum\n");
- printf(" intron length <min> bases.\n");
- printf(" -c Assume that each sequence has a circular\n");
- printf(" topology. Search wraps around each end.\n");
- printf(" Default setting.\n");
- printf(" -l Assume that each sequence has a linear\n");
- printf(" topology. Search does not wrap.\n");
- printf(" -d Double. Search both strands of each\n");
- printf(" sequence. Default setting.\n");
- printf(" -s or -s+ Single. Do not search the complementary\n");
- printf(" (antisense) strand of each sequence.\n");
- printf(" -sc or -s- Single complementary. Do not search the sense\n");
- printf(" strand of each sequence.\n");
- printf(" -ss Use the stricter canonical 1-2 bp spacer1 and\n");
- printf(" 1 bp spacer2. Ignored if -mt set. Default is to\n");
- printf(" allow 3 bp spacer1 and 0-2 bp spacer2, which may\n");
- printf(" degrade selectivity.\n");
- printf(" -ps Lower scoring thresholds to 95%% of default levels.\n");
- printf(" -ps<num> Change scoring thresholds to <num> percent of default levels.\n");
- printf(" -rp Flag possible pseudogenes (score < 100 or tRNA anticodon\n");
- printf(" loop <> 7 bases long). Note that genes with score < 100\n");
- printf(" will not be detected or flagged if scoring thresholds are not\n");
- printf(" also changed to below 100%% (see -ps switch).\n");
- printf(" -seq Print out primary sequence.\n");
- printf(" -br Show secondary structure of tRNA gene primary\n");
- printf(" sequence with round brackets.\n");
- printf(" -fasta Print out primary sequence in fasta format.\n");
- printf(" -fo Print out primary sequence in fasta format only\n");
- printf(" (no secondary structure).\n");
- printf(" -fon Same as -fo, with sequence and gene numbering in header.\n");
- printf(" -fos Same as -fo, with no spaces in header.\n");
- printf(" -fons Same as -fo, with sequence and gene numbering, but no spaces.\n");
- printf(" -j Display 4-base sequence on 3' end of astem\n");
- printf(" regardless of predicted amino-acyl acceptor\n");
- printf(" length.\n");
- printf(" -jr Allow some divergence of 3' ");
- printf("amino-acyl acceptor\n");
- printf(" sequence from NCCA.\n");
- printf(" -jr4 Allow some divergence of 3' ");
- printf("amino-acyl acceptor\n");
- printf(" sequence from NCCA, and display 4 bases.\n");
- printf(" -v Verbose. Prints out search progress\n");
- printf(" to STDERR.\n");
- printf(" -a Print out tRNA domain for tmRNA genes\n");
- printf(" -o <outfile> print output into <outfile>. If <outfile>\n");
- printf(" exists, it is overwritten.\n");
- printf(" By default, output goes to STDOUT.\n");
- printf(" -w Print out genes in batch mode.\n");
- printf(" For tRNA genes, output is in the form:\n\n");
- printf(" Sequence name\n");
- printf(" N genes found\n");
- printf(" 1 tRNA-<species> [locus 1]");
- printf(" <Apos> (nnn)\n");
- printf(" i(<intron position>,<intron length>)\n");
- printf(" . \n");
- printf(" . \n");
- printf(" N tRNA-<species> [Locus N]");
- printf(" <Apos> (nnn)\n");
- printf(" i(<intron position>,<intron length>)\n");
- printf("\n N is the number of genes found\n");
- printf(" <species> is the tRNA iso-acceptor species\n");
- printf(" <Apos> is the tRNA anticodon ");
- printf("relative position\n");
- printf(" (nnn) is the tRNA anticodon base triplet\n");
- printf(" i means the tRNA gene has a C-loop intron\n");
- printf("\n For tmRNA genes, output is in the form:\n");
- printf("\n n tmRNA(p) [Locus n] <tag offset>,");
- printf("<tag end offset>\n");
- printf(" <tag peptide>\n\n");
- printf(" p means the tmRNA gene is permuted\n");
- printf("\n\n"); }
+{
+int h;
+for (h = 0; h < NHELPLINE; h++) printf("%s\n",helpmenu[h]);
+}
void error_report(int n, char *s)
{ switch(n)
@@ -10185,7 +11398,7 @@ void process_genecode_switch(char *s, csw *sw)
"CILIATE","DELETED","DELETED","FLATWORM","EUPLOT",
"BACT","ALTYEAST","ASCID","ALTFLAT","BLEP",
"CHLOROPH","DELETED","DELETED","DELETED","DELETED",
- "TREM","SCEN","THRAUST" };
+ "TREM","SCEN","THRAUST","PTERO","GRAC" };
sw->geneticcode = STANDARD;
sw->gcfix = 1;
c = *s;
@@ -10257,11 +11470,13 @@ int main(int z, char *v[])
char c1,c2,c3,c4,*s;
data_set d;
static csw sw =
- { NULL,0,0,0,0,0,0,0,1,0,0,
+ { {"tRNA","tmRNA","","","CDS","overall"},
+ NULL,0,0,0,0,0,0,0,0,1,0,0,
STANDARD,0,{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},0,METAZOAN_MT,
1,0,5,5,1,0,0,0,2,0,0,0,0,0,0,3,0,2,1,1,0,0,0,0,0,0,0,0,1,
- 0,0,0,0,0,0,0,{0,0,0,0,0},0,0,{0,0,0,0,0},0,0,0,0,0,0,0,0,0L,
- tRNAthresh,4.0,29.0,26.0,7.5,8.0,
+ 0,0,0,0,0,0,0,{0,0,0,0,0,0},0,0,0,0,NTAG,10,30,
+ {0,0,0,0,0,0},{0,0,0,0,0,0},{0,0,0,0,0,0},0,0,0,0,0L,
+ 100.0,tRNAthresh,4.0,29.0,26.0,7.5,8.0,
mtRNAtthresh,mtRNAdthresh,mtRNAdtthresh,-7.9,-6.0,
tmRNAthresh,14.0,10.0,25.0,9.0,srpRNAthresh,CDSthresh,
{tRNAthresh,tmRNAthresh,srpRNAthresh,0.0,CDSthresh },
@@ -10269,7 +11484,7 @@ int main(int z, char *v[])
45, 45, 45, 45, 45, 45, 45, 45, 45, 45,
45, 45, 45, 45, 45, 45, 45, 45, 45, 45,
10, 65, 82, 65, 71, 79, 82, 78, 32,
- 118, 49, 46, 50, 46, 51, 54, 32, 32, 32,
+ 118, 49, 46, 50, 46, 51, 55, 32, 32, 32,
68, 101, 97,110, 32, 76, 97, 115, 108,
101, 116, 116, 10,
45, 45, 45, 45, 45, 45, 45, 45, 45, 45,
@@ -10277,6 +11492,7 @@ int main(int z, char *v[])
45, 45, 45, 45, 45, 45, 45, 45, 45, 45,
10, TERM }};
sw.f = stdout;
+ d.bugmode = 0;
filecounter = 0;
i = 0;
while (++i < z)
@@ -10294,14 +11510,43 @@ int main(int z, char *v[])
case 'A': if (c2 == '7') sw.extastem = 0;
else
if (c2 == 'A') sw.matchacceptor = 1;
- else sw.secstructdisp = 1;
+ else
+ if (c2 == 'M')
+ { l = 1L;
+ if (c3 == 'T')
+ { if (lv > 4)
+ { s = lconvert(s+3,&l);
+ if (l < 1L) l = 1L;
+ sw.trnalenmisthresh = (int)l; }
+ else sw.trnalenmisthresh = 1; }
+ else if (c3 == 'M')
+ { if (lv > 4)
+ { s = lconvert(s+3,&l);
+ if (l < 1L) l = 1L;
+ sw.tmrnalenmisthresh = (int)l; }
+ else sw.tmrnalenmisthresh = 1; }
+ else if (lv > 3)
+ { s = lconvert(s+2,&l);
+ if (l < 1L) l = 1L;
+ sw.trnalenmisthresh = (int)l;
+ sw.tmrnalenmisthresh = (int)l; }
+ else
+ { sw.trnalenmisthresh = 1;
+ sw.tmrnalenmisthresh = 1; }}
+ else sw.secstructdisp = 1;
break;
case 'B': if (c2 == 'R') sw.seqdisp = 2;
else sw.libflag = 1;
break;
case 'X': sw.libflag = 2;
break;
- case 'W': if (sw.batch < 1) sw.batch = 1;
+ case 'W': if (c2 == 'U')
+ if (c3 == 'N')
+ if (c4 == 'I')
+ { d.bugmode = 1;
+ break; }
+ if (sw.batch < 1) sw.batch = 1;
+ if (c2 == 'A') sw.batchfullspecies = 1;
break;
case 'V': sw.verbose = 1;
break;
@@ -10419,8 +11664,12 @@ int main(int z, char *v[])
if (*s == ',') dconvert(s+1,&sw.mtdarmthresh); }}
else
{ sw.tmrna = 1;
- if (lv > 2)
- dconvert(s+1,&sw.tmrnathresh); }
+ if (c2 == 'U')
+ if (c3 == 'T')
+ { sw.updatetmrnatags = 1;
+ lv -= 2;
+ s += 2; }
+ if (lv > 2) dconvert(s+1,&sw.tmrnathresh); }
break;
case 'P': if (c2 == 'S')
{ if (c3 != '-')
@@ -10438,7 +11687,10 @@ int main(int z, char *v[])
break;
case 'R': if (c2 == 'N') sw.repeatsn = 1;
else
- if (c2 == 'P') sw.reportpseudogenes = 1;
+ if (c2 == 'P')
+ { sw.reportpseudogenes = 1;
+ if (lv > 3)
+ dconvert(s+2,&sw.pseudogenethresh); }
else sw.tmstrict = 0;
break;
case 'Q': sw.showconfig = 0;
diff --git a/manpage.1.src b/manpage.1.src
new file mode 100644
index 0000000..26b9c6d
--- /dev/null
+++ b/manpage.1.src
@@ -0,0 +1,273 @@
+ARAGORN(1)
+==========
+
+NAME
+----
+
+aragorn - detect tRNA genes in nucleotide sequences
+
+
+SYNOPSIS
+--------
+
+*aragorn* ['OPTION']... 'FILE'
+
+
+OPTIONS
+-------
+
+*-m*::
+ Search for tmRNA genes.
+
+*-t*::
+ Search for tRNA genes.
+ By default, all are detected. If one of
+ *-m* or *-t* is specified, then the other
+ is not detected unless specified as well.
+*-mt*::
+ Search for Metazoan mitochondrial tRNA genes.
+ tRNA genes with introns are not detected. *-i*, *-sr* switches
+ ignored. Composite Metazoan mitochondrial
+ genetic code used.
+
+*-mtmam*::
+ Search for Mammalian mitochondrial tRNA
+ genes. *-i*, *-sr* switches ignored. *-tv* switch set.
+ Mammalian mitochondrial genetic code used.
+
+*-mtx*::
+ Same as *-mt* but low scoring tRNA genes are
+ not reported.
+
+*-mtd*::
+ Overlapping metazoan mitochondrial tRNA genes
+ on opposite strands are reported.
+
+*-gc*['num']::
+ Use the GenBank transl_table = ['num'] genetic code.
+ Individual modifications can be appended using
+ ',BBB'=<aa> B = A,C,G, or T. <aa> is the three letter
+ code for an amino-acid. More than one modification
+ can be specified. e.g. *-gcvert*,aga=Trp,agg=Trp uses
+ the Vertebrate Mitochondrial code and the codons
+ AGA and AGG changed to Tryptophan.
+
+*-gcstd*::
+ Use standard genetic code.
+*-gcmet*::
+ Use composite Metazoan mitochondrial genetic code.
+*-gcvert*::
+ Use Vertebrate mitochondrial genetic code.
+*-gcinvert*::
+ Use Invertebrate mitochondrial genetic code.
+*-gcyeast*::
+ Use Yeast mitochondrial genetic code.
+*-gcprot*::
+ Use Mold/Protozoan/Coelenterate mitochondrial genetic code.
+*-gcciliate*::
+ Use Ciliate genetic code.
+*-gcflatworm*::
+ Use Echinoderm/Flatworm mitochondrial genetic code.
+*-gceuplot*::
+ Use Euplotid genetic code.
+*-gcbact*::
+ Use Bacterial/Plant Chloroplast genetic code.
+*-gcaltyeast*::
+ Use alternative Yeast genetic code.
+*-gcascid*::
+ Use Ascidian Mitochondrial genetic code.
+*-gcaltflat*::
+ Use alternative Flatworm Mitochondrial genetic code.
+*-gcblep*::
+ Use Blepharisma genetic code.
+*-gcchloroph*::
+ Use Chlorophycean Mitochondrial genetic code.
+*-gctrem*::
+ Use Trematode Mitochondrial genetic code.
+*-gcscen*::
+ Use Scenedesmus obliquus Mitochondrial genetic code.
+*-gcthraust*::
+ Use Thraustochytrium Mitochondrial genetic code.
+
+*-tv*::
+ Do not search for mitochondrial TV replacement loop tRNA genes. Only relevant if *-mt* used.
+
+*-c7*::
+ Search for tRNA genes with 7 base C-loops only.
+
+*-i*::
+ Search for tRNA genes with introns in
+ anticodon loop with maximum length 3000
+ bases. Minimum intron length is 0 bases.
+ Ignored if *-m* is specified.
+
+*-i*['max']::
+ Search for tRNA genes with introns in
+ anticodon loop with maximum length ['max']
+ bases. Minimum intron length is 0 bases.
+ Ignored if *-m* is specified.
+
+*-i*['min'],['max']::
+ Search for tRNA genes with introns in
+ anticodon loop with maximum length ['max']
+ bases, and minimum length ['min'] bases.
+ Ignored if *-m* is specified.
+
+*-io*::
+ Same as *-i*, but allow tRNA genes with long
+ introns to overlap shorter tRNA genes.
+
+*-if*::
+ Same as *-i*, but fix intron between positions
+ 37 and 38 on C-loop (one base after anticodon).
+
+*-ifo*::
+ Same as *-if* and *-io* combined.
+
+*-ir*::
+ Same as *-i*, but report tRNA genes with minimum
+ length ['min'] bases rather than search for
+ tRNA genes with minimum length ['min'] bases.
+ With this switch, ['min'] acts as an output filter,
+ minimum intron length for searching is still 0 bases.
+
+*-c*::
+ Assume that each sequence has a circular
+ topology. Search wraps around each end.
+ Default setting.
+
+*-l*::
+ Assume that each sequence has a linear
+ topology. Search does not wrap.
+
+*-d*::
+ Double. Search both strands of each
+ sequence. Default setting.
+
+*-s* or *-s+*::
+ Single. Do not search the complementary
+ (antisense) strand of each sequence.
+
+*-sc* or *-s-*::
+ Single complementary. Do not search the sense
+ strand of each sequence.
+
+*-ps*::
+ Lower scoring thresholds to 95% of default levels.
+
+*-ps*['num']::
+ Change scoring thresholds to ['num'] percent of
+ default levels.
+
+*-rp*::
+ Flag possible pseudogenes (score < 100 or tRNA anticodon
+ loop <> 7 bases long). Note that genes with score < 100
+ will not be detected or flagged if scoring thresholds are not
+ also changed to below 100% (see -ps switch).
+
+*-seq*::
+ Print out primary sequence.
+
+*-br*::
+ Show secondary structure of tRNA gene primary sequence
+ using round brackets.
+
+*-fasta*::
+ Print out primary sequence in fasta format.
+*-fo*::
+ Print out primary sequence in fasta format only
+ (no secondary structure).
+
+*-fon*::
+ Same as *-fo*, with sequence and gene numbering in header.
+
+*-fos*::
+ Same as *-fo*, with no spaces in header.
+
+*-fons*::
+ Same as *-fo*, with sequence and gene numbering, but no
+ spaces.
+
+*-w*::
+ Print out in Batch mode.
+
+*-ss*::
+ Use the stricter canonical 1-2 bp spacer1 and
+ 1 bp spacer2. Ignored if *-mt* set. Default is to
+ allow 3 bp spacer1 and 0-2 bp spacer2, which may
+ degrade selectivity.
+
+*-v*::
+ Verbose. Prints out information during
+ search to STDERR.
+
+*-a*::
+ Print out tRNA domain for tmRNA genes.
+
+*-a7*::
+ Restrict tRNA astem length to a maximum of 7 bases.
+
+*-aa*::
+ Display message if predicted iso-acceptor species
+ does not match species in sequence name (if present).
+
+*-j*::
+ Display 4-base sequence on 3' end of astem
+ regardless of predicted amino-acyl acceptor length.
+
+*-jr*::
+ Allow some divergence of 3' amino-acyl acceptor
+ sequence from NCCA.
+
+*-jr4*::
+ Allow some divergence of 3' amino-acyl acceptor
+ sequence from NCCA, and display 4 bases.
+
+*-q*::
+ Don't print configuration line (which switches
+ and files were used).
+*-rn*::
+ Repeat sequence name before summary information.
+
+*-O* ['outfile']::
+ Print output to ['outfile']. If ['outfile']
+ already exists, it is overwritten. By default
+ all output goes to stdout.
+
+DESCRIPTION
+-----------
+
+aragorn detects tRNA, mtRNA, and tmRNA genes.
+A minimum requirement is at least a 32 bit compiler architecture
+(variable types int and unsigned int are at least 4 bytes long).
+
+['FILE'] is assumed to contain one or more sequences
+in FASTA format. Results of the search are printed to
+STDOUT. All switches are optional and case-insensitive.
+Unless -i is specified, tRNA genes containing introns
+are not detected.
+
+
+AUTHORS
+-------
+
+Bjorn Canback <bcanback at acgt.se>, Dean Laslett <gaiaquark at gmail.com>
+
+
+REFERENCES
+----------
+
+Laslett, D. and Canback, B. (2004) ARAGORN, a
+program for the detection of transfer RNA and transfer-messenger
+RNA genes in nucleotide sequences
+Nucleic Acids Research, 32;11-16
+
+Laslett, D. and Canback, B. (2008) ARWEN: a
+program to detect tRNA genes in metazoan mitochondrial
+nucleotide sequences
+Bioinformatics, 24(2); 172-175.
+
+
+
+
+
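Reviewer note on the version bump in the main() hunk above: the banner is stored as decimal character codes, and the only change is 54 -> 55, i.e. '6' -> '7' in "v1.2.3x". The sketch below decodes such a code list so the change can be checked by hand. It is a minimal illustration, not the upstream code: decode_banner, BANNER_TERM and version_codes are hypothetical names, and BANNER_TERM merely stands in for the TERM macro, whose value is not shown in this diff.

```c
#include <stddef.h>

/* Hypothetical terminator; stands in for the TERM macro used upstream,
   whose actual value does not appear in this diff. */
#define BANNER_TERM (-1)

/* Copy integer character codes into buf until BANNER_TERM is reached or
   the buffer is full. Always NUL-terminates when buflen > 0.
   Returns the number of characters written (excluding the NUL). */
size_t decode_banner(const int *codes, char *buf, size_t buflen)
{
    size_t n = 0;
    if (buflen == 0) return 0;
    while (codes[n] != BANNER_TERM && n + 1 < buflen) {
        buf[n] = (char)codes[n];
        n++;
    }
    buf[n] = '\0';
    return n;
}

/* Version fragment as it appears in the added line of the hunk:
   118 49 46 50 46 51 55 */
static const int version_codes[] = { 118, 49, 46, 50, 46, 51, 55, BANNER_TERM };
```

Calling decode_banner(version_codes, buf, sizeof buf) with a 16-byte buffer yields the string "v1.2.37"; substituting 54 for 55 gives the previous "v1.2.36".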
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/aragorn.git