[med-svn] [Git][med-team/kalign][upstream] New upstream version 3.3.1

Andreas Tille (@tille) gitlab at salsa.debian.org
Fri Oct 1 12:21:41 BST 2021



Andreas Tille pushed to branch upstream at Debian Med / kalign


Commits:
7ebdac5a by Andreas Tille at 2021-10-01T13:15:34+02:00
New upstream version 3.3.1
- - - - -


11 changed files:

- ChangeLog
- README
- configure.ac
- dev/run_io_test.sh
- src/alignment_parameters.c
- src/alignment_parameters.h
- src/aln_run.c
- src/run_kalign.c
- src/rwalign.c
- src/weave_alignment.c
- src/weave_alignment.h


Changes:

=====================================
ChangeLog
=====================================
@@ -1,3 +1,25 @@
+2021-04-16  Timo Lassmann  <timo.lassmann at telethonkids.org.au>
+
+	* version 3.3.1 - Bug Fix
+	The previous version kalign checked the top 50 sequences in inputs to determine
+	whether the sequences are aligned or not. If the first 50 sequences are not aligned,
+	but following sequences contain gaps (or other characters!) kalign can crash. In this
+	version (3.3.1) kalign checks all sequences, thereby avoiding this issue.
+	To alert users to the situation described above and to warn users about the presence of
+	odd characters, kalign now produces a warning message like this:
+
+	[Date Time] :     LOG : Start io tests.
+	[Date Time] :     LOG : reading: dev/data/a2m.good.1
+	[Date Time] :     LOG : Detected protein sequences.
+	[Date Time] : WARNING : -------------------------------------------- (rwalign.c line 505)
+	[Date Time] : WARNING : The input sequences contain gap characters:  (rwalign.c line 506)
+	[Date Time] : WARNING : "-" :   36 found                             (rwalign.c line 510)
+	[Date Time] : WARNING : BUT the sequences do not seem to be aligned! (rwalign.c line 514)
+	[Date Time] : WARNING :                                              (rwalign.c line 515)
+	[Date Time] : WARNING : Kalign will remove the gap characters and    (rwalign.c line 516)
+	[Date Time] : WARNING : align the sequences.                         (rwalign.c line 517)
+	[Date Time] : WARNING : -------------------------------------------- (rwalign.c line 518)
+
 2020-11-06  Timo Lassmann  <timo.lassmann at telethonkids.org.au>
 
 	* version 3.3 - Threading and more


=====================================
README
=====================================
@@ -1,48 +1,106 @@
 -----------------------------------------------------------------------
-  Kalign version 2.03, Copyright (C) 2006 Timo Lassmann
-
-  http://msa.cgb.ki.se/
-  timolassmann at gmail.com
-
-	This program is free software; you can redistribute it and/or modify
-	it under the terms of the GNU General Public License as published by
-	the Free Software Foundation; either version 2 of the License, or
-	any later version.
-
-	This program is distributed in the hope that it will be useful,
-	but WITHOUT ANY WARRANTY; without even the implied warranty of
-	MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-	GNU General Public License for more details.
-
-	You should have received a copy of the GNU General Public License
-	along with this program; if not, write to the Free Software
-	Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
-  
-	A copy of this license is in the COPYING file.
+    Kalign - a multiple sequence alignment program
+
+    Copyright 2006, 2019, 2020, 2021 Timo Lassmann
+
+    This file is part of kalign.
+
+    Kalign is free software: you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation, either version 3 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+
 -----------------------------------------------------------------------
-Installation:
 
-% ./configure 
-% make
+Kalign is a fast multiple sequence alignment program for biological sequences.
+
+1) Installation
+
+1.1) Release Tarball
+
+Download tarball from [releases](https://github.com/TimoLassmann/kalign/releases). Then:
+
+tar -zxvf kalign-<version>.tar.gz
+cd kalign-<version>
+./autogen.sh
+./configure
+make
+make check
+make install
+
+
+1.2) Homebrew
+
+brew install brewsci/bio/kalign
+
+1.3) Developer version
+
+git clone https://github.com/TimoLassmann/kalign.git
+cd kalign
+./autogen.sh
+./configure
+make
+make check
+make install
+
+
+1.4) on macOS, install [brew](https://brew.sh/) then:
+
+brew install libtool
+brew install automake
+git clone https://github.com/TimoLassmann/kalign.git
+cd kalign
+./autogen.sh
+./configure
+make
+make check
+make install
+
+
+2) Usage
+
+Usage: kalign  -i <seq file> -o <out aln>
+
+Options:
+
+   --format           : Output format. [Fasta]
+   --reformat         : Reformat existing alignment. [NA]
+   --version          : Print version and exit
+
+Kalign expects the input to be a set of unaligned sequences in fasta format or aligned sequences in aligned fasta, MSF or clustal format. Kalign automatically detects whether the input sequences are protein, RNA or DNA.
+
+Since version 3.2.0 kalign supports passing sequence in via stdin and support alignment of sequences from multiple files.
+
+3) Examples
+
+Passing sequences via stdin:
+
+   cat input.fa | kalign -f fasta > out.afa
+
+Combining multiple input files:
+
+   kalign seqsA.fa seqsB.fa seqsC.fa -f fasta > combined.afa
+
+Align sequences and output the alignment in MSF format:
+
+   kalign -i BB11001.tfa -f msf  -o out.msf
 
-and as root:
+Align sequences and output the alignment in clustal format:
 
-% make install
+   kalign -i BB11001.tfa -f clu -o out.clu
 
+Re-align sequences in an existing alignment:
 
-Usage:
+   kalign -i BB11001.msf  -o out.afa
 
-        kalign [Options]  infile.fasta outfile.fasta
-        
-        or:
-        
-        kalign [Options] -i infile.fasta -o outfile.fasta
-        
-        or:
-        
-        kalign [Options] < infile.fasta > outfile.fasta
+Reformat existing alignment:
 
-        Options:
-        
-	type: kalign -h 
-	
\ No newline at end of file
+   kalign -i BB11001.msf -r afa -o out.afa


=====================================
configure.ac
=====================================
@@ -1,4 +1,4 @@
-AC_INIT(kalign, 3.3)
+AC_INIT(kalign, 3.3.1)
 
 #AC_CONFIG_AUX_DIR([.])
 


=====================================
dev/run_io_test.sh
=====================================
@@ -18,5 +18,4 @@ do
         printf "with ERROR $status and Message:\n\n$error\n\n";
         exit 1;
     fi
-
 done


=====================================
src/alignment_parameters.c
=====================================
@@ -30,7 +30,7 @@ int set_param_number(struct aln_param* ap,int L, int sel);
 
 int new_aln_matrices(struct aln_param* ap);
 
-int init_ap(struct aln_param** aln_param, struct parameters* param, int numseq,int L)
+int init_ap(struct aln_param** aln_param, struct parameters* param,int L)
 {
         struct aln_param* ap = NULL;
         int i,j;


=====================================
src/alignment_parameters.h
=====================================
@@ -55,6 +55,6 @@ struct aln_param{
 
 };
 
-extern int init_ap(struct aln_param** aln_param, struct parameters* param, int numseq,int L);
+extern int init_ap(struct aln_param** aln_param, struct parameters* param,int L);
 extern void free_ap(struct aln_param* ap);
 #endif


=====================================
src/aln_run.c
=====================================
@@ -868,6 +868,7 @@ int do_align_serial(struct msa* msa,struct aln_tasks* t,struct aln_mem* m, int t
         MFREE(t->profile[b]);
 
         t->profile[c] = tmp;
+
         RUN(make_seq(msa,a,b,m->path));
 
         msa->plen[c] = m->path[0];


=====================================
src/run_kalign.c
=====================================
@@ -1,7 +1,7 @@
 /*
     Kalign - a multiple sequence alignment program
 
-    Copyright 2006, 2019, 2020 Timo Lassmann
+    Copyright 2006, 2019, 2020, 2021 Timo Lassmann
 
     This file is part of kalign.
 
@@ -132,7 +132,7 @@ int print_kalign_header(void)
         fprintf(stdout,"\n");
         fprintf(stdout,"Kalign (%s)\n", PACKAGE_VERSION);
         fprintf(stdout,"\n");
-        fprintf(stdout,"Copyright (C) 2006,2019,2020 Timo Lassmann\n");
+        fprintf(stdout,"Copyright (C) 2006,2019,2020,2021 Timo Lassmann\n");
         fprintf(stdout,"\n");
         fprintf(stdout,"This program comes with ABSOLUTELY NO WARRANTY; for details type:\n");
         fprintf(stdout,"`kalign -showw'.\n");
@@ -520,7 +520,7 @@ int run_kalign(struct parameters* param)
         }
 
         /* allocate aln parameters  */
-        RUN(init_ap(&ap,param,msa->numseq,msa->L ));
+        RUN(init_ap(&ap,param,msa->L ));
 
         if(param->dump_internal){
                 double* s;
@@ -555,7 +555,7 @@ int run_kalign(struct parameters* param)
                 RUN(convert_msa_to_internal(msa, ALPHA_ambigiousPROTEIN));
         }
         /* allocate aln parameters  */
-        RUN(init_ap(&ap,param,msa->numseq,msa->L ));
+        RUN(init_ap(&ap,param,msa->L ));
 
         /* Start alignment stuff */
         DECLARE_TIMER(t1);


=====================================
src/rwalign.c
=====================================
@@ -65,29 +65,29 @@ struct out_line{
 
 
 
+static int aln_unknown_warning_message(struct msa* msa);
+static int read_fasta(struct in_buffer* b, struct msa** msa);
+static int read_msf(struct in_buffer* b, struct msa** msa);
+static int read_clu(struct in_buffer* b, struct msa** msa);
 
-int read_fasta(struct in_buffer* b, struct msa** msa);
-int read_msf(struct in_buffer* b, struct msa** msa);
-int read_clu(struct in_buffer* b, struct msa** msa);
-
-int write_msa_fasta(struct msa* msa,char* outfile);
-int write_msa_clustal(struct msa* msa,char* outfile);
-int write_msa_msf(struct msa* msa,char* outfile);
+static int write_msa_fasta(struct msa* msa,char* outfile);
+static int write_msa_clustal(struct msa* msa,char* outfile);
+static int write_msa_msf(struct msa* msa,char* outfile);
 
 /* memory functions  */
-struct msa* alloc_msa(void);
-int resize_msa(struct msa* msa);
+static struct msa* alloc_msa(void);
+static int resize_msa(struct msa* msa);
 
 
-struct msa_seq* alloc_msa_seq(void);
-int resize_msa_seq(struct msa_seq* seq);
-void free_msa_seq(struct msa_seq* seq);
+static struct msa_seq* alloc_msa_seq(void);
+static int resize_msa_seq(struct msa_seq* seq);
+static void free_msa_seq(struct msa_seq* seq);
 
 
 
-struct line_buffer* alloc_line_buffer(int max_line_len);
-int resize_line_buffer(struct line_buffer* lb);
-void free_line_buffer(struct line_buffer* lb);
+static struct line_buffer* alloc_line_buffer(int max_line_len);
+static int resize_line_buffer(struct line_buffer* lb);
+static void free_line_buffer(struct line_buffer* lb);
 
 static int read_file_stdin(struct in_buffer** buffer,char* infile);
 static int alloc_in_buffer(struct in_buffer** buffer, int n);
@@ -106,8 +106,6 @@ static int GCGMultchecksum(struct msa* msa);
 /* Taken from squid library by Sean Eddy  */
 static int GCGchecksum(char *seq, int len);
 
-
-
 static int sort_by_name(const void *a, const void *b);
 static int sort_by_chksum(const void *a, const void *b);
 
@@ -248,7 +246,6 @@ int read_input(char* infile,struct msa** msa)
         STOP_TIMER(timer);
         GET_TIMING(timer);
         DESTROY_TIMER(timer);
-        //LOG_MSG("Done reading input sequences in %f seconds.", GET_TIMING(timer));
         *msa = m;
         return OK;
 ERROR:
@@ -465,7 +462,8 @@ int detect_aligned(struct msa* msa)
         min_len = INT32_MAX;
         max_len = 0;
         gaps = 0;
-        n = MACRO_MIN(50, msa->numseq);
+        /* n = MACRO_MIN(50, msa->numseq); */
+        n = msa->numseq;
         for(i = 0; i < n;i++){
                 l = 0;
                 for (j = 0; j <= msa->sequences[i]->len;j++){
@@ -480,12 +478,17 @@ int detect_aligned(struct msa* msa)
                 if(min_len == max_len){ /* sequences have gaps and total length is identical - clearly aligned  */
                         msa->aligned = ALN_STATUS_ALIGNED;
                 }else{          /* odd there are gaps but total length differs - unknown status  */
+                        aln_unknown_warning_message(msa);
+
                         msa->aligned = ALN_STATUS_UNKNOWN;
                 }
         }else{
                 if(min_len == max_len){ /* no gaps and sequences have same length. Can' tell if they are aligned  */
+                        aln_unknown_warning_message(msa);
                         msa->aligned = ALN_STATUS_UNKNOWN;
                 }else{          /* No gaps and sequences have different lengths - unaligned */
+
+
                         msa->aligned = ALN_STATUS_UNALIGNED;
                 }
         }
@@ -493,15 +496,36 @@ int detect_aligned(struct msa* msa)
         return OK;
 }
 
+static int aln_unknown_warning_message(struct msa* msa)
+{
+        int i;
+        WARNING_MSG("--------------------------------------------");
+        WARNING_MSG("The input sequences contain gap characters: ");
+
+        for(i = 0; i < 128;i++){
+                if(msa->letter_freq[i] && ispunct(i)){
+                         WARNING_MSG("\"%c\" : %4d found                            ", (char)i,msa->letter_freq[i] );
+                }
+        }
+
+        WARNING_MSG("BUT the sequences do not seem to be aligned!");
+        WARNING_MSG("                                            ");
+        WARNING_MSG("Kalign will remove the gap characters and   ");
+        WARNING_MSG("align the sequences.                        ");
+        WARNING_MSG("--------------------------------------------");
+        return OK;
+}
+
+
 /* Checks if sequence names are duplicated */
 /* Checks if sequences are duplicated */
 int run_extra_checks_on_msa(struct msa* msa)
 {
         char* tmp_name = NULL;
-        char* tmp_ptr;
+        /* char* tmp_ptr; */
         struct sort_struct_name_chksum** a = NULL;
         int i;
-        int j;
+        /* int j; */
         int c;
         int l;
 
@@ -1174,50 +1198,30 @@ int read_clu(struct in_buffer* b , struct msa** m)
 {
         struct msa* msa = NULL;
         struct msa_seq* seq_ptr = NULL;
-        //FILE* f_ptr = NULL;
+
         char* line = NULL;
-        //size_t b_len = 0;
-        //ssize_t nread;
         int i,j;
         char* p;
         int active_seq = 0;
         int line_len;
         int nl,ni;
-        /* sanity checks  */
-        //if(!my_file_exists(infile)){
-        //ERROR_MSG("File: %s does not exist.",infile);
-        //}
+
         if(msa == NULL){
                 msa = alloc_msa();
         }
 
-        //RUNP(f_ptr = fopen(infile, "r"));
-        //LOG_MSG("GAGA");
-        /* scan through first line header  */
-        //while(fgets(line, BUFFER_LEN, f_ptr)){
-        //while ((nread = getline(&line, &b_len, f_ptr)) != -1){
-                //fprintf(stdout,"LINE: %s", line);
-                //line_len = strnlen(line, BUFFER_LEN);
         ni =0;
         for(nl = 0; nl < b->n_lines;nl++){
                 line = b->l[nl]->line;
                 line_len = b->l[nl]->len;
                 ni++;
-                //line_len = nread;
-                //line[line_len-1] = 0;
-                /* line_len--; */
                 break;
         }
         active_seq =0;
         for(nl = ni; nl < b->n_lines;nl++){
                 line = b->l[nl]->line;
                 line_len = b->l[nl]->len;
-                //while ((nread = getline(&line, &b_len, f_ptr)) != -1){
-                //while(fgets(line, BUFFER_LEN, f_ptr)){
-                //line_len = strnlen(line, BUFFER_LEN);
-                //line_len = nread;
-                //line[line_len-1] = 0;
-                /* line_len--;     /\* last character is newline  *\/ */
+
                 if(!line_len){
                         active_seq = 0;
                 }else{
@@ -1226,9 +1230,6 @@ int read_clu(struct in_buffer* b , struct msa** m)
                                         RUN(resize_msa(msa));
                                 }
                                 seq_ptr = msa->sequences[active_seq];
-                                //p = strstr(line,seq_ptr->name);
-                                //if(p){
-                                //LOG_MSG("Found bitsof seq %s", seq_ptr->name);
 
                                 p = line;
                                 j = 0;
@@ -1254,11 +1255,8 @@ int read_clu(struct in_buffer* b , struct msa** m)
                                 }
                                 active_seq++;
                                 msa->numseq = MACRO_MAX(msa->numseq, active_seq);
-
                         }
-
                 }
-                //fprintf(stdout,"%d \"%s\"\n",line_len,line);
         }
         RUN(null_terminate_sequences(msa));
 
@@ -1372,35 +1370,19 @@ int read_fasta( struct in_buffer* b,struct msa** m)
 {
         struct msa* msa = NULL;
         struct msa_seq* seq_ptr = NULL;
-        //FILE* f_ptr = NULL;
-        char* line = NULL;
-        //size_t b_len = 0;
-        //ssize_t nread;
 
-        //char line[BUFFER_LEN];
+        char* line = NULL;
         int line_len;
         int i;
         int nl;
 
-        /* sanity checks  */
-        //if(!my_file_exists(infile)){
-        //ERROR_MSG("File: %s does not exist.",infile);
-        //}
         if(msa == NULL){
                 msa = alloc_msa();
         }
 
-
         for(nl = 0; nl < b->n_lines;nl++){
                 line = b->l[nl]->line;
                 line_len = b->l[nl]->len;
-                //RUNP(f_ptr = fopen(infile, "r"));
-
-                //while ((nread = getline(&line, &b_len, f_ptr)) != -1){
-                //while(fgets(line, BUFFER_LEN, f_ptr)){
-                //line_len = nread;
-
-                //fprintf(stdout,"%d %s\n",line_len,line);
                 if(line[0] == '>'){
                         /* alloc seq if buffer is full */
                         if(msa->alloc_numseq == msa->numseq){
@@ -1424,12 +1406,11 @@ int read_fasta( struct in_buffer* b,struct msa** m)
                                         if(!seq_ptr){
                                                 ERROR_MSG("Encountered a sequence before encountering it's name");
                                         }
+                                        seq_ptr->seq[seq_ptr->len] = line[i];
+                                        seq_ptr->len++;
                                         if(seq_ptr->alloc_len == seq_ptr->len){
                                                 resize_msa_seq(seq_ptr);
                                         }
-
-                                        seq_ptr->seq[seq_ptr->len] = line[i];
-                                        seq_ptr->len++;
                                 }else if(ispunct((int)line[i])){
                                         seq_ptr->gaps[seq_ptr->len]++;
                                 }
@@ -1440,17 +1421,10 @@ int read_fasta( struct in_buffer* b,struct msa** m)
 
 
         *m = msa;
-        //fclose(f_ptr);
-        //MFREE(line);
+
         return OK;
 ERROR:
         free_msa(msa);
-        //if(line){
-        //MFREE(line);
-        //}
-        //if(f_ptr){
-        //fclose(f_ptr);
-        //}
         return FAIL;
 }
 


=====================================
src/weave_alignment.c
=====================================
@@ -28,32 +28,6 @@
 //int update_gaps(int old_len,int*gis,int new_len,int *newgaps);
 int update_gaps(int old_len,int*gis,int *newgaps);
 
-int weave(struct msa* msa,struct aln_tasks*t)
-{
-        int i;
-        int a,b,c;
-
-        //RUN(clean_aln(aln)
-
-        for(i = 0; i < t->n_tasks;i++){
-                a = t->list[i]->a;
-                b = t->list[i]->b;
-                c = t->list[i]->c;
-                /* fprintf(stdout,"%3d %3d -> %3d (p: %d)\n", t->list[i]->a, t->list[i]->b, t->list[i]->c, t->list[i]->p); */
-                /* RUN(make_seq(msa,a,b,t->map[c])); */
-        }
-
-        /*for (i = 0; i < (msa->numseq-1)*3;i +=3){
-                a = tree[i];
-                b = tree[i+1];
-                RUN(make_seq(msa,a,b,map[tree[i+2]]));
-                }*/
-
-        return OK;
-ERROR:
-        return FAIL;
-}
-
 int clean_aln(struct msa* msa)
 {
         int i,j;


=====================================
src/weave_alignment.h
=====================================
@@ -29,7 +29,7 @@
 
 //extern int weave(struct msa* msa, int** map, int* tree);
 
-extern int weave(struct msa* msa,struct aln_tasks*t);
+/* extern int weave(struct aln_tasks* t); */
 extern int make_seq(struct msa* msa,int a,int b,int* path);
 
 extern int clean_aln(struct msa* msa);



View it on GitLab: https://salsa.debian.org/med-team/kalign/-/commit/7ebdac5ad2b6823ab8972de5e525d333aa9b4d51

-- 
View it on GitLab: https://salsa.debian.org/med-team/kalign/-/commit/7ebdac5ad2b6823ab8972de5e525d333aa9b4d51
You're receiving this email because of your account on salsa.debian.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://alioth-lists.debian.net/pipermail/debian-med-commit/attachments/20211001/50ff751d/attachment-0001.htm>


More information about the debian-med-commit mailing list