[med-svn] [Git][med-team/clonalframeml][master] 9 commits: routine-update: New upstream version
Andreas Tille
gitlab at salsa.debian.org
Thu Feb 27 10:42:21 GMT 2020
Andreas Tille pushed to branch master at Debian Med / clonalframeml
Commits:
7e602c86 by Andreas Tille at 2020-02-27T11:28:24+01:00
routine-update: New upstream version
- - - - -
81343d66 by Andreas Tille at 2020-02-27T11:28:26+01:00
New upstream version 1.12
- - - - -
7cf60f51 by Andreas Tille at 2020-02-27T11:28:30+01:00
Update upstream source from tag 'upstream/1.12'
Update to upstream version '1.12'
with Debian dir 6677f11ebc4415064b6392c9c486b4aeb3e820a5
- - - - -
57e9e956 by Andreas Tille at 2020-02-27T11:28:30+01:00
routine-update: Standards-Version: 4.5.0
- - - - -
ee6357a6 by Andreas Tille at 2020-02-27T11:28:30+01:00
routine-update: debhelper-compat 12
- - - - -
bf7973f2 by Andreas Tille at 2020-02-27T11:38:57+01:00
Update patches
- - - - -
f0e59aea by Andreas Tille at 2020-02-27T11:39:32+01:00
routine-update: Add salsa-ci file
- - - - -
b851784f by Andreas Tille at 2020-02-27T11:39:44+01:00
Set upstream metadata fields: Bug-Database, Bug-Submit, Repository, Repository-Browse.
- - - - -
b25865a2 by Andreas Tille at 2020-02-27T11:41:08+01:00
routine-update: Ready to upload to unstable
- - - - -
30 changed files:
- .gitignore
- README.md
- debian/changelog
- − debian/compat
- debian/control
- debian/patches/cross.patch
- − debian/patches/fix_clean_target.patch
- − debian/patches/hardening.patch
- debian/patches/series
- − debian/patches/use_debian_revision_as_version.patch
- + debian/salsa-ci.yml
- debian/upstream/metadata
- − src/README.txt
- src/brent.h
- src/coalesce/coalescent_record.h
- src/main.cpp
- src/main.h
- src/make.sh
- − src/make_win.bat
- src/makefile
- src/myutils/DNA.h
- src/myutils/argumentwizard.h
- src/myutils/matrix.h
- src/myutils/mydouble.h
- src/myutils/myutils.h
- src/myutils/newick.h
- src/myutils/random.h
- src/myutils/vector.h
- src/powell.h
- src/xmfa.h
Changes:
=====================================
.gitignore
=====================================
@@ -1,8 +1,4 @@
-
src/ClonalFrameML
-
src/main.o
-
-src/version.h
-
+src/main
src/.vscode/*
=====================================
README.md
=====================================
@@ -2,9 +2,9 @@
# Introduction #
-This is the homepage of ClonalFrameML, a software package that performs efficient inference of recombination in bacterial genomes. ClonalFrameML was created by [Xavier Didelot](http://www.imperial.ac.uk/medicine/people/x.didelot/) and [Daniel Wilson](http://www.danielwilson.me.uk/). ClonalFrameML can be applied to any type of aligned sequence data, but is especially aimed at analysis of whole genome sequences. It is able to compare hundreds of whole genomes in a matter of hours on a standard Desktop computer. There are three main outputs from a run of ClonalFrameML: a phylogeny with branch lengths corrected to account for recombination, an estimation of the key parameters of the recombination process, and a genomic map of where recombination took place for each branch of the phylogeny.
+This is the homepage of ClonalFrameML, a software package that performs efficient inference of recombination in bacterial genomes. ClonalFrameML was created by [Xavier Didelot](http://xavierdidelot.github.io) and [Daniel Wilson](http://www.danielwilson.me.uk/). ClonalFrameML can be applied to any type of aligned sequence data, but is especially aimed at analysis of whole genome sequences. It is able to compare hundreds of whole genomes in a matter of hours on a standard Desktop computer. There are three main outputs from a run of ClonalFrameML: a phylogeny with branch lengths corrected to account for recombination, an estimation of the key parameters of the recombination process, and a genomic map of where recombination took place for each branch of the phylogeny.
-ClonalFrameML is a maximum likelihood implementation of the Bayesian software [ClonalFrame](http://www.xavierdidelot.xtreemhost.com/clonalframe.htm) which was previously described by [Didelot and Falush (2007)](http://www.genetics.org/cgi/content/abstract/175/3/1251). The recombination model underpinning ClonalFrameML is exactly the same as for ClonalFrame, but this new implementation is a lot faster, is able to deal with much larger genomic dataset, and does not suffer from MCMC convergence issues. A scientific paper describing ClonalFrameML in detail has been published, see [Didelot X, Wilson DJ (2015) ClonalFrameML: Efficient Inference of Recombination in Whole Bacterial Genomes. PLoS Comput Biol 11(2): e1004041. doi:10.1371/journal.pcbi.1004041](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004041).
+ClonalFrameML is a maximum likelihood implementation of the Bayesian software [ClonalFrame](http://xavierdidelot.github.io/clonalframe.html) which was previously described by [Didelot and Falush (2007)](http://www.genetics.org/cgi/content/abstract/175/3/1251). The recombination model underpinning ClonalFrameML is exactly the same as for ClonalFrame, but this new implementation is a lot faster, is able to deal with much larger genomic dataset, and does not suffer from MCMC convergence issues. A scientific paper describing ClonalFrameML in detail has been published, see [Didelot X, Wilson DJ (2015) ClonalFrameML: Efficient Inference of Recombination in Whole Bacterial Genomes. PLoS Comput Biol 11(2): e1004041. doi:10.1371/journal.pcbi.1004041](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004041).
# Download and Installation #
@@ -30,4 +30,4 @@ The user guide for ClonalFrameML is available [here](https://github.com/xavierdi
# Getting help #
-If you need assistance using ClonalFrameML, you can get in touch by emailing either [Xavier Didelot](http://www.xavierdidelot.xtreemhost.com/contact.htm) or [Daniel Wilson](http://www.danielwilson.me.uk/contact.html).
+If you need assistance using ClonalFrameML, you can get in touch by emailing either [Xavier Didelot](http://xavierdidelot.github.io/contact.html) or [Daniel Wilson](http://www.danielwilson.me.uk/contact.html).
=====================================
debian/changelog
=====================================
@@ -1,3 +1,15 @@
+clonalframeml (1.12-1) unstable; urgency=medium
+
+ * New upstream version
+ * Standards-Version: 4.5.0 (routine-update)
+ * debhelper-compat 12 (routine-update)
+ * Update patches
+ * Add salsa-ci file (routine-update)
+ * Set upstream metadata fields: Bug-Database, Bug-Submit, Repository,
+ Repository-Browse.
+
+ -- Andreas Tille <tille at debian.org> Thu, 27 Feb 2020 11:39:45 +0100
+
clonalframeml (1.11-3) unstable; urgency=medium
* set standard compiler variable
=====================================
debian/compat deleted
=====================================
@@ -1 +0,0 @@
-11
=====================================
debian/control
=====================================
@@ -3,8 +3,8 @@ Maintainer: Debian Med Packaging Team <debian-med-packaging at lists.alioth.debian.
Uploaders: Andreas Tille <tille at debian.org>
Section: science
Priority: optional
-Build-Depends: debhelper (>= 11~)
-Standards-Version: 4.2.1
+Build-Depends: debhelper-compat (= 12)
+Standards-Version: 4.5.0
Vcs-Browser: https://salsa.debian.org/med-team/clonalframeml
Vcs-Git: https://salsa.debian.org/med-team/clonalframeml.git
Homepage: https://github.com/xavierdidelot/ClonalFrameML
=====================================
debian/patches/cross.patch
=====================================
@@ -6,14 +6,14 @@ Description: set standard compiler variable
--- a/src/makefile
+++ b/src/makefile
@@ -1,5 +1,5 @@
- # Make file for ClonalFrameML
+ # Makefile for ClonalFrameML
-CC = g++
+CXX = g++
- CFLAGS += -O3 -I./ -I./myutils -I./coalesce
+ CFLAGS += -O3
OBJECTS = main.o
- HEADERS = main.h brent.h powell.h version.h
-@@ -9,10 +9,10 @@ HEADERS = main.h brent.h powell.h versio
- all: version ClonalFrameML
+ HEADERS = main.h brent.h powell.h
+@@ -9,10 +9,10 @@ HEADERS = main.h brent.h powell.h
+ all: ClonalFrameML
ClonalFrameML: $(OBJECTS)
- $(CC) $(LDFLAGS) -o ClonalFrameML $(OBJECTS)
@@ -23,5 +23,5 @@ Description: set standard compiler variable
- $(CC) $(CFLAGS) -c -o main.o main.cpp
+ $(CXX) $(CFLAGS) -c -o main.o main.cpp
- version: version.h
- # /bin/echo "#define ClonalFrameML_GITRevision \"`git describe --tags`\"" > version.h
+ clean:
+ rm -f $(OBJECTS)
=====================================
debian/patches/fix_clean_target.patch deleted
=====================================
@@ -1,12 +0,0 @@
-Author: Andreas Tille <tille at debian.org>
-Last-Update: Thu, 21 Sep 2017 10:30:54 +0200
-Description: Do not fail when cleaning clean source tree
-
---- a/src/makefile
-+++ b/src/makefile
-@@ -19,4 +19,4 @@ version:
- /bin/echo "#define ClonalFrameML_GITRevision \"`git describe --tags`\"" > version.h
-
- clean:
-- rm $(OBJECTS)
-+ rm -f $(OBJECTS)
=====================================
debian/patches/hardening.patch deleted
=====================================
@@ -1,15 +0,0 @@
-Author: Andreas Tille <tille at debian.org>
-Last-Update: Thu, 21 Sep 2017 10:30:54 +0200
-Description: Propagate hardening options
-
---- a/src/makefile
-+++ b/src/makefile
-@@ -1,7 +1,6 @@
- # Make file for ClonalFrameML
- CC = g++
--CFLAGS = -O3 -I./ -I./myutils -I./coalesce
--LDFLAGS =
-+CFLAGS += -O3 -I./ -I./myutils -I./coalesce
- OBJECTS = main.o
- HEADERS = main.h brent.h powell.h version.h
-
=====================================
debian/patches/series
=====================================
@@ -1,4 +1 @@
-fix_clean_target.patch
-use_debian_revision_as_version.patch
-hardening.patch
cross.patch
=====================================
debian/patches/use_debian_revision_as_version.patch deleted
=====================================
@@ -1,20 +0,0 @@
-Author: Andreas Tille <tille at debian.org>
-Last-Update: Thu, 21 Sep 2017 10:30:54 +0200
-Description: Prevent a useless call to Git since we need to use
- the Debian revision as version mark anyway since we do not build
- out of the Git repository
-
---- a/src/makefile
-+++ b/src/makefile
-@@ -15,8 +15,9 @@ ClonalFrameML: $(OBJECTS)
- main.o: main.cpp $(HEADERS)
- $(CC) $(CFLAGS) -c -o main.o main.cpp
-
--version:
-- /bin/echo "#define ClonalFrameML_GITRevision \"`git describe --tags`\"" > version.h
-+version: version.h
-+ # /bin/echo "#define ClonalFrameML_GITRevision \"`git describe --tags`\"" > version.h
-+ cat version.h
-
- clean:
- rm -f $(OBJECTS)
=====================================
debian/salsa-ci.yml
=====================================
@@ -0,0 +1,4 @@
+---
+include:
+ - https://salsa.debian.org/salsa-ci-team/pipeline/raw/master/salsa-ci.yml
+ - https://salsa.debian.org/salsa-ci-team/pipeline/raw/master/pipeline-jobs.yml
=====================================
debian/upstream/metadata
=====================================
@@ -1,3 +1,5 @@
+Bug-Database: https://github.com/xavierdidelot/ClonalFrameML/issues
+Bug-Submit: https://github.com/xavierdidelot/ClonalFrameML/issues/new
Reference:
Author: Xavier Didelot and Daniel J. Wilson
Title: >
@@ -23,3 +25,5 @@ Registry:
Entry: OMICS_14623
- Name: conda:bioconda
Entry: clonalframeml
+Repository: https://github.com/xavierdidelot/ClonalFrameML.git
+Repository-Browse: https://github.com/xavierdidelot/ClonalFrameML
=====================================
src/README.txt deleted
=====================================
@@ -1,45 +0,0 @@
-ClonalFrameML
-Xavier Didelot and Daniel Wilson. 2015
-
-This program reads in a Newick tree and FASTA file and, for all variable sites, reconstructs
-the joint maximum likelihood sequences at all nodes (including, for the purposes of imputation, tips)
-using the HKY85 nucleotide substitution model and an algorithm described in:
-
- A Fast Algorithm for Joint Reconstruction of Ancestral Amino Acid Sequences
- Tal Pupko, Itsik Peer, Ron Shamir, and Dan Graur. Mol. Biol. Evol. 17(6):890–896. 2000
-
-Branch lengths of the tree are corrected for heterospecific horizontal gene transfer using a new maximum-
-likelihood algorithm implementing the ClonalFrame model that was described in:
-
- Inference of Bacterial Microevolution Using Multilocus Sequence Data
- Xavier Didelot, and Daniel Falush. Genetics 175(3):1251-1266. 2007
-
-Syntax: ClonalFrameML newick_file fasta_file output_file [OPTIONS]
-
-newick_file The tree specified in Newick format. It must be an unrooted bifurcating tree. All
- tips should be uniquely labelled and the internal nodes must not be labelled. Note that the
- branch lengths must be scaled in units of expected number of substitutions per site.
- Failure to provide appropriately scaled branch lengths will adversely affect results.
-fasta_file The nucleotide sequences specified in FASTA format, with labels exactly matching those in
- the newick_file. The letter codes A, C, G and T are interpreted directly, U is converted
- to T, and N, -, ? and X are treated equivalently as ambiguity codes. No other codes are
- allowed.
-output_file The prefix for the output files, described below.
-[OPTIONS] Run ClonalFrameML with no arguments to see the options available.
-
-The program reports the empirical nucleotide frequencies and the joint log-likelihood of the reconstructed
-sequences for variable sites. Files are output with the following suffixes:
-
-.labelled_tree.newick The corrected Newick tree is ouput with internal nodes labelled so that they
- correspond with the reconstructed ancestral sequence file.
-.ML_sequence.fasta The reconstructed sequences (ancestral and, for the purposes of imputation,
- observed) in FASTA format with letter codes A, C, G and T only. The labels
- match exactly those in the output Newick tree.
-.position_cross_reference.txt A vector of comma-separated values equal in length to the input FASTA file
- relating the positions of (variable) sites in the input FASTA file to the
- positions of their reconstructed sequences in the output FASTA file, starting
- with position 1. Sites in the input file not reconstructed are assigned a 0.
-.importation_status.txt A FASTA file representing the inferred importation status of every site
- coded as 0 (unimported) 1 (imported) 2 (unimported homoplasy/multiallelic)
- 3 (imported homoplasy/multiallelic) 4 (untested compatible) 5 (untested
- homoplasy).
=====================================
src/brent.h
=====================================
@@ -1,19 +1,19 @@
-/* Copyright 2013 Daniel Wilson.
- *
+/*
* brent.h
+ * Part of ClonalFrameML
*
- * The myutils library is free software: you can redistribute it and/or modify
+ * ClonalFrameML is free software: you can redistribute it and/or modify
* it under the terms of the GNU Lesser General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
- * The myutils library is distributed in the hope that it will be useful,
+ * ClonalFrameML is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public License
- * along with the myutils library. If not, see <http://www.gnu.org/licenses/>.
+ * along with ClonalFrameML. If not, see <http://www.gnu.org/licenses/>..
*
* Parts of this code are based on code in Numerical Recipes in C++
* WH Press, SA Teukolsky, WT Vetterling, BP Flannery (2002).
=====================================
src/coalesce/coalescent_record.h
=====================================
@@ -3,18 +3,18 @@
* coalescent_record.h
* Part of the coalesce library.
*
- * The myutils library is free software: you can redistribute it and/or modify
+ * The coalesce library is free software: you can redistribute it and/or modify
* it under the terms of the GNU Lesser General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
- * The myutils library is distributed in the hope that it will be useful,
+ * The coalesce library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public License
- * along with the myutils library. If not, see <http://www.gnu.org/licenses/>.
+ * along with the coalesce library. If not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _RECORD_H_
#define _RECORD_H_
=====================================
src/main.cpp
=====================================
@@ -1,5 +1,4 @@
-/* Copyright 2013 Daniel Wilson and Xavier Didelot.
- *
+/*
* main.cpp
* Part of ClonalFrameML
*
@@ -21,7 +20,8 @@
int main (const int argc, const char* argv[]) {
clock_t start_time = clock();
- cout << "ClonalFrameML " << ClonalFrameML_GITRevision << endl;
+ cout << "ClonalFrameML " << ClonalFrameML_version << endl;
+ if (argc==2 && (strcmp(argv[1],"-version")==0||strcmp(argv[1],"-v")==0)) return 0;
// Process the command line arguments
if(argc<4) {
stringstream errTxt;
@@ -43,7 +43,6 @@ int main (const int argc, const char* argv[]) {
errTxt << "-chromosome_name name, eg \"chr\" Output importation status file in BED format using given chromosome name." << endl;
errTxt << "-min_branch_length value > 0 (default 1e-7) Minimum branch length." << endl;
errTxt << "-reconstruct_invariant_sites true or false (default) Reconstruct the ancestral states at invariant sites." << endl;
-// errTxt << "-compress_reconstructed_sites true (default) or false Reduce the number of columns in the output FASTA file." << endl; // Alternative not currently implemented, so not optional
errTxt << "-label_uncorrected_tree true or false (default) Regurgitate the uncorrected Newick tree with internal nodes labelled." << endl;
errTxt << "Options affecting -em and -embranch:" << endl;
errTxt << "-prior_mean df \"0.1 0.001 0.1 0.0001\" Prior mean for R/theta, 1/delta, nu and M." << endl;
@@ -52,10 +51,12 @@ int main (const int argc, const char* argv[]) {
errTxt << "-guess_initial_m true (default) or false Initialize M and nu jointly in the EM algorithms." << endl;
errTxt << "-emsim value >= 0 (default 0) Number of simulations to estimate uncertainty in the EM results." << endl;
errTxt << "-embranch_dispersion value > 0 (default .01) Dispersion in parameters among branches in the -embranch model." << endl;
+ errTxt << "-output_filtered true of false (default) Output a filtered alignment including only non-recombinant sites." << endl;
errTxt << "Options affecting -rescale_no_recombination:" << endl;
errTxt << "-brent_tolerance tolerance (default .001) Set the tolerance of the Brent routine for -rescale_no_recombination." << endl;
errTxt << "-powell_tolerance tolerance (default .001) Set the tolerance of the Powell routine for -rescale_no_recombination." << endl;
- error(errTxt.str().c_str());
+ cout << errTxt.str().c_str()<<endl;
+ return 0;
}
// Process required arguments
const char* newick_file = argv[1];
@@ -64,6 +65,7 @@ int main (const int argc, const char* argv[]) {
string tree_out_file = string(out_file) + ".labelled_tree.newick";
string oritree_out_file = string(out_file) + ".labelled_uncorrected_tree.newick";
string fasta_out_file = string(out_file) + ".ML_sequence.fasta";
+ string fasta_filtered_file = string(out_file) + ".filtered.fasta";
string xref_out_file = string(out_file) + ".position_cross_reference.txt";
string import_out_file = string(out_file) + ".importation_status.txt";
string em_out_file = string(out_file) + ".em.txt";
@@ -73,7 +75,8 @@ int main (const int argc, const char* argv[]) {
arg.case_sensitive = false;
string fasta_file_list="false", xmfa_file="false", imputation_only="false", ignore_incomplete_sites="false", ignore_user_sites="", reconstruct_invariant_sites="false";
string use_incompatible_sites="true", rescale_no_recombination="false";
- string show_progress="false", compress_reconstructed_sites="true";
+ string show_progress="false";
+ string output_filtered="false";
string string_prior_mean="0.1 0.001 0.1 0.0001", string_prior_sd="0.1 0.001 0.1 0.0001", string_initial_values = "0.1 0.001 0.05";
string guess_initial_m="true", em="true", embranch="false", label_original_tree="false", chr_name="";
double brent_tolerance = 1.0e-3, powell_tolerance = 1.0e-3, global_min_branch_length = 1.0e-7;
@@ -92,7 +95,6 @@ int main (const int argc, const char* argv[]) {
arg.add_item("powell_tolerance", TP_DOUBLE, &powell_tolerance);
arg.add_item("rescale_no_recombination", TP_STRING, &rescale_no_recombination);
arg.add_item("show_progress", TP_STRING, &show_progress);
- arg.add_item("compress_reconstructed_sites",TP_STRING, &compress_reconstructed_sites);
arg.add_item("min_branch_length", TP_DOUBLE, &global_min_branch_length);
arg.add_item("prior_mean", TP_STRING, &string_prior_mean);
arg.add_item("prior_sd", TP_STRING, &string_prior_sd);
@@ -104,6 +106,7 @@ int main (const int argc, const char* argv[]) {
arg.add_item("embranch_dispersion", TP_DOUBLE, &embranch_dispersion);
arg.add_item("kappa", TP_DOUBLE, &kappa);
arg.add_item("label_uncorrected_tree", TP_STRING, &label_original_tree);
+ arg.add_item("output_filtered", TP_STRING, &output_filtered);
arg.read_input(argc-3,argv+3);
bool FASTA_FILE_LIST = string_to_bool(fasta_file_list, "fasta_file_list");
bool XMFA_FILE = string_to_bool(xmfa_file, "xmfa_file");
@@ -113,11 +116,11 @@ int main (const int argc, const char* argv[]) {
bool USE_INCOMPATIBLE_SITES = string_to_bool(use_incompatible_sites, "use_incompatible_sites");
bool RESCALE_NO_RECOMBINATION = string_to_bool(rescale_no_recombination, "rescale_no_recombination");
bool SHOW_PROGRESS = string_to_bool(show_progress, "show_progress");
- bool COMPRESS_RECONSTRUCTED_SITES = string_to_bool(compress_reconstructed_sites, "compress_reconstructed_sites");
bool GUESS_INITIAL_M = string_to_bool(guess_initial_m, "guess_initial_m");
bool EM = string_to_bool(em, "em");
bool EMBRANCH = string_to_bool(embranch, "embranch");
bool LABEL_ORIGINAL_TREE = string_to_bool(label_original_tree, "label_uncorrected_tree");
+ bool OUTPUT_FILTERED = string_to_bool(output_filtered, "output_filtered");
bool MULTITHREAD = false;
if(brent_tolerance<=0.0 || brent_tolerance>=0.1) {
stringstream errTxt;
@@ -328,7 +331,6 @@ int main (const int argc, const char* argv[]) {
// Report the ML
cout << "Maximum log-likelihood for imputation and ancestral state reconstruction = " << ML.LOG() << endl;
- if(!COMPRESS_RECONSTRUCTED_SITES) cout << "WARNING: -compress_reconstructed_sites=false not yet implemented, ignoring." << endl;
// Output the ML reconstructed sequences
write_ancestral_fasta(node_nuc, ctree_node_labels, fasta_out_file.c_str());
// For every position in the original FASTA file, output the corresponding position in the output FASTA file, or -1 (not included)
@@ -466,6 +468,11 @@ int main (const int argc, const char* argv[]) {
// Output the importation status
write_importation_status_intervals(is_imported,ctree_node_labels,isBLC,compat,import_out_file.c_str(),root_node,chr_name.c_str());
cout << "Wrote inferred importation status to " << import_out_file << endl;
+ if (OUTPUT_FILTERED) {
+ // Output the filtered alignment
+ write_filtered_fasta(is_imported, &fa, ignore_site, fasta_filtered_file.c_str());
+ cout << "Wrote filtered alignment to " << fasta_filtered_file << endl;
+ }
// If required, simulate under the point estimates and output posterior samples of the parameters
if(emsim>0) {
@@ -548,6 +555,11 @@ int main (const int argc, const char* argv[]) {
// Output the importation status
write_importation_status_intervals(is_imported,ctree_node_labels,isBLC,compat,import_out_file.c_str(),root_node,chr_name.c_str());
cout << "Wrote inferred importation status to " << import_out_file << endl;
+ if (OUTPUT_FILTERED) {
+ // Output the filtered alignment
+ write_filtered_fasta(is_imported, &fa, ignore_site, fasta_filtered_file.c_str());
+ cout << "Wrote filtered alignment to " << fasta_filtered_file << endl;
+ }
// If required, simulate under the point estimates and output posterior samples of the parameters
if(emsim>0) {
@@ -1721,17 +1733,7 @@ void write_ancestral_fasta(Matrix<Nucleotide> &nuc, vector<string> &all_node_nam
errTxt << "write_ancestral_fasta(): could not open file " << file_name << " for writing";
error(errTxt.str().c_str());
}
- write_ancestral_fasta(nuc,all_node_names,fout);
- fout.close();
-}
-
-void write_ancestral_fasta(Matrix<Nucleotide> &nuc, vector<string> &all_node_names, ofstream &fout) {
static const char AGCTN[5] = {'A','G','C','T','N'};
- if(!fout) {
- stringstream errTxt;
- errTxt << "write_ancestral_fasta(): could not open file stream for writing";
- error(errTxt.str().c_str());
- }
if(nuc.nrows()!=all_node_names.size()) {
stringstream errTxt;
errTxt << "write_ancestral_fasta(): number of sequences (" << nuc.nrows() << ") does not equal number of node labels (" << all_node_names.size() << ")";
@@ -1745,23 +1747,37 @@ void write_ancestral_fasta(Matrix<Nucleotide> &nuc, vector<string> &all_node_nam
}
fout << endl;
}
+ fout.close();
}
-void write_position_cross_reference(vector<bool> &iscompat, vector<int> &ipat, const char* file_name) {
+void write_filtered_fasta(vector< vector<ImportationState> > &imported, DNA * fa,vector<bool> &ignore_site, const char* file_name) {
ofstream fout(file_name);
if(!fout) {
stringstream errTxt;
- errTxt << "write_position_cross_reference(): could not open file " << file_name << " for writing";
+ errTxt << "write_filtered_fasta(): could not open file " << file_name << " for writing";
error(errTxt.str().c_str());
}
- write_position_cross_reference(iscompat,ipat,fout);
- fout.close();
+ int n,pos;
+ vector<bool> tokeep(fa->lseq);
+ for (pos=0;pos<fa->lseq;pos++) {
+ tokeep[pos]=true;
+ if (ignore_site[pos]) tokeep[pos]=false;
+ for (n=0;n<imported.size();n++) if(imported[n][pos]==Imported) tokeep[pos]=false;
+ }
+ for(n=0;n<fa->nseq;n++)
+ {
+ fout << ">" << fa->label[n] << endl;
+ for(pos=0;pos<fa->lseq;pos++)
+ if (tokeep[pos]) fout << fa->sequence[n][pos];
+ fout << endl;
+ } fout.close();
}
-void write_position_cross_reference(vector<bool> &iscompat, vector<int> &ipat, ofstream &fout) {
+void write_position_cross_reference(vector<bool> &iscompat, vector<int> &ipat, const char* file_name) {
+ ofstream fout(file_name);
if(!fout) {
stringstream errTxt;
- errTxt << "write_position_cross_reference(): could not open file stream for writing";
+ errTxt << "write_position_cross_reference(): could not open file " << file_name << " for writing";
error(errTxt.str().c_str());
}
int i,j,pat;
@@ -1780,6 +1796,7 @@ void write_position_cross_reference(vector<bool> &iscompat, vector<int> &ipat, o
fout << pat+1;
}
fout << endl;
+ fout.close();
}
mydouble likelihood_branch(const int dec_id, const int anc_id, const Matrix<Nucleotide> &node_nuc, const vector<int> &pat1, const vector<int> &cpat, const double kappa, const vector<double> &pinuc, const double branch_length) {
@@ -1810,55 +1827,6 @@ bool string_to_bool(const string s, const string label) {
return false;
}
-void write_importation_status(vector< vector<ImportationState> > &imported, vector<string> &all_node_names, vector<bool> &isBLC, vector<int> &compat, const char* file_name, const int root_node) {
- ofstream fout(file_name);
- if(!fout) {
- stringstream errTxt;
- errTxt << "write_importation_status(): could not open file " << file_name << " for writing";
- error(errTxt.str().c_str());
- }
- write_importation_status(imported,all_node_names,isBLC,compat,fout,root_node);
- fout.close();
-}
-
-void write_importation_status(vector< vector<ImportationState> > &imported, vector<string> &all_node_names, vector<bool> &isBLC, vector<int> &compat, ofstream &fout, const int root_node) {
- if(!fout) {
- stringstream errTxt;
- errTxt << "write_importation_status(): could not open file stream for writing";
- error(errTxt.str().c_str());
- }
- if(imported.size()!=root_node) {
- stringstream errTxt;
- errTxt << "write_importation_status(): number of lineages (" << imported.size() << ") does not equal the number of non-root node labels (" << root_node << ")";
- error(errTxt.str().c_str());
- }
- if(all_node_names.size()<root_node) {
- stringstream errTxt;
- errTxt << "write_importation_status(): number of non-root lineages (" << root_node << ") exceeds the number of node labels (" << all_node_names.size() << ")";
- error(errTxt.str().c_str());
- }
- int i,pos;
- for(i=0;i<root_node;i++) {
- fout << ">" << all_node_names[i] << endl;
- int k = 0;
- for(pos=0;pos<isBLC.size();pos++) {
- if(isBLC[pos]) {
- // If used in branch length correction, 0 (unimported), 1 (imported), 2 (homoplasy/multiallelic unimported), 3 (homoplasy/multiallelic imported)
- int out = 2*(compat[pos]>0) + (int)imported[i][k];
- fout << out;
- ++k;
- } else if(compat[pos]<=0) {
- // If compatible but not used in branch length correction, 4
- fout << 4;
- } else {
- // If homoplasy/multiallelic and not used in branch length correction, 5
- fout << 5;
- }
- }
- fout << endl;
- }
-}
-
void write_importation_status_intervals(vector< vector<ImportationState> > &imported, vector<string> &all_node_names, vector<bool> &isBLC, vector<int> &compat, const char* file_name, const int root_node, const char* chr_name) {
ofstream fout(file_name);
if(!fout) {
@@ -1866,16 +1834,6 @@ void write_importation_status_intervals(vector< vector<ImportationState> > &impo
errTxt << "write_importation_status_intervals(): could not open file " << file_name << " for writing";
error(errTxt.str().c_str());
}
- write_importation_status_intervals(imported,all_node_names,isBLC,compat,fout,root_node, chr_name);
- fout.close();
-}
-
-void write_importation_status_intervals(vector< vector<ImportationState> > &imported, vector<string> &all_node_names, vector<bool> &isBLC, vector<int> &compat, ofstream &fout, const int root_node, const char* chr_name) {
- if(!fout) {
- stringstream errTxt;
- errTxt << "write_importation_status_intervals(): could not open file stream for writing";
- error(errTxt.str().c_str());
- }
if(imported.size()!=root_node) {
stringstream errTxt;
errTxt << "write_importation_status_intervals(): number of lineages (" << imported.size() << ") does not equal the number of non-root node labels (" << root_node << ")";
@@ -1916,6 +1874,7 @@ void write_importation_status_intervals(vector< vector<ImportationState> > &impo
else fout << chr_name << tab << interval_beg+1 << tab << pos << tab << all_node_names[i] << endl;
}
}
+ fout.close();
}
mydouble maximum_likelihood_ClonalFrame_branch_allsites(const int dec_id, const int anc_id, const Matrix<Nucleotide> &node_nuc, const vector<bool> &iscompat, const vector<int> &ipat, const double kappa, const vector<double> &pinuc, const double branch_length, const double rho_over_theta, const double mean_import_length, const double import_divergence, vector<ImportationState> &is_imported) {
=====================================
src/main.h
=====================================
@@ -1,5 +1,4 @@
-/* Copyright 2013 Daniel Wilson and Xavier Didelot.
- *
+/*
* main.h
* Part of ClonalFrameML
*
@@ -22,23 +21,20 @@
#include <iostream>
#include <string.h>
#include "myutils/newick.h"
-//#include "coalesce/coalesce.h"
#include "coalesce/coalescent_record.h"
#include <sstream>
-//#include "myutils/myutils.h"
#include "xmfa.h"
#include <fstream>
#include <algorithm>
#include "myutils/DNA.h"
#include "myutils/mydouble.h"
-//#include "coalesce/mutation.h"
#include "powell.h"
#include "myutils/argumentwizard.h"
#include <time.h>
#include "myutils/random.h"
#include <limits>
#include <iomanip>
-#include "version.h"
+#define ClonalFrameML_version "v1.12"
using std::cout;
using myutils::NewickTree;
@@ -68,15 +64,12 @@ void write_newick(const marginal_tree &ctree, const vector<string> &all_node_nam
void write_newick(const marginal_tree &ctree, const vector<string> &all_node_names, ofstream &fout);
void write_newick_node(const mt_node *node, const vector<string> &all_node_names, ofstream &fout);
void write_ancestral_fasta(Matrix<Nucleotide> &nuc, vector<string> &all_node_names, const char* file_name);
-void write_ancestral_fasta(Matrix<Nucleotide> &nuc, vector<string> &all_node_names, ofstream &fout);
+void write_filtered_fasta(vector< vector<ImportationState> > &imported, DNA * fa,vector<bool> & ignore_site, const char* file_name);
void write_position_cross_reference(vector<bool> &iscompat, vector<int> &ipat, const char* file_name);
void write_position_cross_reference(vector<bool> &iscompat, vector<int> &ipat, ofstream &fout);
mydouble likelihood_branch(const int dec_id, const int anc_id, const Matrix<Nucleotide> &node_nuc, const vector<int> &pat1, const vector<int> &cpat, const double kappa, const vector<double> &pinuc, const double branch_length);
bool string_to_bool(const string s, const string label="");
-void write_importation_status(vector< vector<ImportationState> > &imported, vector<string> &all_node_names, vector<bool> &isBLC, vector<int> &compat, const char* file_name, const int root_node);
-void write_importation_status(vector< vector<ImportationState> > &imported, vector<string> &all_node_names, vector<bool> &isBLC, vector<int> &compat, ofstream &fout, const int root_node);
void write_importation_status_intervals(vector< vector<ImportationState> > &imported, vector<string> &all_node_names, vector<bool> &isBLC, vector<int> &compat, const char* file_name, const int root_node,const char* chr_name);
-void write_importation_status_intervals(vector< vector<ImportationState> > &imported, vector<string> &all_node_names, vector<bool> &isBLC, vector<int> &compat, ofstream &fout, const int root_node, const char* chr_name);
double Baum_Welch(const marginal_tree &tree, const Matrix<Nucleotide> &node_nuc, const vector<double> &position, const vector<int> &ipat, const double kappa, const vector<double> &pinuc, const vector<bool> &informative, const vector<double> &prior_a, const vector<double> &prior_b, vector<double> &full_param, vector<double> &posterior_a, int &neval, const bool coutput, double &priorL);
double Baum_Welch0(const marginal_tree &tree, const Matrix<Nucleotide> &node_nuc, const vector<double> &position, const vector<int> &ipat, const double kappa, const vector<double> &pinuc, const vector<bool> &informative, const vector<double> &prior_a, const vector<double> &prior_b, const vector<double> &full_param, const vector<double> &posterior_a, const bool coutput);
double gamma_loglikelihood(const double x, const double a, const double b);
=====================================
src/make.sh
=====================================
@@ -1,3 +1 @@
-echo "#define ClonalFrameML_GITRevision \"`git describe --tags`\"" > version.h
-g++ main.cpp -o ClonalFrameML -I ./ -I ./myutils -I ./coalesce -O3
-
+g++ main.cpp -o ClonalFrameML -O3
=====================================
src/make_win.bat deleted
=====================================
@@ -1,11 +0,0 @@
-@echo off
-rem This creates the version.h file.
-rem You need git installed (obviously)
-rem And to be in the folder where the ".git" directory exists.
-
-FOR /F "delims=" %%i IN ('git describe --tags') DO set GITRESULT=%%i
-echo #define ClonalFrameML_GITRevision %GITRESULT% > version.h
-
-rem The linux make.sh file now compiles the code.
-rem If you're in VS, remember you need _CRT_SECURE_NO_WARNINGS in the
-rem Pre-Processor code.
=====================================
src/makefile
=====================================
@@ -1,13 +1,12 @@
-# Make file for ClonalFrameML
+# Makefile for ClonalFrameML
CC = g++
-CFLAGS = -O3 -I./ -I./myutils -I./coalesce
-LDFLAGS =
+CFLAGS += -O3
OBJECTS = main.o
-HEADERS = main.h brent.h powell.h version.h
+HEADERS = main.h brent.h powell.h
-.PHONY: clean version
+.PHONY: clean
-all: version ClonalFrameML
+all: ClonalFrameML
ClonalFrameML: $(OBJECTS)
$(CC) $(LDFLAGS) -o ClonalFrameML $(OBJECTS)
@@ -15,8 +14,5 @@ ClonalFrameML: $(OBJECTS)
main.o: main.cpp $(HEADERS)
$(CC) $(CFLAGS) -c -o main.o main.cpp
-version:
- /bin/echo "#define ClonalFrameML_GITRevision \"`git describe --tags`\"" > version.h
-
clean:
- rm $(OBJECTS)
+ rm -f $(OBJECTS)
=====================================
src/myutils/DNA.h
=====================================
@@ -31,7 +31,7 @@
#include <string>
#include <fstream>
#include <iostream>
-#include "myutils/myutils.h"
+#include "myutils.h"
#include <map>
#include <sstream>
#include <algorithm>
=====================================
src/myutils/argumentwizard.h
=====================================
@@ -33,7 +33,7 @@
#include <vector>
#include <iostream>
#include <ctype.h>
-#include "myutils/myerror.h"
+#include "myerror.h"
#include <sstream>
namespace myutils
=====================================
src/myutils/matrix.h
=====================================
@@ -27,8 +27,8 @@
#include <stdlib.h>
#include <stdio.h>
-#include "myutils/vector.h"
-#include "myutils/utils.h"
+#include "vector.h"
+#include "utils.h"
/****************************************************************/
/* myutils::Matrix */
=====================================
src/myutils/mydouble.h
=====================================
@@ -21,7 +21,7 @@
#include <limits>
#include <math.h>
-#include "myutils/myerror.h"
+#include "myerror.h"
using myutils::error;
=====================================
src/myutils/myutils.h
=====================================
@@ -27,26 +27,12 @@
#pragma warning(disable: 4786)
-/*Includes all header files in the myutils directory*/
-/*#include "cmatrix.h"
+#include "myerror.h"
+#include "utils.h"
+#include "vector.h"
#include "matrix.h"
+#include "lotri_matrix.h"
#include "random.h"
-#include "error.h"
#include "DNA.h"
-#include "vector.h"*/
-
-#include "myutils/myerror.h"
-#include "myutils/utils.h"
-//#include "myutils/cmatrix.h"
-#include "myutils/vector.h"
-#include "myutils/matrix.h"
-#include "myutils/lotri_matrix.h"
-#include "myutils/random.h"
-#include "myutils/DNA.h"
-//#include "myutils/pause.h"
-//#include "myutils/sort.h"
-
-//#include "controlwizard.h" /* has problems in Linux with pointers */
-//#include "pause.h" /* removed because conio.h is not standard */
#endif
=====================================
src/myutils/newick.h
=====================================
@@ -10,7 +10,7 @@
#define _NEWICK_H_
#include <vector>
#include <string>
-#include "myutils/myerror.h"
+#include "myerror.h"
#include <sstream>
#include <iostream>
=====================================
src/myutils/random.h
=====================================
@@ -28,11 +28,11 @@
#include <cmath>
#include <time.h>
#include <vector>
-#include "myutils/vector.h"
-#include "myutils/matrix.h"
-#include "myutils/lotri_matrix.h"
+#include "vector.h"
+#include "matrix.h"
+#include "lotri_matrix.h"
-#include "myutils/myerror.h"
+#include "myerror.h"
namespace myutils {
class Random {
=====================================
src/myutils/vector.h
=====================================
@@ -25,7 +25,7 @@
#ifndef _MYUTILS_VECTOR_H_
#define _MYUTILS_VECTOR_H_
-#include "myutils/myerror.h"
+#include "myerror.h"
#include <stdlib.h>
#include <stdio.h>
//#include <myutils.h>
=====================================
src/powell.h
=====================================
@@ -1,19 +1,20 @@
-/* Copyright 2013 Daniel Wilson.
- *
+/*
* powell.h
+ * Part of ClonalFrameML
+ *
*
- * The myutils library is free software: you can redistribute it and/or modify
+ * ClonalFrameML is free software: you can redistribute it and/or modify
* it under the terms of the GNU Lesser General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
- * The myutils library is distributed in the hope that it will be useful,
+ * ClonalFrameML is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public License
- * along with the myutils library. If not, see <http://www.gnu.org/licenses/>.
+ * along with ClonalFrameML. If not, see <http://www.gnu.org/licenses/>..
*
* Parts of this code are based on code in Numerical Recipes in C++
* WH Press, SA Teukolsky, WT Vetterling, BP Flannery (2002).
=====================================
src/xmfa.h
=====================================
@@ -1,8 +1,8 @@
-/* Copyright 2013 Daniel Wilson and Xavier Didelot.
- *
+/*
* xmfa.h
* Part of ClonalFrameML
*
+ *
* ClonalFrameML is free software: you can redistribute it and/or modify
* it under the terms of the GNU Lesser General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
View it on GitLab: https://salsa.debian.org/med-team/clonalframeml/-/compare/f2619e22be9950b409a617892af45e9ea8f4b3e6...b25865a2c95f8f8cd9c569087cdaad0dde221250