[med-svn] [gubbins] 01/07: Imported Upstream version 2.0.0
Sascha Steinbiss
satta at debian.org
Wed Jul 20 12:50:10 UTC 2016
This is an automated email from the git hooks/post-receive script.
satta pushed a commit to branch master
in repository gubbins.
commit 8cafa0e6072ba6fb512811986350f226dd600e31
Author: Sascha Steinbiss <satta at debian.org>
Date: Wed Jul 20 12:11:48 2016 +0000
Imported Upstream version 2.0.0
---
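Note on the depcomp hunks below: the per-mode inline sed calls that derived
$dir and $base from "$object" are replaced by two shared helpers,
set_dir_from and set_base_from. The old inline code stripped only a trailing
.o or .lo, while set_base_from strips whatever the final suffix is. The
following is a minimal sketch of how the helpers behave, with the function
bodies copied from the hunk. The sample path sub/foo.lo and the final echo
lines are assumptions made for illustration and are not part of the commit.

```sh
#!/bin/sh
# Helper bodies as they appear in the new depcomp (indentation restored).

# Save the directory component of $1 in the global $dir.
# The result is either empty or ends with '/'.
set_dir_from ()
{
  case $1 in
    */*) dir=`echo "$1" | sed -e 's|/[^/]*$|/|'`;;
      *) dir=;;
  esac
}

# Save the basename of $1, minus its final suffix, in the global $base.
set_base_from ()
{
  base=`echo "$1" | sed -e 's|^.*/||' -e 's/\.[^.]*$//'`
}

# Hypothetical object path, used only for this demonstration.
object=sub/foo.lo
set_dir_from "$object"
set_base_from "$object"

# The hp2 mode builds its first candidate depfile name like this.
echo "dir=$dir base=$base"
echo "tmpdepfile1=$dir$base.d"
```

Because $dir is either empty or already ends in a slash, callers can write
$dir$base.d without checking whether the object lives in a subdirectory.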
CHANGELOG | 7 +
INSTALL.md | 16 +-
README.md | 2 +-
VERSION | 2 +-
depcomp | 487 +++++++++++++--------
install-userspace.sh | 2 +-
install_dependencies.sh | 26 --
python/gubbins/Fastml.py | 63 ---
python/gubbins/RAxMLExecutable.py | 105 +++++
python/gubbins/RAxMLSequenceReconstruction.py | 176 ++++++++
python/gubbins/__init__.py | 1 -
python/gubbins/common.py | 133 +-----
python/gubbins/tests/bin/dummy_custom_fastml2 | 62 ---
python/gubbins/tests/bin/dummy_fastml2 | 60 ---
python/gubbins/tests/bin/dummy_fastml3 | 51 ---
python/gubbins/tests/data/destination_tree.tre | 1 +
.../tests/data/expected_renamed_output_tree | 1 +
.../data/raxml_sequence_reconstruction/1.fasta | 4 +
.../data/raxml_sequence_reconstruction/2.fasta | 8 +
.../expected_ancestor_sequence_from_raxml | 22 +
.../expected_combined_1_2.fasta | 12 +
.../expected_marginalAncestralStates.fasta | 10 +
.../expected_rooted_tree.newick | 1 +
.../input_alignment.fasta | 12 +
.../raw_marginalAncestralStates.phylip | 5 +
.../unrooted_tree.newick | 1 +
python/gubbins/tests/data/source_tree.tre | 1 +
python/gubbins/tests/test_external_dependancies.py | 19 +-
python/gubbins/tests/test_fastml.py | 43 --
python/gubbins/tests/test_pre_process_fasta.py | 9 +-
.../tests/test_raxml_sequence_reconstruction.py | 94 ++++
python/gubbins/tests/test_string_construction.py | 9 -
python/gubbins/tests/test_tree_python_methods.py | 5 -
.../gubbins/tests/test_validate_starting_tree.py | 1 -
release/manifests/trustyvm.pp | 8 -
35 files changed, 782 insertions(+), 677 deletions(-)
diff --git a/CHANGELOG b/CHANGELOG
index cfeba0b..69f2409 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,3 +1,10 @@
+v2.0.0 - 26 May 2016
+------
+Reconstruct internal sequences by default using RAxML rather than fastML.
+Addresses speed issues raised by users.
+RAxML version 8 now required as a minimum.
+Remove pairwise support since RAxML wont support it. Need at least 4 genomes as input.
+
v1.4.9 - 15 Apr 2016
------
If sequences are 100% identical, filter out duplicates.
diff --git a/INSTALL.md b/INSTALL.md
index 75901c9..41e524f 100644
--- a/INSTALL.md
+++ b/INSTALL.md
@@ -6,7 +6,7 @@ There are a few ways to install Gubbins and its dependancies. The simpliest way
* OSX - Mavericks (10.9) & Yosemite (10.10) & El Capitan (10.11)
* OSX - Mountain Lion (10.8)
* Linux - Ubuntu Trusty (14.04) & Precise (12.04)
-* Linux - Debian (unstable)
+* Linux - Ubuntu Xenial (16.04) & Debian (unstable)
* Linux - CentOS 7
* Linux - CentOS 6
* OSX/Linux - from source
@@ -23,18 +23,11 @@ brew install gubbins
## OSX - Mountain Lion (10.8)
Install [HomeBrew](http://brew.sh/). It requires a minimum of Xcode 5.1.1 (xcodebuild -version).
-Manually install [FastML](http://fastml.tau.ac.il/source.php) and include the binary in your PATH. For example:
-```
-wget 'http://fastml.tau.ac.il/source/FastML.v3.1.tgz'
-tar -xzf FastML.v3.1.tgz
-cd FastML.v3.1
-export PATH=${HOME}/FastML.v3.1/bin:$PATH
-```
Then run:
```
brew tap homebrew/science
brew install python3
-brew install gubbins --without-fastml
+brew install gubbins
```
## OSX - It failed to install
@@ -53,8 +46,8 @@ brew install python3
brew install gubbins
```
-## Linux - Debian (unstable)
-Gubbins has been packed by the Debian Med team and will eventually be the easiest way to install on Debian based systems.
+## Linux - Ubuntu Xenial (16.04) & Debian (unstable)
+Gubbins has been packaged by the Debian Med team and is trivial to install using apt.
```
sudo apt-get install gubbins
```
@@ -96,7 +89,6 @@ This is the most difficult method and is only suitable for someone with advanced
Install the dependances and include them in your PATH:
* [FastTree](http://www.microbesonline.org/fasttree/#Install) ( >=2.1.4 )
* [RAxML](https://github.com/stamatak/standard-RAxML) ( >=8.0 )
-* [FASTML](http://fastml.tau.ac.il/source.php) ( >=2.02 )
* Python modules: Biopython (> 1.59), DendroPy (>=4.0), Reportlab, nose, pillow
* Standard build environment tools (e.g. python3, pip3, make, autoconf, libtool, gcc, check, etc...)
diff --git a/README.md b/README.md
index 8bdc5af..c9ade1c 100644
--- a/README.md
+++ b/README.md
@@ -67,7 +67,7 @@ Print debugging messages. Default is off.
--no_cleanup, -n
-Do not remove files from intermediate iterations. This option will also keep other files created by RAxML, fastml and fasttree, which would otherwise be deleted. Default is to only keep files from the final iteration.
+Do not remove files from intermediate iterations. This option will also keep other files created by RAxML and fasttree, which would otherwise be deleted. Default is to only keep files from the final iteration.
Output files
==========
diff --git a/VERSION b/VERSION
index 4ea2b1f..227cea2 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-1.4.9
+2.0.0
diff --git a/depcomp b/depcomp
index bd0ac08..fc98710 100755
--- a/depcomp
+++ b/depcomp
@@ -1,10 +1,9 @@
#! /bin/sh
# depcomp - compile a program generating dependencies as side-effects
-scriptversion=2011-12-04.11; # UTC
+scriptversion=2013-05-30.07; # UTC
-# Copyright (C) 1999, 2000, 2003, 2004, 2005, 2006, 2007, 2009, 2010,
-# 2011 Free Software Foundation, Inc.
+# Copyright (C) 1999-2014 Free Software Foundation, Inc.
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
@@ -28,9 +27,9 @@ scriptversion=2011-12-04.11; # UTC
case $1 in
'')
- echo "$0: No command. Try \`$0 --help' for more information." 1>&2
- exit 1;
- ;;
+ echo "$0: No command. Try '$0 --help' for more information." 1>&2
+ exit 1;
+ ;;
-h | --h*)
cat <<\EOF
Usage: depcomp [--help] [--version] PROGRAM [ARGS]
@@ -40,8 +39,8 @@ as side-effects.
Environment variables:
depmode Dependency tracking mode.
- source Source file read by `PROGRAMS ARGS'.
- object Object file output by `PROGRAMS ARGS'.
+ source Source file read by 'PROGRAMS ARGS'.
+ object Object file output by 'PROGRAMS ARGS'.
DEPDIR directory where to store dependencies.
depfile Dependency file to output.
tmpdepfile Temporary file to use when outputting dependencies.
@@ -57,6 +56,66 @@ EOF
;;
esac
+# Get the directory component of the given path, and save it in the
+# global variables '$dir'. Note that this directory component will
+# be either empty or ending with a '/' character. This is deliberate.
+set_dir_from ()
+{
+ case $1 in
+ */*) dir=`echo "$1" | sed -e 's|/[^/]*$|/|'`;;
+ *) dir=;;
+ esac
+}
+
+# Get the suffix-stripped basename of the given path, and save it the
+# global variable '$base'.
+set_base_from ()
+{
+ base=`echo "$1" | sed -e 's|^.*/||' -e 's/\.[^.]*$//'`
+}
+
+# If no dependency file was actually created by the compiler invocation,
+# we still have to create a dummy depfile, to avoid errors with the
+# Makefile "include basename.Plo" scheme.
+make_dummy_depfile ()
+{
+ echo "#dummy" > "$depfile"
+}
+
+# Factor out some common post-processing of the generated depfile.
+# Requires the auxiliary global variable '$tmpdepfile' to be set.
+aix_post_process_depfile ()
+{
+ # If the compiler actually managed to produce a dependency file,
+ # post-process it.
+ if test -f "$tmpdepfile"; then
+ # Each line is of the form 'foo.o: dependency.h'.
+ # Do two passes, one to just change these to
+ # $object: dependency.h
+ # and one to simply output
+ # dependency.h:
+ # which is needed to avoid the deleted-header problem.
+ { sed -e "s,^.*\.[$lower]*:,$object:," < "$tmpdepfile"
+ sed -e "s,^.*\.[$lower]*:[$tab ]*,," -e 's,$,:,' < "$tmpdepfile"
+ } > "$depfile"
+ rm -f "$tmpdepfile"
+ else
+ make_dummy_depfile
+ fi
+}
+
+# A tabulation character.
+tab=' '
+# A newline character.
+nl='
+'
+# Character ranges might be problematic outside the C locale.
+# These definitions help.
+upper=ABCDEFGHIJKLMNOPQRSTUVWXYZ
+lower=abcdefghijklmnopqrstuvwxyz
+digits=0123456789
+alpha=${upper}${lower}
+
if test -z "$depmode" || test -z "$source" || test -z "$object"; then
echo "depcomp: Variables source, object and depmode must be set" 1>&2
exit 1
@@ -69,6 +128,9 @@ tmpdepfile=${tmpdepfile-`echo "$depfile" | sed 's/\.\([^.]*\)$/.T\1/'`}
rm -f "$tmpdepfile"
+# Avoid interferences from the environment.
+gccflag= dashmflag=
+
# Some modes work just like other modes, but use different flags. We
# parameterize here, but still list the modes in the big case below,
# to make depend.m4 easier to write. Note that we *cannot* use a case
@@ -80,26 +142,32 @@ if test "$depmode" = hp; then
fi
if test "$depmode" = dashXmstdout; then
- # This is just like dashmstdout with a different argument.
- dashmflag=-xM
- depmode=dashmstdout
+ # This is just like dashmstdout with a different argument.
+ dashmflag=-xM
+ depmode=dashmstdout
fi
cygpath_u="cygpath -u -f -"
if test "$depmode" = msvcmsys; then
- # This is just like msvisualcpp but w/o cygpath translation.
- # Just convert the backslash-escaped backslashes to single forward
- # slashes to satisfy depend.m4
- cygpath_u='sed s,\\\\,/,g'
- depmode=msvisualcpp
+ # This is just like msvisualcpp but w/o cygpath translation.
+ # Just convert the backslash-escaped backslashes to single forward
+ # slashes to satisfy depend.m4
+ cygpath_u='sed s,\\\\,/,g'
+ depmode=msvisualcpp
fi
if test "$depmode" = msvc7msys; then
- # This is just like msvc7 but w/o cygpath translation.
- # Just convert the backslash-escaped backslashes to single forward
- # slashes to satisfy depend.m4
- cygpath_u='sed s,\\\\,/,g'
- depmode=msvc7
+ # This is just like msvc7 but w/o cygpath translation.
+ # Just convert the backslash-escaped backslashes to single forward
+ # slashes to satisfy depend.m4
+ cygpath_u='sed s,\\\\,/,g'
+ depmode=msvc7
+fi
+
+if test "$depmode" = xlc; then
+ # IBM C/C++ Compilers xlc/xlC can output gcc-like dependency information.
+ gccflag=-qmakedep=gcc,-MF
+ depmode=gcc
fi
case "$depmode" in
@@ -122,8 +190,7 @@ gcc3)
done
"$@"
stat=$?
- if test $stat -eq 0; then :
- else
+ if test $stat -ne 0; then
rm -f "$tmpdepfile"
exit $stat
fi
@@ -131,13 +198,17 @@ gcc3)
;;
gcc)
+## Note that this doesn't just cater to obsosete pre-3.x GCC compilers.
+## but also to in-use compilers like IMB xlc/xlC and the HP C compiler.
+## (see the conditional assignment to $gccflag above).
## There are various ways to get dependency output from gcc. Here's
## why we pick this rather obscure method:
## - Don't want to use -MD because we'd like the dependencies to end
## up in a subdir. Having to rename by hand is ugly.
## (We might end up doing this anyway to support other compilers.)
## - The DEPENDENCIES_OUTPUT environment variable makes gcc act like
-## -MM, not -M (despite what the docs say).
+## -MM, not -M (despite what the docs say). Also, it might not be
+## supported by the other compilers which use the 'gcc' depmode.
## - Using -M directly means running the compiler twice (even worse
## than renaming).
if test -z "$gccflag"; then
@@ -145,33 +216,31 @@ gcc)
fi
"$@" -Wp,"$gccflag$tmpdepfile"
stat=$?
- if test $stat -eq 0; then :
- else
+ if test $stat -ne 0; then
rm -f "$tmpdepfile"
exit $stat
fi
rm -f "$depfile"
echo "$object : \\" > "$depfile"
- alpha=ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
-## The second -e expression handles DOS-style file names with drive letters.
+ # The second -e expression handles DOS-style file names with drive
+ # letters.
sed -e 's/^[^:]*: / /' \
-e 's/^['$alpha']:\/[^:]*: / /' < "$tmpdepfile" >> "$depfile"
-## This next piece of magic avoids the `deleted header file' problem.
+## This next piece of magic avoids the "deleted header file" problem.
## The problem is that when a header file which appears in a .P file
## is deleted, the dependency causes make to die (because there is
## typically no way to rebuild the header). We avoid this by adding
## dummy dependencies for each header file. Too bad gcc doesn't do
## this for us directly.
- tr ' ' '
-' < "$tmpdepfile" |
-## Some versions of gcc put a space before the `:'. On the theory
+## Some versions of gcc put a space before the ':'. On the theory
## that the space means something, we add a space to the output as
## well. hp depmode also adds that space, but also prefixes the VPATH
## to the object. Take care to not repeat it in the output.
## Some versions of the HPUX 10.20 sed can't process this invocation
## correctly. Breaking it into two sed invocations is a workaround.
- sed -e 's/^\\$//' -e '/^$/d' -e "s|.*$object$||" -e '/:$/d' \
- | sed -e 's/$/ :/' >> "$depfile"
+ tr ' ' "$nl" < "$tmpdepfile" \
+ | sed -e 's/^\\$//' -e '/^$/d' -e "s|.*$object$||" -e '/:$/d' \
+ | sed -e 's/$/ :/' >> "$depfile"
rm -f "$tmpdepfile"
;;
@@ -189,8 +258,7 @@ sgi)
"$@" -MDupdate "$tmpdepfile"
fi
stat=$?
- if test $stat -eq 0; then :
- else
+ if test $stat -ne 0; then
rm -f "$tmpdepfile"
exit $stat
fi
@@ -198,43 +266,41 @@ sgi)
if test -f "$tmpdepfile"; then # yes, the sourcefile depend on other files
echo "$object : \\" > "$depfile"
-
# Clip off the initial element (the dependent). Don't try to be
# clever and replace this with sed code, as IRIX sed won't handle
# lines with more than a fixed number of characters (4096 in
# IRIX 6.2 sed, 8192 in IRIX 6.5). We also remove comment lines;
- # the IRIX cc adds comments like `#:fec' to the end of the
+ # the IRIX cc adds comments like '#:fec' to the end of the
# dependency line.
- tr ' ' '
-' < "$tmpdepfile" \
- | sed -e 's/^.*\.o://' -e 's/#.*$//' -e '/^$/ d' | \
- tr '
-' ' ' >> "$depfile"
+ tr ' ' "$nl" < "$tmpdepfile" \
+ | sed -e 's/^.*\.o://' -e 's/#.*$//' -e '/^$/ d' \
+ | tr "$nl" ' ' >> "$depfile"
echo >> "$depfile"
-
# The second pass generates a dummy entry for each header file.
- tr ' ' '
-' < "$tmpdepfile" \
- | sed -e 's/^.*\.o://' -e 's/#.*$//' -e '/^$/ d' -e 's/$/:/' \
- >> "$depfile"
+ tr ' ' "$nl" < "$tmpdepfile" \
+ | sed -e 's/^.*\.o://' -e 's/#.*$//' -e '/^$/ d' -e 's/$/:/' \
+ >> "$depfile"
else
- # The sourcefile does not contain any dependencies, so just
- # store a dummy comment line, to avoid errors with the Makefile
- # "include basename.Plo" scheme.
- echo "#dummy" > "$depfile"
+ make_dummy_depfile
fi
rm -f "$tmpdepfile"
;;
+xlc)
+ # This case exists only to let depend.m4 do its work. It works by
+ # looking at the text of this script. This case will never be run,
+ # since it is checked for above.
+ exit 1
+ ;;
+
aix)
# The C for AIX Compiler uses -M and outputs the dependencies
# in a .u file. In older versions, this file always lives in the
- # current directory. Also, the AIX compiler puts `$object:' at the
+ # current directory. Also, the AIX compiler puts '$object:' at the
# start of each line; $object doesn't have directory information.
# Version 6 uses the directory in both cases.
- dir=`echo "$object" | sed -e 's|/[^/]*$|/|'`
- test "x$dir" = "x$object" && dir=
- base=`echo "$object" | sed -e 's|^.*/||' -e 's/\.o$//' -e 's/\.lo$//'`
+ set_dir_from "$object"
+ set_base_from "$object"
if test "$libtool" = yes; then
tmpdepfile1=$dir$base.u
tmpdepfile2=$base.u
@@ -247,9 +313,7 @@ aix)
"$@" -M
fi
stat=$?
-
- if test $stat -eq 0; then :
- else
+ if test $stat -ne 0; then
rm -f "$tmpdepfile1" "$tmpdepfile2" "$tmpdepfile3"
exit $stat
fi
@@ -258,44 +322,100 @@ aix)
do
test -f "$tmpdepfile" && break
done
- if test -f "$tmpdepfile"; then
- # Each line is of the form `foo.o: dependent.h'.
- # Do two passes, one to just change these to
- # `$object: dependent.h' and one to simply `dependent.h:'.
- sed -e "s,^.*\.[a-z]*:,$object:," < "$tmpdepfile" > "$depfile"
- # That's a tab and a space in the [].
- sed -e 's,^.*\.[a-z]*:[ ]*,,' -e 's,$,:,' < "$tmpdepfile" >> "$depfile"
- else
- # The sourcefile does not contain any dependencies, so just
- # store a dummy comment line, to avoid errors with the Makefile
- # "include basename.Plo" scheme.
- echo "#dummy" > "$depfile"
+ aix_post_process_depfile
+ ;;
+
+tcc)
+ # tcc (Tiny C Compiler) understand '-MD -MF file' since version 0.9.26
+ # FIXME: That version still under development at the moment of writing.
+ # Make that this statement remains true also for stable, released
+ # versions.
+ # It will wrap lines (doesn't matter whether long or short) with a
+ # trailing '\', as in:
+ #
+ # foo.o : \
+ # foo.c \
+ # foo.h \
+ #
+ # It will put a trailing '\' even on the last line, and will use leading
+ # spaces rather than leading tabs (at least since its commit 0394caf7
+ # "Emit spaces for -MD").
+ "$@" -MD -MF "$tmpdepfile"
+ stat=$?
+ if test $stat -ne 0; then
+ rm -f "$tmpdepfile"
+ exit $stat
fi
+ rm -f "$depfile"
+ # Each non-empty line is of the form 'foo.o : \' or ' dep.h \'.
+ # We have to change lines of the first kind to '$object: \'.
+ sed -e "s|.*:|$object :|" < "$tmpdepfile" > "$depfile"
+ # And for each line of the second kind, we have to emit a 'dep.h:'
+ # dummy dependency, to avoid the deleted-header problem.
+ sed -n -e 's|^ *\(.*\) *\\$|\1:|p' < "$tmpdepfile" >> "$depfile"
rm -f "$tmpdepfile"
;;
-icc)
- # Intel's C compiler understands `-MD -MF file'. However on
- # icc -MD -MF foo.d -c -o sub/foo.o sub/foo.c
- # ICC 7.0 will fill foo.d with something like
- # foo.o: sub/foo.c
- # foo.o: sub/foo.h
- # which is wrong. We want:
- # sub/foo.o: sub/foo.c
- # sub/foo.o: sub/foo.h
- # sub/foo.c:
- # sub/foo.h:
- # ICC 7.1 will output
+## The order of this option in the case statement is important, since the
+## shell code in configure will try each of these formats in the order
+## listed in this file. A plain '-MD' option would be understood by many
+## compilers, so we must ensure this comes after the gcc and icc options.
+pgcc)
+ # Portland's C compiler understands '-MD'.
+ # Will always output deps to 'file.d' where file is the root name of the
+ # source file under compilation, even if file resides in a subdirectory.
+ # The object file name does not affect the name of the '.d' file.
+ # pgcc 10.2 will output
# foo.o: sub/foo.c sub/foo.h
- # and will wrap long lines using \ :
+ # and will wrap long lines using '\' :
# foo.o: sub/foo.c ... \
# sub/foo.h ... \
# ...
+ set_dir_from "$object"
+ # Use the source, not the object, to determine the base name, since
+ # that's sadly what pgcc will do too.
+ set_base_from "$source"
+ tmpdepfile=$base.d
+
+ # For projects that build the same source file twice into different object
+ # files, the pgcc approach of using the *source* file root name can cause
+ # problems in parallel builds. Use a locking strategy to avoid stomping on
+ # the same $tmpdepfile.
+ lockdir=$base.d-lock
+ trap "
+ echo '$0: caught signal, cleaning up...' >&2
+ rmdir '$lockdir'
+ exit 1
+ " 1 2 13 15
+ numtries=100
+ i=$numtries
+ while test $i -gt 0; do
+ # mkdir is a portable test-and-set.
+ if mkdir "$lockdir" 2>/dev/null; then
+ # This process acquired the lock.
+ "$@" -MD
+ stat=$?
+ # Release the lock.
+ rmdir "$lockdir"
+ break
+ else
+ # If the lock is being held by a different process, wait
+ # until the winning process is done or we timeout.
+ while test -d "$lockdir" && test $i -gt 0; do
+ sleep 1
+ i=`expr $i - 1`
+ done
+ fi
+ i=`expr $i - 1`
+ done
+ trap - 1 2 13 15
+ if test $i -le 0; then
+ echo "$0: failed to acquire lock after $numtries attempts" >&2
+ echo "$0: check lockdir '$lockdir'" >&2
+ exit 1
+ fi
- "$@" -MD -MF "$tmpdepfile"
- stat=$?
- if test $stat -eq 0; then :
- else
+ if test $stat -ne 0; then
rm -f "$tmpdepfile"
exit $stat
fi
@@ -307,8 +427,8 @@ icc)
sed "s,^[^:]*:,$object :," < "$tmpdepfile" > "$depfile"
# Some versions of the HPUX 10.20 sed can't process this invocation
# correctly. Breaking it into two sed invocations is a workaround.
- sed 's,^[^:]*: \(.*\)$,\1,;s/^\\$//;/^$/d;/:$/d' < "$tmpdepfile" |
- sed -e 's/$/ :/' >> "$depfile"
+ sed 's,^[^:]*: \(.*\)$,\1,;s/^\\$//;/^$/d;/:$/d' < "$tmpdepfile" \
+ | sed -e 's/$/ :/' >> "$depfile"
rm -f "$tmpdepfile"
;;
@@ -319,9 +439,8 @@ hp2)
# 'foo.d', which lands next to the object file, wherever that
# happens to be.
# Much of this is similar to the tru64 case; see comments there.
- dir=`echo "$object" | sed -e 's|/[^/]*$|/|'`
- test "x$dir" = "x$object" && dir=
- base=`echo "$object" | sed -e 's|^.*/||' -e 's/\.o$//' -e 's/\.lo$//'`
+ set_dir_from "$object"
+ set_base_from "$object"
if test "$libtool" = yes; then
tmpdepfile1=$dir$base.d
tmpdepfile2=$dir.libs/$base.d
@@ -332,8 +451,7 @@ hp2)
"$@" +Maked
fi
stat=$?
- if test $stat -eq 0; then :
- else
+ if test $stat -ne 0; then
rm -f "$tmpdepfile1" "$tmpdepfile2"
exit $stat
fi
@@ -343,77 +461,61 @@ hp2)
test -f "$tmpdepfile" && break
done
if test -f "$tmpdepfile"; then
- sed -e "s,^.*\.[a-z]*:,$object:," "$tmpdepfile" > "$depfile"
- # Add `dependent.h:' lines.
+ sed -e "s,^.*\.[$lower]*:,$object:," "$tmpdepfile" > "$depfile"
+ # Add 'dependent.h:' lines.
sed -ne '2,${
- s/^ *//
- s/ \\*$//
- s/$/:/
- p
- }' "$tmpdepfile" >> "$depfile"
+ s/^ *//
+ s/ \\*$//
+ s/$/:/
+ p
+ }' "$tmpdepfile" >> "$depfile"
else
- echo "#dummy" > "$depfile"
+ make_dummy_depfile
fi
rm -f "$tmpdepfile" "$tmpdepfile2"
;;
tru64)
- # The Tru64 compiler uses -MD to generate dependencies as a side
- # effect. `cc -MD -o foo.o ...' puts the dependencies into `foo.o.d'.
- # At least on Alpha/Redhat 6.1, Compaq CCC V6.2-504 seems to put
- # dependencies in `foo.d' instead, so we check for that too.
- # Subdirectories are respected.
- dir=`echo "$object" | sed -e 's|/[^/]*$|/|'`
- test "x$dir" = "x$object" && dir=
- base=`echo "$object" | sed -e 's|^.*/||' -e 's/\.o$//' -e 's/\.lo$//'`
-
- if test "$libtool" = yes; then
- # With Tru64 cc, shared objects can also be used to make a
- # static library. This mechanism is used in libtool 1.4 series to
- # handle both shared and static libraries in a single compilation.
- # With libtool 1.4, dependencies were output in $dir.libs/$base.lo.d.
- #
- # With libtool 1.5 this exception was removed, and libtool now
- # generates 2 separate objects for the 2 libraries. These two
- # compilations output dependencies in $dir.libs/$base.o.d and
- # in $dir$base.o.d. We have to check for both files, because
- # one of the two compilations can be disabled. We should prefer
- # $dir$base.o.d over $dir.libs/$base.o.d because the latter is
- # automatically cleaned when .libs/ is deleted, while ignoring
- # the former would cause a distcleancheck panic.
- tmpdepfile1=$dir.libs/$base.lo.d # libtool 1.4
- tmpdepfile2=$dir$base.o.d # libtool 1.5
- tmpdepfile3=$dir.libs/$base.o.d # libtool 1.5
- tmpdepfile4=$dir.libs/$base.d # Compaq CCC V6.2-504
- "$@" -Wc,-MD
- else
- tmpdepfile1=$dir$base.o.d
- tmpdepfile2=$dir$base.d
- tmpdepfile3=$dir$base.d
- tmpdepfile4=$dir$base.d
- "$@" -MD
- fi
-
- stat=$?
- if test $stat -eq 0; then :
- else
- rm -f "$tmpdepfile1" "$tmpdepfile2" "$tmpdepfile3" "$tmpdepfile4"
- exit $stat
- fi
-
- for tmpdepfile in "$tmpdepfile1" "$tmpdepfile2" "$tmpdepfile3" "$tmpdepfile4"
- do
- test -f "$tmpdepfile" && break
- done
- if test -f "$tmpdepfile"; then
- sed -e "s,^.*\.[a-z]*:,$object:," < "$tmpdepfile" > "$depfile"
- # That's a tab and a space in the [].
- sed -e 's,^.*\.[a-z]*:[ ]*,,' -e 's,$,:,' < "$tmpdepfile" >> "$depfile"
- else
- echo "#dummy" > "$depfile"
- fi
- rm -f "$tmpdepfile"
- ;;
+ # The Tru64 compiler uses -MD to generate dependencies as a side
+ # effect. 'cc -MD -o foo.o ...' puts the dependencies into 'foo.o.d'.
+ # At least on Alpha/Redhat 6.1, Compaq CCC V6.2-504 seems to put
+ # dependencies in 'foo.d' instead, so we check for that too.
+ # Subdirectories are respected.
+ set_dir_from "$object"
+ set_base_from "$object"
+
+ if test "$libtool" = yes; then
+ # Libtool generates 2 separate objects for the 2 libraries. These
+ # two compilations output dependencies in $dir.libs/$base.o.d and
+ # in $dir$base.o.d. We have to check for both files, because
+ # one of the two compilations can be disabled. We should prefer
+ # $dir$base.o.d over $dir.libs/$base.o.d because the latter is
+ # automatically cleaned when .libs/ is deleted, while ignoring
+ # the former would cause a distcleancheck panic.
+ tmpdepfile1=$dir$base.o.d # libtool 1.5
+ tmpdepfile2=$dir.libs/$base.o.d # Likewise.
+ tmpdepfile3=$dir.libs/$base.d # Compaq CCC V6.2-504
+ "$@" -Wc,-MD
+ else
+ tmpdepfile1=$dir$base.d
+ tmpdepfile2=$dir$base.d
+ tmpdepfile3=$dir$base.d
+ "$@" -MD
+ fi
+
+ stat=$?
+ if test $stat -ne 0; then
+ rm -f "$tmpdepfile1" "$tmpdepfile2" "$tmpdepfile3"
+ exit $stat
+ fi
+
+ for tmpdepfile in "$tmpdepfile1" "$tmpdepfile2" "$tmpdepfile3"
+ do
+ test -f "$tmpdepfile" && break
+ done
+ # Same post-processing that is required for AIX mode.
+ aix_post_process_depfile
+ ;;
msvc7)
if test "$libtool" = yes; then
@@ -424,8 +526,7 @@ msvc7)
"$@" $showIncludes > "$tmpdepfile"
stat=$?
grep -v '^Note: including file: ' "$tmpdepfile"
- if test "$stat" = 0; then :
- else
+ if test $stat -ne 0; then
rm -f "$tmpdepfile"
exit $stat
fi
@@ -443,14 +544,15 @@ msvc7)
p
}' | $cygpath_u | sort -u | sed -n '
s/ /\\ /g
-s/\(.*\)/ \1 \\/p
+s/\(.*\)/'"$tab"'\1 \\/p
s/.\(.*\) \\/\1:/
H
$ {
- s/.*/ /
+ s/.*/'"$tab"'/
G
p
}' >> "$depfile"
+ echo >> "$depfile" # make sure the fragment doesn't end with a backslash
rm -f "$tmpdepfile"
;;
@@ -478,7 +580,7 @@ dashmstdout)
shift
fi
- # Remove `-o $object'.
+ # Remove '-o $object'.
IFS=" "
for arg
do
@@ -498,18 +600,18 @@ dashmstdout)
done
test -z "$dashmflag" && dashmflag=-M
- # Require at least two characters before searching for `:'
+ # Require at least two characters before searching for ':'
# in the target name. This is to cope with DOS-style filenames:
- # a dependency such as `c:/foo/bar' could be seen as target `c' otherwise.
+ # a dependency such as 'c:/foo/bar' could be seen as target 'c' otherwise.
"$@" $dashmflag |
- sed 's:^[ ]*[^: ][^:][^:]*\:[ ]*:'"$object"'\: :' > "$tmpdepfile"
+ sed "s|^[$tab ]*[^:$tab ][^:][^:]*:[$tab ]*|$object: |" > "$tmpdepfile"
rm -f "$depfile"
cat < "$tmpdepfile" > "$depfile"
- tr ' ' '
-' < "$tmpdepfile" | \
-## Some versions of the HPUX 10.20 sed can't process this invocation
-## correctly. Breaking it into two sed invocations is a workaround.
- sed -e 's/^\\$//' -e '/^$/d' -e '/:$/d' | sed -e 's/$/ :/' >> "$depfile"
+ # Some versions of the HPUX 10.20 sed can't process this sed invocation
+ # correctly. Breaking it into two sed invocations is a workaround.
+ tr ' ' "$nl" < "$tmpdepfile" \
+ | sed -e 's/^\\$//' -e '/^$/d' -e '/:$/d' \
+ | sed -e 's/$/ :/' >> "$depfile"
rm -f "$tmpdepfile"
;;
@@ -562,11 +664,12 @@ makedepend)
# makedepend may prepend the VPATH from the source file name to the object.
# No need to regex-escape $object, excess matching of '.' is harmless.
sed "s|^.*\($object *:\)|\1|" "$tmpdepfile" > "$depfile"
- sed '1,2d' "$tmpdepfile" | tr ' ' '
-' | \
-## Some versions of the HPUX 10.20 sed can't process this invocation
-## correctly. Breaking it into two sed invocations is a workaround.
- sed -e 's/^\\$//' -e '/^$/d' -e '/:$/d' | sed -e 's/$/ :/' >> "$depfile"
+ # Some versions of the HPUX 10.20 sed can't process the last invocation
+ # correctly. Breaking it into two sed invocations is a workaround.
+ sed '1,2d' "$tmpdepfile" \
+ | tr ' ' "$nl" \
+ | sed -e 's/^\\$//' -e '/^$/d' -e '/:$/d' \
+ | sed -e 's/$/ :/' >> "$depfile"
rm -f "$tmpdepfile" "$tmpdepfile".bak
;;
@@ -583,7 +686,7 @@ cpp)
shift
fi
- # Remove `-o $object'.
+ # Remove '-o $object'.
IFS=" "
for arg
do
@@ -602,10 +705,10 @@ cpp)
esac
done
- "$@" -E |
- sed -n -e '/^# [0-9][0-9]* "\([^"]*\)".*/ s:: \1 \\:p' \
- -e '/^#line [0-9][0-9]* "\([^"]*\)".*/ s:: \1 \\:p' |
- sed '$ s: \\$::' > "$tmpdepfile"
+ "$@" -E \
+ | sed -n -e '/^# [0-9][0-9]* "\([^"]*\)".*/ s:: \1 \\:p' \
+ -e '/^#line [0-9][0-9]* "\([^"]*\)".*/ s:: \1 \\:p' \
+ | sed '$ s: \\$::' > "$tmpdepfile"
rm -f "$depfile"
echo "$object : \\" > "$depfile"
cat < "$tmpdepfile" >> "$depfile"
@@ -637,23 +740,23 @@ msvisualcpp)
shift
;;
"-Gm"|"/Gm"|"-Gi"|"/Gi"|"-ZI"|"/ZI")
- set fnord "$@"
- shift
- shift
- ;;
+ set fnord "$@"
+ shift
+ shift
+ ;;
*)
- set fnord "$@" "$arg"
- shift
- shift
- ;;
+ set fnord "$@" "$arg"
+ shift
+ shift
+ ;;
esac
done
"$@" -E 2>/dev/null |
sed -n '/^#line [0-9][0-9]* "\([^"]*\)"/ s::\1:p' | $cygpath_u | sort -u > "$tmpdepfile"
rm -f "$depfile"
echo "$object : \\" > "$depfile"
- sed < "$tmpdepfile" -n -e 's% %\\ %g' -e '/^\(.*\)$/ s:: \1 \\:p' >> "$depfile"
- echo " " >> "$depfile"
+ sed < "$tmpdepfile" -n -e 's% %\\ %g' -e '/^\(.*\)$/ s::'"$tab"'\1 \\:p' >> "$depfile"
+ echo "$tab" >> "$depfile"
sed < "$tmpdepfile" -n -e 's% %\\ %g' -e '/^\(.*\)$/ s::\1\::p' >> "$depfile"
rm -f "$tmpdepfile"
;;
diff --git a/install-userspace.sh b/install-userspace.sh
index 8178bf1..94d3d95 100755
--- a/install-userspace.sh
+++ b/install-userspace.sh
@@ -13,7 +13,7 @@
#
py_pkgs=( "biopython" "dendropy" )
-deb_urls=( "http://uk.archive.ubuntu.com/ubuntu/pool/universe/r/raxml/raxml_7.2.8-2_amd64.deb" "https://launchpad.net/~ap13/+archive/ubuntu/gubbins/+files/fastml2_2.3~trusty1_amd64.deb" "https://launchpad.net/~ap13/+archive/ubuntu/gubbins/+files/gubbins_1.3.3~trusty1_amd64.deb" )
+deb_urls=( "http://uk.archive.ubuntu.com/ubuntu/pool/universe/r/raxml/raxml_7.2.8-2_amd64.deb" "https://launchpad.net/~ap13/+archive/ubuntu/gubbins/+files/gubbins_1.3.3~trusty1_amd64.deb" )
function check_platform {
# Ubuntu 14.04
diff --git a/install_dependencies.sh b/install_dependencies.sh
index 5ba857a..00706a6 100755
--- a/install_dependencies.sh
+++ b/install_dependencies.sh
@@ -6,11 +6,9 @@ set -e
start_dir=$(pwd)
RAXML_VERSION="8.1.21"
-FASTML_VERSION="3.1"
FASTTREE_VERSION="2.1.9"
RAXML_DOWNLOAD_URL="https://github.com/stamatak/standard-RAxML/archive/v${RAXML_VERSION}.tar.gz"
-FASTML_DOWNLOAD_URL="http://fastml.tau.ac.il/source/FastML.v${FASTML_VERSION}.tgz"
FASTTREE_DOWNLOAD_URL="http://www.microbesonline.org/fasttree/FastTree-${FASTTREE_VERSION}.c"
# Make an install location
@@ -34,7 +32,6 @@ download () {
}
download $RAXML_DOWNLOAD_URL "raxml-${RAXML_VERSION}.tgz"
-download $FASTML_DOWNLOAD_URL "fastml-${FASTML_VERSION}.tgz"
download $FASTTREE_DOWNLOAD_URL "fasttree-${FASTTREE_VERSION}.c"
# Update dependencies
@@ -67,28 +64,6 @@ fi
cd $build_dir
-## FASTML
-fastml_dir=$(pwd)/"FastML.v${FASTML_VERSION}"
-
-if [ ! -d $fastml_dir ]; then
- tar xzf fastml-${FASTML_VERSION}.tgz
- ls -al
-fi
-cd $fastml_dir
-if [ -e "${fastml_dir}/programs/fastml/fastml" ]; then
- echo "Already build FASTML; skipping build"
-else
- sed -i 's/getopt/fastml_getopt/g' libs/phylogeny/phylogeny.vcxproj
- sed -i 's/getopt/fastml_getopt/g' libs/phylogeny/phylogeny.vcproj
- mv libs/phylogeny/getopt.h libs/phylogeny/fastml_getopt.h
- mv libs/phylogeny/getopt.c libs/phylogeny/fastml_getopt.c
- mv libs/phylogeny/getopt1.c libs/phylogeny/fastml_getopt1.c
-
- make
-fi
-
-cd $build_dir
-
## FastTree
fasttree_dir=${build_dir}/fasttree-${FASTTREE_VERSION}
if [ ! -d $fasttree_dir ]; then
@@ -110,7 +85,6 @@ update_path () {
}
update_path ${raxml_dir}
-update_path ${fastml_dir}/programs/fastml
update_path ${fasttree_dir}
cd $start_dir
diff --git a/python/gubbins/Fastml.py b/python/gubbins/Fastml.py
deleted file mode 100644
index daf604b..0000000
--- a/python/gubbins/Fastml.py
+++ /dev/null
@@ -1,63 +0,0 @@
-import os
-import re
-import subprocess
-
-class Fastml(object):
- def __init__(self, fastml_exec = None):
- self.fastml_exec = fastml_exec
- self.fastml_version = None
- self.fastml_model = None
- self.fastml_parameters = self.__calculate_parameters__()
-
- def __calculate_parameters__(self):
- if(self.which(self.fastml_exec) == None):
- return None
-
- if re.search('nucgtr', str(self.__run_without_options__())):
- self.fastml_version = 3
- self.fastml_model = 'g'
- print("Using FastML 3 with GTR model\n")
- else:
- self.fastml_version = 2
-
- if re.search('General time Reversible', str(self.__run_with_fake_file__())):
- self.fastml_model = 'g'
- print("Using Gubbins patched FastML 2 with GTR model\n")
- else:
- self.fastml_model = 'n'
- print("Using FastML 2 with Jukes Cantor model\n")
-
- return self.fastml_exec + " -qf -b -a 0.00001 -m"+self.fastml_model+" "
-
-
- def __run_with_fake_file__(self):
-
- # Create a minimal FASTA file
- with open('.seq.aln','w') as out:
- out.writelines(['>1','A','>2','A'])
-
- cmd = self.fastml_exec + " -qf -b -a 0.00001 -mg -s .seq.aln -t doesnt_exist.tre"
- output = subprocess.Popen(cmd, stdout = subprocess.PIPE, shell=True).communicate()[0]
- os.remove('.seq.aln')
- return output
-
- def __run_without_options__(self):
- return subprocess.Popen(self.fastml_exec, stdout = subprocess.PIPE, shell=True).communicate()[0]
-
- def which(self,program):
- executable = program.split(" ")
- program = executable[0]
- def is_exe(fpath):
- return os.path.isfile(fpath) and os.access(fpath, os.X_OK)
- fpath, fname = os.path.split(program)
- if fpath:
- if is_exe(program):
- return program
- else:
- for path in os.environ["PATH"].split(os.pathsep):
- exe_file = os.path.join(path, program)
- if is_exe(exe_file):
- return exe_file
-
- return None
-
\ No newline at end of file
diff --git a/python/gubbins/RAxMLExecutable.py b/python/gubbins/RAxMLExecutable.py
new file mode 100644
index 0000000..a27188b
--- /dev/null
+++ b/python/gubbins/RAxMLExecutable.py
@@ -0,0 +1,105 @@
+# encoding: utf-8
+# Wellcome Trust Sanger Institute
+# Copyright (C) 2013 Wellcome Trust Sanger Institute
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License
+# as published by the Free Software Foundation; either version 2
+# of the License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+#
+
+import os
+import sys
+import subprocess
+import re
+
+class RAxMLExecutable(object):
+ def __init__(self, threads, verbose = False ):
+ self.verbose = verbose
+ self.threads = threads
+ self.single_threaded_executables = ['raxmlHPC-AVX','raxmlHPC-SSE3','raxmlHPC']
+ self.multi_threaded_executables = ['raxmlHPC-PTHREADS-AVX','raxmlHPC-PTHREADS-SSE3','raxmlHPC-PTHREADS']
+
+ self.raxml_executable = self.select_executable_based_on_threads()
+ self.tree_building_parameters = ' -f d -p 1 -m GTRGAMMA '
+ self.internal_sequence_parameters = ' -f A -p 1 -m GTRGAMMA '
+
+ def tree_building_command(self):
+ command = self.raxml_executable + self.threads_parameter() + self.tree_building_parameters
+ if self.verbose:
+ print("Tree building command: "+command)
+ return command
+
+ def internal_sequence_reconstruction_command(self):
+ command = self.raxml_executable + self.threads_parameter() + self.internal_sequence_parameters
+ if self.verbose:
+ print("Internal sequence reconstruction command: "+command)
+ return command
+
+ def choose_executable_from_list(self,list_of_executables):
+ flags = []
+ if os.path.exists('/proc/cpuinfo'):
+ output = subprocess.Popen('grep flags /proc/cpuinfo', stdout = subprocess.PIPE, shell=True).communicate()[0].decode("utf-8")
+ flags = output.split()
+
+ for executable in list_of_executables:
+ if os.path.exists('/proc/cpuinfo'):
+ if re.search('AVX', executable) and 'avx' not in flags:
+ continue
+ elif re.search('SSE3', executable) and 'ssse3' not in flags:
+ continue
+
+ if self.which(executable) != None:
+ return executable
+
+ return None
+
+ def which(self,program):
+ executable = program.split(" ")
+ program = executable[0]
+ def is_exe(fpath):
+ return os.path.isfile(fpath) and os.access(fpath, os.X_OK)
+ fpath, fname = os.path.split(program)
+ if fpath:
+ if is_exe(program):
+ return program
+ else:
+ for path in os.environ["PATH"].split(os.pathsep):
+ exe_file = os.path.join(path, program)
+ if is_exe(exe_file):
+ return exe_file
+
+ return None
+
+ def threads_parameter(self):
+ if self.threads > 1:
+ return " -T " + str(self.threads) + " "
+ else:
+ return ""
+
+ def select_executable_based_on_threads(self):
+ single_threaded_exec = self.choose_executable_from_list(self.single_threaded_executables)
+ multi_threaded_exec = self.choose_executable_from_list(self.multi_threaded_executables)
+
+ if self.threads == 1:
+ if single_threaded_exec != None:
+ return single_threaded_exec
+ else:
+ print("Trying multithreaded version of RAxML because no single threaded version of RAxML could be found. Just to warn you, this requires 2 threads.\n")
+ self.threads = 2
+
+ if self.threads > 1:
+ if multi_threaded_exec != None:
+ return multi_threaded_exec
+ else:
+ sys.exit("No usable version of RAxML could be found, please ensure one of these executables is in your PATH:\nraxmlHPC-PTHREADS-AVX\nraxmlHPC-PTHREADS-SSE3\nraxmlHPC-PTHREADS\nraxmlHPC-AVX\nraxmlHPC-SSE3\nraxmlHPC")
+
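
The selection logic in choose_executable_from_list above can be sketched as a small pure function. This is a minimal sketch under stated assumptions, not the Gubbins implementation: the name `choose_executable`, the `cpu_flags` and `is_available` parameters, and the use of `shutil.which` are introduced here for illustration only. The flag names (`avx`, `ssse3`) simply mirror the checks in the diff.

```python
import shutil


def choose_executable(candidates, cpu_flags=None, is_available=None):
    """Return the first candidate the CPU supports and that can be found.

    candidates   -- executable names in order of preference
    cpu_flags    -- set of CPU flag strings, or None to skip the CPU check
                    (the diff skips it when /proc/cpuinfo does not exist)
    is_available -- hypothetical hook: callable(name) -> bool, defaults to a
                    PATH lookup via shutil.which
    """
    if is_available is None:
        is_available = lambda name: shutil.which(name) is not None

    for name in candidates:
        if cpu_flags is not None:
            # Skip builds that need an instruction set the CPU lacks.
            if "AVX" in name and "avx" not in cpu_flags:
                continue
            if "SSE3" in name and "ssse3" not in cpu_flags:
                continue
        if is_available(name):
            return name
    return None


# Example with a stubbed availability check: every binary "exists",
# but the CPU only reports ssse3, so the AVX build is skipped.
names = ["raxmlHPC-AVX", "raxmlHPC-SSE3", "raxmlHPC"]
print(choose_executable(names, {"ssse3"}, lambda name: True))  # raxmlHPC-SSE3
```

Passing the CPU flags and the availability check as arguments keeps the preference order testable without touching /proc/cpuinfo or PATH.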
diff --git a/python/gubbins/RAxMLSequenceReconstruction.py b/python/gubbins/RAxMLSequenceReconstruction.py
new file mode 100644
index 0000000..b0b6fdf
--- /dev/null
+++ b/python/gubbins/RAxMLSequenceReconstruction.py
@@ -0,0 +1,176 @@
+# encoding: utf-8
+# Wellcome Trust Sanger Institute
+# Copyright (C) 2013 Wellcome Trust Sanger Institute
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License
+# as published by the Free Software Foundation; either version 2
+# of the License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+#
+
+import os
+import sys
+import tempfile
+import dendropy
+import subprocess
+import shutil
+import time
+from random import randint
+from Bio import AlignIO
+from Bio.Align import MultipleSeqAlignment
+
+class RAxMLSequenceReconstruction(object):
+ def __init__(self, input_alignment_filename, input_tree, output_alignment_filename, output_tree, raxml_internal_sequence_reconstruction_command, verbose = False ):
+ self.input_alignment_filename = os.path.abspath(input_alignment_filename)
+ self.input_tree = os.path.abspath(input_tree)
+ self.output_alignment_filename = os.path.abspath(output_alignment_filename)
+ self.output_tree = os.path.abspath(output_tree)
+ self.raxml_internal_sequence_reconstruction_command = raxml_internal_sequence_reconstruction_command
+ self.verbose = verbose
+
+ self.working_dir = tempfile.mkdtemp(dir=os.getcwd())
+ self.temp_rooted_tree = self.working_dir +'/' +'rooted_tree.newick'
+ self.temp_interal_fasta = self.working_dir +'/' +'internal.fasta'
+ self.internal_node_prefix = 'internal_'
+
+ def reconstruct_ancestor_sequences(self):
+ self.root_tree(self.input_tree, self.temp_rooted_tree)
+
+ self.run_raxml_ancestor_command(self.temp_rooted_tree)
+ self.convert_raw_ancestral_states_to_fasta(self.raw_internal_sequence_filename(), self.temp_interal_fasta)
+ self.combine_fastas(self.input_alignment_filename, self.temp_interal_fasta,self.output_alignment_filename)
+
+ if os.path.exists(self.temp_rooted_tree):
+ self.transfer_internal_names_to_tree(self.raw_internal_rooted_tree_filename(), self.temp_rooted_tree, self.output_tree)
+
+ shutil.rmtree(self.working_dir)
+
+ def run_raxml_ancestor_command(self,rooted_tree):
+ current_directory = os.getcwd()
+ if self.verbose > 0:
+ print(self.raxml_reconstruction_command(rooted_tree))
+ try:
+ os.chdir(self.working_dir)
+ subprocess.check_call(self.raxml_reconstruction_command(rooted_tree), shell=True)
+ os.chdir(current_directory)
+ except:
+ os.chdir(current_directory)
+ sys.exit("Something went wrong while creating the ancestor sequences using RAxML")
+ if self.verbose > 0:
+ print(int(time.time()))
+
+ def raw_internal_sequence_filename(self):
+ return self.working_dir +'/RAxML_marginalAncestralStates.internal'
+
+ def raw_internal_rooted_tree_filename(self):
+ return self.working_dir +'/RAxML_nodeLabelledRootedTree.internal'
+
+ def raxml_reconstruction_command(self,rooted_tree):
+ verbose_suffix = ''
+ if not self.verbose:
+ verbose_suffix = '> /dev/null 2>&1'
+
+ return " ".join([self.raxml_internal_sequence_reconstruction_command, ' -s', self.input_alignment_filename, '-t', rooted_tree, '-n', 'internal' ,verbose_suffix ])
+
+ def write_tree(self, tree, output_tree):
+ output_tree_string = tree.as_string(
+ schema='newick',
+ suppress_leaf_taxon_labels=False,
+ suppress_leaf_node_labels=True,
+ suppress_internal_taxon_labels=False,
+ suppress_internal_node_labels=False,
+ suppress_rooting=False,
+ suppress_edge_lengths=False,
+ unquoted_underscores=True,
+ preserve_spaces=False,
+ store_tree_weights=False,
+ suppress_annotations=True,
+ annotations_as_nhx=False,
+ suppress_item_comments=True,
+ node_label_element_separator=' '
+ )
+ with open(output_tree, 'w+') as output_file:
+ output_file.write(output_tree_string.replace('\'', ''))
+ output_file.closed
+
+ return output_tree
+
+ def transfer_internal_names_to_tree(self, source_tree, destination_tree, output_tree):
+ source_tree_obj = dendropy.Tree.get_from_path(source_tree, 'newick', preserve_underscores=True)
+ source_internal_node_labels = []
+ for source_internal_node in source_tree_obj.internal_nodes():
+ if source_internal_node.label:
+ source_internal_node_labels.append(source_internal_node.label)
+ else:
+ source_internal_node_labels.append('')
+
+ destination_tree_obj = dendropy.Tree.get_from_path(destination_tree, 'newick', preserve_underscores=True)
+ for index, destination_internal_node in enumerate(destination_tree_obj.internal_nodes()):
+ destination_internal_node.label = None
+ destination_internal_node.taxon = dendropy.Taxon(self.internal_node_prefix + str(source_internal_node_labels[index]))
+ self.write_tree( destination_tree_obj, output_tree)
+
+
+ def root_tree(self, input_tree_filename, output_tree):
+ # split every node with more than two children into nested two-child nodes, which also roots the tree
+ tree = dendropy.Tree.get_from_path(input_tree_filename, 'newick', preserve_underscores=True)
+ self.split_all_non_bi_nodes(tree.seed_node)
+ self.write_tree( tree, output_tree)
+
+
+ def convert_raw_ancestral_states_to_fasta(self, input_filename, output_filename):
+ with open(input_filename, 'r') as infile:
+ with open(output_filename, 'w+') as outfile:
+ for sequence_line in infile:
+ [sequence_name, sequence_bases] = sequence_line.split(' ')
+ sequence_bases = sequence_bases.replace('?', 'N')
+ outfile.write('>'+sequence_name+'\n')
+ outfile.write(sequence_bases)
+
+ # Warning - recursion
+ def split_all_non_bi_nodes(self, node):
+ if node.is_leaf():
+ return None
+ elif len(node.child_nodes()) > 2:
+ self.split_child_nodes(node)
+ for child_node in node.child_nodes():
+ self.split_all_non_bi_nodes(child_node)
+ return None
+
+ def split_child_nodes(self,node):
+ all_child_nodes = node.child_nodes()
+ first_child = all_child_nodes.pop()
+ new_child_node = node.new_child(edge_length=0)
+ new_child_node.set_child_nodes(all_child_nodes)
+ node.set_child_nodes((first_child,new_child_node))
+
+ def combine_fastas(self, leaf_node_filename, internl_node_filename, output_file ):
+ with open(output_file, 'w') as output_handle:
+ # print out leafnodes as is
+ with open(leaf_node_filename, 'r') as input_handle:
+ alignments = AlignIO.parse(input_handle, "fasta")
+ AlignIO.write(alignments,output_handle, "fasta")
+ input_handle.closed
+
+ with open(internl_node_filename, 'r') as input_handle:
+ alignments = AlignIO.parse(input_handle, "fasta")
+ output_alignments = []
+ for alignment in alignments:
+ for record in alignment:
+ record.id = self.internal_node_prefix + str(record.id)
+ record.description = ''
+ output_alignments.append(record)
+
+ AlignIO.write(MultipleSeqAlignment(output_alignments),output_handle, "fasta")
+ input_handle.closed
+ output_handle.closed
+
\ No newline at end of file
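
The recursive splitting done by split_all_non_bi_nodes and split_child_nodes above can be illustrated without DendroPy. The following is a minimal sketch under stated assumptions: the tree is represented as nested Python lists with strings as leaves, the function name `resolve_polytomies` is hypothetical, and edge lengths are ignored (the code in the diff attaches the new internal node with an edge length of 0).

```python
def resolve_polytomies(node):
    """Return a copy of the tree in which no node has more than two children.

    A leaf is a string; an internal node is a list of child nodes.
    """
    if isinstance(node, str):
        return node

    children = list(node)
    if len(children) > 2:
        # Keep the last child in place and push all remaining children
        # under one new internal node, as split_child_nodes does.
        kept = children.pop()
        children = [kept, children]

    # Recurse: the new internal node may itself still have more than
    # two children, so it is split again on the way down.
    return [resolve_polytomies(child) for child in children]


# A four-way polytomy becomes a ladder of two-child nodes.
print(resolve_polytomies(["A", "B", "C", "D"]))  # ['D', ['C', ['A', 'B']]]
```

Nodes that already have two children pass through unchanged, so running the function on a fully bifurcating tree returns an equal tree.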
diff --git a/python/gubbins/__init__.py b/python/gubbins/__init__.py
index b1face2..51ff429 100644
--- a/python/gubbins/__init__.py
+++ b/python/gubbins/__init__.py
@@ -13,7 +13,6 @@ import os
## Populate the 'gubbins' namespace
from gubbins import common
-from gubbins import Fastml
###############################################################################
## PACKAGE METADATA
diff --git a/python/gubbins/common.py b/python/gubbins/common.py
index 7b45b9b..e1ae21d 100644
--- a/python/gubbins/common.py
+++ b/python/gubbins/common.py
@@ -34,9 +34,11 @@ import subprocess
import sys
import tempfile
import time
-from gubbins.Fastml import Fastml
from gubbins.PreProcessFasta import PreProcessFasta
from gubbins.ValidateFastaAlignment import ValidateFastaAlignment
+from gubbins.RAxMLSequenceReconstruction import RAxMLSequenceReconstruction
+from gubbins.RAxMLExecutable import RAxMLExecutable
+
class GubbinsError(Exception):
def __init__(self, value,message):
@@ -91,17 +93,8 @@ class GubbinsCommon():
@staticmethod
def choose_executable(list_of_executables):
flags = []
- if os.path.exists('/proc/cpuinfo'):
- output = subprocess.Popen('grep flags /proc/cpuinfo', stdout = subprocess.PIPE, shell=True).communicate()[0].decode("utf-8")
- flags = output.split()
for executable in list_of_executables:
- if os.path.exists('/proc/cpuinfo'):
- if re.search('AVX', executable) and 'avx' not in flags:
- continue
- elif re.search('SSE3', executable) and 'ssse3' not in flags:
- continue
-
if GubbinsCommon.which(executable) != None:
return executable
@@ -109,21 +102,7 @@ class GubbinsCommon():
def parse_and_run(self):
# Default parameters
- raxml_executables = ['raxmlHPC-AVX','raxmlHPC-SSE3','raxmlHPC']
- if self.args.threads > 1:
- raxml_executables = ['raxmlHPC-PTHREADS-AVX','raxmlHPC-PTHREADS-SSE3','raxmlHPC-PTHREADS','raxmlHPC-AVX','raxmlHPC-SSE3','raxmlHPC']
- raxml_executable = GubbinsCommon.choose_executable(raxml_executables)
-
- # raxml PTHREADS needs 2 or more threads, however some systems dont come with the single threaded exec
- if self.args.threads == 1 and raxml_executable == "":
- self.args.threads = 2
- raxml_executables = ['raxmlHPC-PTHREADS-AVX','raxmlHPC-PTHREADS-SSE3','raxmlHPC-PTHREADS']
- print("Trying PTHREADS version of raxml because no single threaded version of raxml could be found. Just to warn you, this requires 2 threads.\n")
- raxml_executable = GubbinsCommon.choose_executable(raxml_executables)
-
- RAXML_EXEC = raxml_executable+' -f d -p 1 -m GTRGAMMA'
- if re.search('PTHREADS', str(RAXML_EXEC)) != None:
- RAXML_EXEC = RAXML_EXEC+" -T " +str(self.args.threads)
+ raxml_executable_obj = RAxMLExecutable(self.args.threads, self.args.verbose)
fasttree_executables = ['FastTree','fasttree']
FASTTREE_EXEC = GubbinsCommon.choose_executable(fasttree_executables)
@@ -131,8 +110,6 @@ class GubbinsCommon():
FASTTREE_PARAMS = '-nosupport -gtr -gamma -nt'
GUBBINS_EXEC = 'gubbins'
- FASTML_EXEC = Fastml('fastml').fastml_parameters
-
GUBBINS_BUNDLED_EXEC = '../src/gubbins'
# check that all the external executable dependancies are available
@@ -140,10 +117,6 @@ class GubbinsCommon():
GUBBINS_EXEC = GubbinsCommon.use_bundled_exec(GUBBINS_EXEC, GUBBINS_BUNDLED_EXEC)
if GubbinsCommon.which(GUBBINS_EXEC) is None:
sys.exit(GUBBINS_EXEC+" is not in your path")
- if GubbinsCommon.which(FASTML_EXEC) is None:
- sys.exit("fastml is not in your path")
- if (self.args.tree_builder == "raxml" or self.args.tree_builder == "hybrid") and GubbinsCommon.which(RAXML_EXEC) is None:
- sys.exit("RAxML is not in your path")
if self.args.tree_builder == "fasttree" or self.args.tree_builder == "hybrid":
if GubbinsCommon.which(FASTTREE_EXEC) is None:
@@ -206,11 +179,10 @@ class GubbinsCommon():
GubbinsCommon.reconvert_fasta_file(starting_base_filename+".gaps.snp_sites.aln",starting_base_filename+".start")
- # Perform pairwise comparison if there are only 2 sequences
+
number_of_sequences = GubbinsCommon.number_of_sequences_in_alignment(self.args.alignment_filename)
- if(number_of_sequences == 2):
- GubbinsCommon.pairwise_comparison(self.args.alignment_filename,starting_base_filename,GUBBINS_EXEC,self.args.alignment_filename,FASTML_EXEC,base_filename_without_ext)
- sys.exit()
+ if(number_of_sequences < 3):
+ sys.exit("3 or more sequences are required.")
latest_file_name = "latest_tree."+base_filename_without_ext+"."+str(current_time)+"tre"
tree_file_names = []
@@ -237,27 +209,24 @@ class GubbinsCommon():
previous_tree_name = GubbinsCommon.fasttree_previous_tree_name(base_filename, i)
current_tree_name = GubbinsCommon.fasttree_current_tree_name(base_filename, i)
tree_building_command = GubbinsCommon.fasttree_tree_building_command(i, self.args.starting_tree,current_tree_name,base_filename,previous_tree_name,FASTTREE_EXEC, FASTTREE_PARAMS,base_filename )
- fastml_command = GubbinsCommon.fasttree_fastml_command(FASTML_EXEC, starting_base_filename+".snp_sites.aln", base_filename, i)
gubbins_command = GubbinsCommon.fasttree_gubbins_command(base_filename,starting_base_filename+".gaps", i,self.args.alignment_filename,GUBBINS_EXEC,self.args.min_snps,self.args.alignment_filename, self.args.min_window_size,self.args.max_window_size)
elif i == 2:
previous_tree_name = current_tree_name
current_tree_name = GubbinsCommon.raxml_current_tree_name(base_filename_without_ext,current_time, i)
- tree_building_command = GubbinsCommon.raxml_tree_building_command(i,base_filename_without_ext,base_filename,current_time,RAXML_EXEC,previous_tree_name, self.args.verbose)
- fastml_command = GubbinsCommon.raxml_fastml_command(FASTML_EXEC, starting_base_filename+".snp_sites.aln", base_filename_without_ext,current_time, i)
+ tree_building_command = GubbinsCommon.raxml_tree_building_command(i,base_filename_without_ext,base_filename,current_time,raxml_executable_obj.tree_building_command(),previous_tree_name, self.args.verbose)
gubbins_command = GubbinsCommon.raxml_gubbins_command(base_filename_without_ext,starting_base_filename+".gaps",current_time, i,self.args.alignment_filename,GUBBINS_EXEC,self.args.min_snps,self.args.alignment_filename, self.args.min_window_size,self.args.max_window_size)
else:
previous_tree_name = GubbinsCommon.raxml_previous_tree_name(base_filename_without_ext,base_filename, current_time,i)
current_tree_name = GubbinsCommon.raxml_current_tree_name(base_filename_without_ext,current_time, i)
- tree_building_command = GubbinsCommon.raxml_tree_building_command(i,base_filename_without_ext,base_filename,current_time,RAXML_EXEC,previous_tree_name, self.args.verbose)
- fastml_command = GubbinsCommon.raxml_fastml_command(FASTML_EXEC, starting_base_filename+".snp_sites.aln", base_filename_without_ext,current_time, i)
+ tree_building_command = GubbinsCommon.raxml_tree_building_command(i,base_filename_without_ext,base_filename,current_time,raxml_executable_obj.tree_building_command(),previous_tree_name, self.args.verbose)
gubbins_command = GubbinsCommon.raxml_gubbins_command(base_filename_without_ext,starting_base_filename+".gaps",current_time, i,self.args.alignment_filename,GUBBINS_EXEC,self.args.min_snps,self.args.alignment_filename, self.args.min_window_size,self.args.max_window_size)
elif self.args.tree_builder == "raxml":
previous_tree_name = GubbinsCommon.raxml_previous_tree_name(base_filename_without_ext,base_filename, current_time,i)
current_tree_name = GubbinsCommon.raxml_current_tree_name(base_filename_without_ext,current_time, i)
- tree_building_command = GubbinsCommon.raxml_tree_building_command(i,base_filename_without_ext,base_filename,current_time,RAXML_EXEC,previous_tree_name, self.args.verbose)
- fastml_command = GubbinsCommon.raxml_fastml_command(FASTML_EXEC, starting_base_filename+".snp_sites.aln", base_filename_without_ext,current_time, i)
+ tree_building_command = GubbinsCommon.raxml_tree_building_command(i,base_filename_without_ext,base_filename,current_time,raxml_executable_obj.tree_building_command(),previous_tree_name, self.args.verbose)
+
gubbins_command = GubbinsCommon.raxml_gubbins_command(base_filename_without_ext,starting_base_filename+".gaps",current_time, i,self.args.alignment_filename,GUBBINS_EXEC,self.args.min_snps,self.args.alignment_filename, self.args.min_window_size,self.args.max_window_size)
elif self.args.tree_builder == "fasttree":
@@ -267,7 +236,6 @@ class GubbinsCommon():
current_tree_name = GubbinsCommon.fasttree_current_tree_name(base_filename, i)
tree_building_command = GubbinsCommon.fasttree_tree_building_command(i, self.args.starting_tree,current_tree_name,previous_tree_name,previous_tree_name,FASTTREE_EXEC,FASTTREE_PARAMS,base_filename )
- fastml_command = GubbinsCommon.fasttree_fastml_command(FASTML_EXEC, starting_base_filename+".snp_sites.aln", base_filename, i)
gubbins_command = GubbinsCommon.fasttree_gubbins_command(base_filename,starting_base_filename+".gaps", i,self.args.alignment_filename,GUBBINS_EXEC,self.args.min_snps,self.args.alignment_filename, self.args.min_window_size,self.args.max_window_size)
if self.args.verbose > 0:
@@ -287,22 +255,18 @@ class GubbinsCommon():
GubbinsCommon.reroot_tree(str(current_tree_name), self.args.outgroup)
- fastml_command_suffix = ' > /dev/null 2>&1'
- if self.args.verbose > 0:
- print(fastml_command)
- fastml_command_suffix = ''
+ try:
+ raxml_seq_recon = RAxMLSequenceReconstruction(starting_base_filename+".snp_sites.aln", current_tree_name, starting_base_filename+".seq.joint.txt", current_tree_name , raxml_executable_obj.internal_sequence_reconstruction_command(), self.args.verbose)
+ raxml_seq_recon.reconstruct_ancestor_sequences()
- try:
- subprocess.check_call(fastml_command+fastml_command_suffix, shell=True)
except:
- sys.exit("Failed while running FastML")
+ sys.exit("Failed while running RAxML internal sequence reconstruction")
- shutil.copyfile(current_tree_name+'.output_tree',current_tree_name)
shutil.copyfile(starting_base_filename+".start", starting_base_filename+".gaps.snp_sites.aln")
- GubbinsCommon.reinsert_gaps_into_fasta_file(current_tree_name+'.seq.joint.txt', starting_base_filename +".gaps.vcf", starting_base_filename+".gaps.snp_sites.aln")
+ GubbinsCommon.reinsert_gaps_into_fasta_file(starting_base_filename+".seq.joint.txt", starting_base_filename +".gaps.vcf", starting_base_filename+".gaps.snp_sites.aln")
if(GubbinsCommon.does_file_exist(starting_base_filename+".gaps.snp_sites.aln", 'Alignment File') == 0 or not ValidateFastaAlignment(starting_base_filename+".gaps.snp_sites.aln").is_input_fasta_file_valid() ):
- sys.exit("There is a problem with your FASTA file after running FASTML. Please check this intermediate file is valid: "+ str(starting_base_filename)+".gaps.snp_sites.aln")
+ sys.exit("There is a problem with your FASTA file after running RAxML internal sequence reconstruction. Please check this intermediate file is valid: "+ str(starting_base_filename)+".gaps.snp_sites.aln")
if self.args.verbose > 0:
print(int(time.time()))
@@ -629,21 +593,6 @@ class GubbinsCommon():
return GubbinsCommon.translation_of_filenames_to_final_filenames("RAxML_result."+GubbinsCommon.raxml_base_name(base_filename_without_ext,current_time)+str(max_intermediate_iteration), output_prefix)
@staticmethod
- def translation_of_filenames_to_final_filenames_pairwise(input_prefix, output_prefix):
- input_names_to_output_names = {
- str(input_prefix)+".vcf": str(output_prefix)+".summary_of_snp_distribution.vcf" ,
- str(input_prefix)+".branch_snps.tab": str(output_prefix)+".branch_base_reconstruction.embl" ,
- str(input_prefix)+".tab": str(output_prefix)+".recombination_predictions.embl" ,
- str(input_prefix)+".gff": str(output_prefix)+".recombination_predictions.gff" ,
- str(input_prefix)+".stats": str(output_prefix)+".per_branch_statistics.csv" ,
- str(input_prefix)+".snp_sites.aln": str(output_prefix)+".filtered_polymorphic_sites.fasta" ,
- str(input_prefix)+".phylip": str(output_prefix)+".filtered_polymorphic_sites.phylip",
- str(input_prefix)+".output_tree": str(output_prefix)+".node_labelled.tre",
- str(input_prefix)+".tre": str(output_prefix)+".final_tree.tre"
- }
- return input_names_to_output_names
-
- @staticmethod
def translation_of_filenames_to_final_filenames(input_prefix, output_prefix):
input_names_to_output_names = {
str(input_prefix)+".vcf": str(output_prefix)+".summary_of_snp_distribution.vcf" ,
@@ -684,29 +633,6 @@ class GubbinsCommon():
return gubbins_exec+" -r -v "+starting_base_filename+".vcf"+" -a "+str(min_window_size)+" -b "+str(max_window_size) + " -f "+original_aln+" -t "+str(current_tree_name)+" -m "+ str(min_snps)+" "+ starting_base_filename+".snp_sites.aln"
@staticmethod
- def fasttree_fastml_command(fastml_exec, alignment_filename, base_filename,i):
- current_tree_name = GubbinsCommon.fasttree_current_tree_name(base_filename, i)
- return GubbinsCommon.generate_fastml_command(fastml_exec, alignment_filename, current_tree_name)
-
- @staticmethod
- def raxml_fastml_command(fastml_exec, alignment_filename, base_filename_without_ext,current_time, i):
- current_tree_name = GubbinsCommon.raxml_current_tree_name(base_filename_without_ext,current_time, i)
- return GubbinsCommon.generate_fastml_command(fastml_exec, alignment_filename, current_tree_name)
-
- @staticmethod
- def generate_fastml_command(fastml_exec, alignment_filename, tree_filename):
-
- return (fastml_exec
- + " -s " + alignment_filename
- + " -t " + tree_filename
- + " -x " + tree_filename + ".output_tree"
- + " -y " + tree_filename + ".ancestor.tre"
- + " -j " + tree_filename + ".seq.joint.txt"
- + " -k " + tree_filename + ".seq.marginal.txt"
- + " -d " + tree_filename + ".prob.joint.txt"
- + " -e " + tree_filename + ".prob.marginal.txt")
-
- @staticmethod
def number_of_sequences_in_alignment(filename):
return len(GubbinsCommon.get_sequence_names_from_alignment(filename))
@@ -826,8 +752,6 @@ class GubbinsCommon():
output_handle.close()
return
-
- # reparsing a fasta file splits the lines which makes fastml work
@staticmethod
def reconvert_fasta_file(input_filename, output_filename):
with open(input_filename, "r") as input_handle:
@@ -839,29 +763,6 @@ class GubbinsCommon():
return
@staticmethod
- def pairwise_comparison(filename,base_filename,gubbins_exec,alignment_filename,fastml_exec,base_filename_without_ext):
- sequence_names = GubbinsCommon.get_sequence_names_from_alignment(filename)
- GubbinsCommon.create_pairwise_newick_tree(sequence_names, base_filename+".tre")
-
- try:
- subprocess.check_call(GubbinsCommon.generate_fastml_command(fastml_exec, base_filename+".gaps.snp_sites.aln", base_filename+".tre"), shell=True)
- except:
- sys.exit("Failed while running fastML")
- shutil.copyfile(base_filename+'.tre.output_tree',base_filename+".tre")
- shutil.copyfile(base_filename+'.tre.seq.joint.txt', base_filename+".snp_sites.aln")
- try:
- subprocess.check_call(gubbins_exec+" -r -v "+base_filename+".vcf -t "+base_filename+".tre -f "+ alignment_filename +" "+ base_filename+".snp_sites.aln", shell=True)
- except:
- sys.exit("Failed while running Gubbins")
- GubbinsCommon.rename_files(GubbinsCommon.translation_of_filenames_to_final_filenames_pairwise(base_filename, base_filename_without_ext))
-
- @staticmethod
- def create_pairwise_newick_tree(sequence_names, output_filename):
- stringio = StringIO("".join(('(',sequence_names[0], ',', sequence_names[1],')')))
- tree = Phylo.read(stringio, "newick")
- Phylo.write(tree, output_filename, 'newick')
-
- @staticmethod
def delete_files_based_on_list_of_regexes(directory_to_search, regex_for_file_deletions, verbose):
for dirname, dirnames, filenames in os.walk(directory_to_search):
for filename in filenames:
diff --git a/python/gubbins/tests/bin/dummy_custom_fastml2 b/python/gubbins/tests/bin/dummy_custom_fastml2
deleted file mode 100755
index f6a1585..0000000
--- a/python/gubbins/tests/bin/dummy_custom_fastml2
+++ /dev/null
@@ -1,62 +0,0 @@
-#!/usr/bin/env bash
-
-
-if test "$#" -eq 0; then
-cat << "EOF"
-START OF LOG FILE
-USAGE: fastml [-options]
- |-------------------------------- HELP: -------------------------------------+
- | VALUES IN [] ARE DEFAULT VALUES |
- |-h help |
- |-s sequence input file (for example use -s D:\mySequences\seq.txt ) |
- |-t tree input file |
- | (if tree is not given, a neighbor joining tree is computed). |
- |-g Assume among site rate variation model (Gamma) [By default the program |
- | will assume an homogenous model. very fast, but less accurate!] |
-|-m model name |
-|-mj [JTT] |
-|-mr mtREV (for mitochondrial genomes) |
-|-md DAY |
-|-mw WAG |
-|-mc cpREV (for chloroplasts genomes) |
-|-ma Jukes and Cantor (JC) for amino acids |
-|-mn Jukes and Cantor (JC) for nucleotides |
- +----------------------------------------------------------------------------+
- |Controling the output options: |
- |-x tree file output in Newick format [tree.newick.txt] |
- |-y tree file output in ANCESTOR format [tree.ancestor.txt] |
- |-j joint sequences output file [seq.joint.txt] |
- |-k marginal sequences output file [seq.marginal.txt] |
- |-d joint probabilities output file [prob.joint.txt] |
- |-e marginal probabilities output file [prob.marginal.txt] |
- |-q ancestral sequences output format. -qc = [CLUSTAL], -qf = FASTA |
- | -qm = MOLPHY, -qs = MASE, -qp = PHLIYP, -qn = Nexus |
- +----------------------------------------------------------------------------+
- |Advances options: |
- |-a Treshold for computing again marginal probabilities [0.9] |
- |-b Do not optimize branch lengths on starting tree |
- | [by default branches and alpha are ML optimized from the data] |
- |-c number of discrete Gamma categories for the gamma distribution [8] |
- |-f don't compute Joint reconstruction (good if the branch and bound |
- | algorithm takes too much time, and the goal is to compute the |
- | marginal reconstruction with Gamma). |
- |-z The bound used. -zs - bound based on sum. -zm based on max. -zb [both] |
- |-p user alpha parameter of the gamma distribution [if alpha is not given, |
- | alpha and branches will be evaluated from the data (override -b) |
- +----------------------------------------------------------------------------+
-EOF
-
-else
-
-cat << "PARAMS"
-START OF LOG FILE
-END OF LOG FILE
-Using homogenous model (no among site rate variation)
-Nucleotide substitution model is General time Reversible
-
-Error - unable to open tree file xxx
-System Error: No such file or directory
-Assertion failed: (0), function reportError, file errorMsg.cpp, line 41.
-Abort trap: 6
-PARAMS
-fi
\ No newline at end of file
diff --git a/python/gubbins/tests/bin/dummy_fastml2 b/python/gubbins/tests/bin/dummy_fastml2
deleted file mode 100755
index eb31211..0000000
--- a/python/gubbins/tests/bin/dummy_fastml2
+++ /dev/null
@@ -1,60 +0,0 @@
-#!/usr/bin/env bash
-
-if test "$#" -eq 0; then
-cat << "EOF"
-START OF LOG FILE
-USAGE: fastml [-options]
- |-------------------------------- HELP: -------------------------------------+
- | VALUES IN [] ARE DEFAULT VALUES |
- |-h help |
- |-s sequence input file (for example use -s D:\mySequences\seq.txt ) |
- |-t tree input file |
- | (if tree is not given, a neighbor joining tree is computed). |
- |-g Assume among site rate variation model (Gamma) [By default the program |
- | will assume an homogenous model. very fast, but less accurate!] |
-|-m model name |
-|-mj [JTT] |
-|-mr mtREV (for mitochondrial genomes) |
-|-md DAY |
-|-mw WAG |
-|-mc cpREV (for chloroplasts genomes) |
-|-ma Jukes and Cantor (JC) for amino acids |
-|-mn Jukes and Cantor (JC) for nucleotides |
- +----------------------------------------------------------------------------+
- |Controling the output options: |
- |-x tree file output in Newick format [tree.newick.txt] |
- |-y tree file output in ANCESTOR format [tree.ancestor.txt] |
- |-j joint sequences output file [seq.joint.txt] |
- |-k marginal sequences output file [seq.marginal.txt] |
- |-d joint probabilities output file [prob.joint.txt] |
- |-e marginal probabilities output file [prob.marginal.txt] |
- |-q ancestral sequences output format. -qc = [CLUSTAL], -qf = FASTA |
- | -qm = MOLPHY, -qs = MASE, -qp = PHLIYP, -qn = Nexus |
- +----------------------------------------------------------------------------+
- |Advances options: |
- |-a Treshold for computing again marginal probabilities [0.9] |
- |-b Do not optimize branch lengths on starting tree |
- | [by default branches and alpha are ML optimized from the data] |
- |-c number of discrete Gamma categories for the gamma distribution [8] |
- |-f don't compute Joint reconstruction (good if the branch and bound |
- | algorithm takes too much time, and the goal is to compute the |
- | marginal reconstruction with Gamma). |
- |-z The bound used. -zs - bound based on sum. -zm based on max. -zb [both] |
- |-p user alpha parameter of the gamma distribution [if alpha is not given, |
- | alpha and branches will be evaluated from the data (override -b) |
- +----------------------------------------------------------------------------+
-EOF
-else
-
-cat << "PARAMS"
-START OF LOG FILE
-END OF LOG FILE
-Using homogenous model (no among site rate variation)
-Nucleotide substitution model is JTT
-
-Error - unable to open tree file xxx
-System Error: No such file or directory
-Assertion failed: (0), function reportError, file errorMsg.cpp, line 41.
-Abort trap: 6
-PARAMS
-fi
\ No newline at end of file
diff --git a/python/gubbins/tests/bin/dummy_fastml3 b/python/gubbins/tests/bin/dummy_fastml3
deleted file mode 100755
index 1525770..0000000
--- a/python/gubbins/tests/bin/dummy_fastml3
+++ /dev/null
@@ -1,51 +0,0 @@
-#!/usr/bin/env bash
-
-cat << "EOF"
-START OF LOG FILE
-USAGE: /software/pathogen/external/apps/usr/local/FastML.v3.1/programs/fastml/fastml [-options]
- |-------------------------------- HELP: -------------------------------------+
- | VALUES IN [] ARE DEFAULT VALUES |
- |-h help |
- |-s sequence input file (for example use -s D:\mySequences\seq.txt ) |
- |-t tree input file |
- | (if tree is not given, a neighbor joining tree is computed). |
- |-g Assume among site rate variation model (Gamma) [By default the program |
- | will assume an homogenous model. very fast, but less accurate!] |
-|-m model name |
-|-mj [JTT] |
-|-ml LG |
-|-mr mtREV (for mitochondrial genomes) |
-|-md DAY |
-|-mw WAG |
-|-mc cpREV (for chloroplasts genomes) |
-|-ma Jukes and Cantor (JC) for amino acids |
-|-mn Jukes and Cantor (JC) for nucleotides |
-|-mh HKY Model for nucleotides |
-|-mg nucgtr Model for nucleotides |
-|-mt tamura92 Model for nucleotides |
-|-my yang M5 codons model |
-|-me empirical codon matrix |
- +----------------------------------------------------------------------------+
- |Controling the output options: |
- |-x tree file output in Newick format [tree.newick.txt] |
- |-y tree file output in ANCESTOR format [tree.ancestor.txt] |
- |-j joint sequences output file [seq.joint.txt] |
- |-k marginal sequences output file [seq.marginal.txt] |
- |-d joint probabilities output file [prob.joint.txt] |
- |-e marginal probabilities output file [prob.marginal.txt] |
- |-q ancestral sequences output format. -qc = [CLUSTAL], -qf = FASTA |
- | -qm = MOLPHY, -qs = MASE, -qp = PHLIYP, -qn = Nexus |
- +----------------------------------------------------------------------------+
- |Advances options: |
- |-a Treshold for computing again marginal probabilities [0.9] |
- |-b Do not optimize branch lengths on starting tree |
- | [by default branches and alpha are ML optimized from the data] |
- |-c number of discrete Gamma categories for the gamma distribution [8] |
- |-f don't compute Joint reconstruction (good if the branch and bound |
- | algorithm takes too much time, and the goal is to compute the |
- | marginal reconstruction with Gamma). |
- |-z The bound used. -zs - bound based on sum. -zm based on max. -zb [both] |
- |-p user alpha parameter of the gamma distribution [if alpha is not given, |
- | alpha and branches will be evaluated from the data (override -b) |
- +----------------------------------------------------------------------------+
-EOF
diff --git a/python/gubbins/tests/data/destination_tree.tre b/python/gubbins/tests/data/destination_tree.tre
new file mode 100644
index 0000000..77c8003
--- /dev/null
+++ b/python/gubbins/tests/data/destination_tree.tre
@@ -0,0 +1 @@
+((sequence_9:0.0,(sequence_5:0.0,(sequence_1:0.0,(sequence_2:0.0,(sequence_4:0.009495,sequence_3:0.0)N8:0.010546)N7:0.010943)N6:0.010778)N5:0.00946)N4:0.000204,(sequence_10:8.867316,(sequence_8:0.0,(sequence_6:0.008555,sequence_7:0.008284)N3:0.0)N2:0.008228):0)N1;
diff --git a/python/gubbins/tests/data/expected_renamed_output_tree b/python/gubbins/tests/data/expected_renamed_output_tree
new file mode 100644
index 0000000..34be226
--- /dev/null
+++ b/python/gubbins/tests/data/expected_renamed_output_tree
@@ -0,0 +1 @@
+((sequence_9:0.0,(sequence_5:0.0,(sequence_1:0.0,(sequence_2:0.0,(sequence_4:0.009495,sequence_3:0.0)internal_16:0.010546)internal_15:0.010943)internal_14:0.010778)internal_13:0.00946)internal_12:0.000204,(sequence_10:8.867316,(sequence_8:0.0,(sequence_6:0.008555,sequence_7:0.008284)internal_11:0.0)internal_18:0.008228)internal_17:0.0)internal_ROOT;
diff --git a/python/gubbins/tests/data/raxml_sequence_reconstruction/1.fasta b/python/gubbins/tests/data/raxml_sequence_reconstruction/1.fasta
new file mode 100644
index 0000000..4678c8b
--- /dev/null
+++ b/python/gubbins/tests/data/raxml_sequence_reconstruction/1.fasta
@@ -0,0 +1,4 @@
+>A
+CCAAA
+>B
+CCCCC
diff --git a/python/gubbins/tests/data/raxml_sequence_reconstruction/2.fasta b/python/gubbins/tests/data/raxml_sequence_reconstruction/2.fasta
new file mode 100644
index 0000000..7e53f51
--- /dev/null
+++ b/python/gubbins/tests/data/raxml_sequence_reconstruction/2.fasta
@@ -0,0 +1,8 @@
+>C
+CCCCT
+>D
+CCCTT
+>E
+CCTTT
+>F
+CCGGG
\ No newline at end of file
diff --git a/python/gubbins/tests/data/raxml_sequence_reconstruction/expected_ancestor_sequence_from_raxml b/python/gubbins/tests/data/raxml_sequence_reconstruction/expected_ancestor_sequence_from_raxml
new file mode 100644
index 0000000..121e3f5
--- /dev/null
+++ b/python/gubbins/tests/data/raxml_sequence_reconstruction/expected_ancestor_sequence_from_raxml
@@ -0,0 +1,22 @@
+>A
+CCAAA
+>B
+CCCCC
+>C
+CCCCT
+>D
+CCCTT
+>E
+CCTTT
+>F
+CCGGG
+>10
+CCCTT
+>9
+CCCTT
+>8
+CCCTT
+>7
+CCGGG
+>ROOT
+CCGGG
diff --git a/python/gubbins/tests/data/raxml_sequence_reconstruction/expected_combined_1_2.fasta b/python/gubbins/tests/data/raxml_sequence_reconstruction/expected_combined_1_2.fasta
new file mode 100644
index 0000000..58934d5
--- /dev/null
+++ b/python/gubbins/tests/data/raxml_sequence_reconstruction/expected_combined_1_2.fasta
@@ -0,0 +1,12 @@
+>A
+CCAAA
+>B
+CCCCC
+>internal_C
+CCCCT
+>internal_D
+CCCTT
+>internal_E
+CCTTT
+>internal_F
+CCGGG
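The three fixtures above (1.fasta, 2.fasta and expected_combined_1_2.fasta) suggest the behaviour that test_merging_fasta_files expects: records from the first file are copied unchanged, while record names from the second file gain an `internal_` prefix. A minimal sketch of that behaviour, assuming single-line FASTA records; `combine_fasta_text` is a hypothetical string-based helper written for illustration, not the `combine_fastas` method from RAxMLSequenceReconstruction.py:

```python
def combine_fasta_text(leaf_fasta, ancestor_fasta, prefix="internal_"):
    """Concatenate two FASTA texts, prefixing record names from the second one."""
    out = []
    # Records from the first input are kept exactly as they are.
    for line in leaf_fasta.splitlines():
        line = line.strip()
        if line:
            out.append(line)
    # Header lines from the second input get the prefix; sequences are untouched.
    for line in ancestor_fasta.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            line = ">" + prefix + line[1:]
        out.append(line)
    return "\n".join(out) + "\n"


# Contents of 1.fasta and 2.fasta from the hunks above
# (2.fasta has no trailing newline).
first = ">A\nCCAAA\n>B\nCCCCC\n"
second = ">C\nCCCCT\n>D\nCCCTT\n>E\nCCTTT\n>F\nCCGGG"
combined = combine_fasta_text(first, second)
```

Because the second input is normalised line by line, the missing trailing newline in 2.fasta does not affect the result.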
diff --git a/python/gubbins/tests/data/raxml_sequence_reconstruction/expected_marginalAncestralStates.fasta b/python/gubbins/tests/data/raxml_sequence_reconstruction/expected_marginalAncestralStates.fasta
new file mode 100644
index 0000000..6d05ea4
--- /dev/null
+++ b/python/gubbins/tests/data/raxml_sequence_reconstruction/expected_marginalAncestralStates.fasta
@@ -0,0 +1,10 @@
+>9
+CCGGG
+>8
+CCGGG
+>7
+CCCTT
+>10
+CCCTT
+>ROOT
+CCCTT
diff --git a/python/gubbins/tests/data/raxml_sequence_reconstruction/expected_rooted_tree.newick b/python/gubbins/tests/data/raxml_sequence_reconstruction/expected_rooted_tree.newick
new file mode 100644
index 0000000..04f3376
--- /dev/null
+++ b/python/gubbins/tests/data/raxml_sequence_reconstruction/expected_rooted_tree.newick
@@ -0,0 +1 @@
+((B:0.1,(C:0.1,(D:0.1,E:0.1))),(A:0.1,F:0.1):0);
diff --git a/python/gubbins/tests/data/raxml_sequence_reconstruction/input_alignment.fasta b/python/gubbins/tests/data/raxml_sequence_reconstruction/input_alignment.fasta
new file mode 100644
index 0000000..6d75ec7
--- /dev/null
+++ b/python/gubbins/tests/data/raxml_sequence_reconstruction/input_alignment.fasta
@@ -0,0 +1,12 @@
+>A
+CCAAA
+>B
+CCCCC
+>C
+CCCCT
+>D
+CCCTT
+>E
+CCTTT
+>F
+CCGGG
\ No newline at end of file
diff --git a/python/gubbins/tests/data/raxml_sequence_reconstruction/raw_marginalAncestralStates.phylip b/python/gubbins/tests/data/raxml_sequence_reconstruction/raw_marginalAncestralStates.phylip
new file mode 100644
index 0000000..276f13b
--- /dev/null
+++ b/python/gubbins/tests/data/raxml_sequence_reconstruction/raw_marginalAncestralStates.phylip
@@ -0,0 +1,5 @@
+9 CCGGG
+8 CCGGG
+7 CCCTT
+10 CCCTT
+ROOT CCCTT
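The raw file above holds one `name sequence` pair per line, and expected_marginalAncestralStates.fasta holds the same records as FASTA. A minimal sketch of that conversion, assuming whitespace-separated fields and no line wrapping; `raw_states_to_fasta` is a hypothetical helper for illustration and may differ from `convert_raw_ancestral_states_to_fasta` in the imported code:

```python
def raw_states_to_fasta(raw_text):
    """Turn 'name sequence' lines into FASTA records, preserving input order."""
    records = []
    for line in raw_text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        name, sequence = fields
        records.append(">%s\n%s\n" % (name, sequence))
    return "".join(records)


# Contents of raw_marginalAncestralStates.phylip from the hunk above.
raw = "9 CCGGG\n8 CCGGG\n7 CCCTT\n10 CCCTT\nROOT CCCTT\n"
fasta = raw_states_to_fasta(raw)
```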
diff --git a/python/gubbins/tests/data/raxml_sequence_reconstruction/unrooted_tree.newick b/python/gubbins/tests/data/raxml_sequence_reconstruction/unrooted_tree.newick
new file mode 100644
index 0000000..b43cdc3
--- /dev/null
+++ b/python/gubbins/tests/data/raxml_sequence_reconstruction/unrooted_tree.newick
@@ -0,0 +1 @@
+(A:0.1, F:0.1, (B:0.1, (C:0.1, (D:0.1, E:0.1))));
\ No newline at end of file
diff --git a/python/gubbins/tests/data/source_tree.tre b/python/gubbins/tests/data/source_tree.tre
new file mode 100644
index 0000000..4b469b2
--- /dev/null
+++ b/python/gubbins/tests/data/source_tree.tre
@@ -0,0 +1 @@
+((sequence_9,(sequence_5,(sequence_1,(sequence_2,(sequence_4,sequence_3)16)15)14)13)12,(sequence_10,(sequence_8,(sequence_6,sequence_7)11)18)17)ROOT;
diff --git a/python/gubbins/tests/test_external_dependancies.py b/python/gubbins/tests/test_external_dependancies.py
index 3188aa5..ef03d04 100644
--- a/python/gubbins/tests/test_external_dependancies.py
+++ b/python/gubbins/tests/test_external_dependancies.py
@@ -131,7 +131,7 @@ class TestExternalDependancies(unittest.TestCase):
assert os.path.exists('multiple_recombinations.recombination_predictions.gff')
assert os.path.exists('multiple_recombinations.branch_base_reconstruction.embl')
assert os.path.exists('multiple_recombinations.final_tree.tre')
- assert os.path.exists('multiple_recombinations.node_labelled.tre')
+ #assert os.path.exists('multiple_recombinations.node_labelled.tre')
self.cleanup()
@@ -168,22 +168,6 @@ class TestExternalDependancies(unittest.TestCase):
self.cleanup()
- def test_pairwise_comparison(self):
- shutil.copyfile('gubbins/tests/data/input_pairwise.aln.vcf','gubbins/tests/data/pairwise.aln.vcf' )
- shutil.copyfile('gubbins/tests/data/input_pairwise.aln.phylip','gubbins/tests/data/pairwise.aln.phylip' )
- common.GubbinsCommon.pairwise_comparison('gubbins/tests/data/pairwise.aln','gubbins/tests/data/pairwise.aln','../src/gubbins','gubbins/tests/data/pairwise.aln','fastml -mg -qf -b ','pairwise')
- # Check the tree file exists
- assert os.path.exists('pairwise.final_tree.tre')
-
- # Check the VCF file is as expected
- assert filecmp.cmp('pairwise.summary_of_snp_distribution.vcf','gubbins/tests/data/pairwise.aln.tre.vcf_expected')
-
- # Check the reconstruction of internal nodes
- assert filecmp.cmp('pairwise.filtered_polymorphic_sites.fasta','gubbins/tests/data/pairwise.aln.snp_sites.aln_expected');
-
- self.cleanup()
-
-
def test_delete_files_based_on_list_of_regexes(self):
open('gubbins/tests/data/AAA', 'w').close()
open('gubbins/tests/data/BBB', 'w').close()
@@ -201,7 +185,6 @@ class TestExternalDependancies(unittest.TestCase):
def test_use_bundled_exec(self):
assert re.search('raxmlHPC -f d -p 1 -m GTRGAMMA',common.GubbinsCommon.use_bundled_exec('raxmlHPC -f d -p 1 -m GTRGAMMA', 'raxmlHPC')) != None
- assert re.search('fastml -mg -qf -b ',common.GubbinsCommon.use_bundled_exec('fastml -mg -qf -b ', 'fastml')) != None
assert re.search('../src/gubbins',common.GubbinsCommon.use_bundled_exec('gubbins', '../src/gubbins')) != None
def cleanup(self):
diff --git a/python/gubbins/tests/test_fastml.py b/python/gubbins/tests/test_fastml.py
deleted file mode 100644
index 4439918..0000000
--- a/python/gubbins/tests/test_fastml.py
+++ /dev/null
@@ -1,43 +0,0 @@
-#! /usr/bin/env python3
-# encoding: utf-8
-
-"""
-Tests if we can detect which version of fastml is running so we can choose the correct model
-"""
-
-import unittest
-import re
-import os
-import subprocess
-from gubbins.Fastml import Fastml
-
-os.environ["PATH"] += os.pathsep + 'gubbins/tests/bin'
-
-class TestFastml(unittest.TestCase):
-
- def test_no_fastml_installed(self):
- fastml_check = Fastml('exec_doesnt_exist')
- self.assertEqual(fastml_check.fastml_version, None)
- self.assertEqual(fastml_check.fastml_model, None)
- self.assertEqual(fastml_check.fastml_parameters, None)
-
- def test_fastml_3_installed(self):
- fastml_check = Fastml('dummy_fastml3')
- self.assertEqual(fastml_check.fastml_version, 3)
- self.assertEqual(fastml_check.fastml_model,'g')
- self.assertEqual(fastml_check.fastml_parameters, 'dummy_fastml3 -qf -b -a 0.00001 -mg ')
-
- def test_fastml_2_installed(self):
- fastml_check = Fastml('dummy_fastml2')
- self.assertEqual(fastml_check.fastml_version, 2)
- self.assertEqual(fastml_check.fastml_model, 'n')
- self.assertEqual(fastml_check.fastml_parameters, 'dummy_fastml2 -qf -b -a 0.00001 -mn ')
-
- def test_custom_fastml_2_installed(self):
- fastml_check = Fastml('dummy_custom_fastml2')
- self.assertEqual(fastml_check.fastml_version, 2)
- self.assertEqual(fastml_check.fastml_model,'g')
- self.assertEqual(fastml_check.fastml_parameters, 'dummy_custom_fastml2 -qf -b -a 0.00001 -mg ')
-
-if __name__ == "__main__":
- unittest.main()
\ No newline at end of file
diff --git a/python/gubbins/tests/test_pre_process_fasta.py b/python/gubbins/tests/test_pre_process_fasta.py
index b96c2d9..10e5954 100644
--- a/python/gubbins/tests/test_pre_process_fasta.py
+++ b/python/gubbins/tests/test_pre_process_fasta.py
@@ -30,7 +30,6 @@ class TestPreProcessFasta(unittest.TestCase):
preprocessfasta.remove_duplicate_sequences_and_sequences_missing_too_much_data('output.aln')
self.assertTrue(filecmp.cmp('output.aln', 'gubbins/tests/data/preprocessfasta/no_duplicates.aln'))
- self.cleanup()
def test_input_file_with_one_duplicate_sequences(self):
preprocessfasta = PreProcessFasta('gubbins/tests/data/preprocessfasta/one_duplicate.aln')
@@ -43,7 +42,6 @@ class TestPreProcessFasta(unittest.TestCase):
preprocessfasta.remove_duplicate_sequences_and_sequences_missing_too_much_data('output.aln')
self.assertTrue(filecmp.cmp('output.aln', 'gubbins/tests/data/preprocessfasta/expected_one_duplicate.aln'))
- self.cleanup()
def test_input_file_with_multiple_duplicate_sequences(self):
preprocessfasta = PreProcessFasta('gubbins/tests/data/preprocessfasta/multiple_duplicates.aln')
@@ -55,7 +53,6 @@ class TestPreProcessFasta(unittest.TestCase):
preprocessfasta.remove_duplicate_sequences_and_sequences_missing_too_much_data('output.aln')
self.assertTrue(filecmp.cmp('output.aln', 'gubbins/tests/data/preprocessfasta/expected_multiple_duplicates.aln'))
- self.cleanup()
def test_input_file_with_all_duplicate_sequences(self):
preprocessfasta = PreProcessFasta('gubbins/tests/data/preprocessfasta/all_same_sequence.aln')
@@ -68,15 +65,13 @@ class TestPreProcessFasta(unittest.TestCase):
self.assertEqual(preprocessfasta.taxa_of_duplicate_sequences(),['sample1',
'sample2',
'sample3'])
- self.cleanup()
def test_filter_out_alignments_with_too_much_missing_data(self):
preprocessfasta = PreProcessFasta('gubbins/tests/data/preprocessfasta/missing_data.aln', False, 5)
preprocessfasta.remove_duplicate_sequences_and_sequences_missing_too_much_data('output.aln')
- self.assertTrue(filecmp.cmp('output.aln','gubbins/tests/data/preprocessfasta/expected_missing_data.aln'))
- self.cleanup()
+ self.assertTrue(filecmp.cmp('output.aln','gubbins/tests/data/preprocessfasta/expected_missing_data.aln'))
- def cleanup(self):
+ def tearDown(self):
for file_to_delete in ['output.aln']:
if os.path.exists(file_to_delete):
os.remove(file_to_delete)
diff --git a/python/gubbins/tests/test_raxml_sequence_reconstruction.py b/python/gubbins/tests/test_raxml_sequence_reconstruction.py
new file mode 100644
index 0000000..8f6d4c1
--- /dev/null
+++ b/python/gubbins/tests/test_raxml_sequence_reconstruction.py
@@ -0,0 +1,94 @@
+#! /usr/bin/env python3
+# encoding: utf-8
+# python3 setup.py test -s gubbins.tests.test_raxml_sequence_reconstruction.TestRAxMLSequenceReconstruction
+
+"""
+Tests for reconstructing internal sequences using raxml
+"""
+
+import unittest
+import re
+import os
+import sys
+import subprocess
+import filecmp
+import shutil
+import dendropy
+from gubbins.RAxMLSequenceReconstruction import RAxMLSequenceReconstruction
+from gubbins.RAxMLExecutable import RAxMLExecutable
+
+os.environ["PATH"] += os.pathsep + 'gubbins/tests/bin'
+
+class TestRAxMLSequenceReconstruction(unittest.TestCase):
+
+    def test_ancestor_raxml_command_no_verbose(self):
+        raxml_seq_recon = RAxMLSequenceReconstruction('input_alignment.fasta', 'input_tree',
+            'output_alignment_filename', 'output_tree',
+            'raxmlHPC -f A -p 1 -m GTRGAMMA',
+            verbose = False)
+        self.assertEqual(raxml_seq_recon.raxml_reconstruction_command(raxml_seq_recon.working_dir+'/rooted_tree.newick' ), 'raxmlHPC -f A -p 1 -m GTRGAMMA -s '+raxml_seq_recon.input_alignment_filename + ' -t ' + raxml_seq_recon.working_dir+'/rooted_tree.newick -n internal > /dev/null 2>&1')
+
+    def test_ancestor_raxml_command_verbose(self):
+        raxml_seq_recon = RAxMLSequenceReconstruction('input_alignment.fasta', 'input_tree',
+            'output_alignment_filename', 'output_tree',
+            'raxmlHPC -f A -p 1 -m GTRGAMMA',
+            verbose = True)
+        self.assertEqual(raxml_seq_recon.raxml_reconstruction_command(raxml_seq_recon.working_dir+'/rooted_tree.newick'), 'raxmlHPC -f A -p 1 -m GTRGAMMA -s '+raxml_seq_recon.input_alignment_filename+' -t ' + raxml_seq_recon.working_dir+'/rooted_tree.newick -n internal ')
+
+    def test_working_directory_construction(self):
+        raxml_seq_recon = RAxMLSequenceReconstruction('', '', '', '', '', False)
+        self.assertTrue( os.path.exists(raxml_seq_recon.working_dir) )
+
+    def test_root_input_tree(self):
+        raxml_seq_recon = RAxMLSequenceReconstruction('abc', 'gubbins/tests/data/raxml_sequence_reconstruction/unrooted_tree.newick', 'abc', 'abc', '', False)
+        output_tree = raxml_seq_recon.root_tree('gubbins/tests/data/raxml_sequence_reconstruction/unrooted_tree.newick',raxml_seq_recon.temp_rooted_tree)
+        self.assertTrue(filecmp.cmp(str(raxml_seq_recon.temp_rooted_tree), 'gubbins/tests/data/raxml_sequence_reconstruction/expected_rooted_tree.newick', shallow = False))
+
+    def test_run_raxml_ancestor_reconstruction(self):
+        raxml_seq_recon = RAxMLSequenceReconstruction('gubbins/tests/data/raxml_sequence_reconstruction/input_alignment.fasta',
+            'gubbins/tests/data/raxml_sequence_reconstruction/unrooted_tree.newick',
+            'outputfile', 'output_tree', RAxMLExecutable(1).internal_sequence_reconstruction_command(), False)
+        raxml_seq_recon.reconstruct_ancestor_sequences()
+
+        assert os.path.exists('outputfile')
+
+    def test_convert_raw_ancestral_file_to_fasta(self):
+        raxml_seq_recon = RAxMLSequenceReconstruction('', '', '', '', '', False)
+        raxml_seq_recon.convert_raw_ancestral_states_to_fasta('gubbins/tests/data/raxml_sequence_reconstruction/raw_marginalAncestralStates.phylip','outputfile')
+        self.assertTrue(filecmp.cmp('outputfile','gubbins/tests/data/raxml_sequence_reconstruction/expected_marginalAncestralStates.fasta', shallow = False))
+
+    def test_merging_fasta_files(self):
+        raxml_seq_recon = RAxMLSequenceReconstruction('', '', '', '', '', False)
+        raxml_seq_recon.combine_fastas('gubbins/tests/data/raxml_sequence_reconstruction/1.fasta','gubbins/tests/data/raxml_sequence_reconstruction/2.fasta','combined.fasta')
+        self.assertTrue(filecmp.cmp('combined.fasta','gubbins/tests/data/raxml_sequence_reconstruction/expected_combined_1_2.fasta', shallow = False))
+
+    def test_add_labels_to_tree(self):
+        raxml_seq_recon = RAxMLSequenceReconstruction('', '', '', '', '', False)
+        raxml_seq_recon.root_tree('gubbins/tests/data/raxml_sequence_reconstruction/unrooted_tree.newick', raxml_seq_recon.temp_rooted_tree)
+
+        tree = dendropy.Tree.get_from_path(raxml_seq_recon.temp_rooted_tree, 'newick', preserve_underscores=True)
+        self.assertEqual("((B:0.1,(C:0.1,(D:0.1,E:0.1))),(A:0.1,F:0.1):0.0);\n",tree.as_string(schema='newick'))
+
+    def test_transfer_internal_labels(self):
+        raxml_seq_recon = RAxMLSequenceReconstruction('', '', '', 'output_tree', '', False)
+        raxml_seq_recon.transfer_internal_names_to_tree('gubbins/tests/data/source_tree.tre', 'gubbins/tests/data/destination_tree.tre', 'renamed_output_tree')
+        assert os.path.exists('renamed_output_tree')
+        self.assertTrue(filecmp.cmp('renamed_output_tree','gubbins/tests/data/expected_renamed_output_tree', shallow = False))
+
+    def test_more_complex_tree(self):
+        raxml_seq_recon = RAxMLSequenceReconstruction('gubbins/tests/data/multiple_recombinations.aln',
+            'gubbins/tests/data/expected_RAxML_result.multiple_recombinations.iteration_5.output_tree',
+            'output_alignment', 'output_tree', RAxMLExecutable(1).internal_sequence_reconstruction_command(), False)
+        raxml_seq_recon.reconstruct_ancestor_sequences()
+
+        assert os.path.exists('output_alignment')
+        assert os.path.exists('output_tree')
+
+    def tearDown(self):
+        for file_to_delete in ['combined.fasta','outputfile','RAxML_nodeLabelledRootedTree.internal','RAxML_marginalAncestralProbabilities.internal', 'RAxML_info.internal', 'RAxML_flagCheck', 'output_tree','renamed_output_tree']:
+            if os.path.exists(file_to_delete):
+                os.remove(file_to_delete)
+
+if __name__ == "__main__":
+    unittest.main()
+
\ No newline at end of file
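The two command tests in the file above pin down the shape of the reconstruction command: the base RAxML command, then `-s <alignment>`, `-t <rooted tree>` and `-n internal`, with output redirected to /dev/null unless verbose is set. A minimal sketch that reproduces those expected strings; `reconstruction_command` and the example tree path are hypothetical, inferred only from the assertions and not taken from RAxMLSequenceReconstruction.py:

```python
def reconstruction_command(base_command, alignment, rooted_tree, verbose=False):
    """Assemble the command string that the two tests above assert on."""
    # Note the trailing space after 'internal', which the verbose test expects.
    command = base_command + " -s " + alignment + " -t " + rooted_tree + " -n internal "
    if not verbose:
        # Silence both stdout and stderr when not running verbosely.
        command += "> /dev/null 2>&1"
    return command


base = "raxmlHPC -f A -p 1 -m GTRGAMMA"
tree = "/tmp/work/rooted_tree.newick"  # hypothetical working-directory path
quiet = reconstruction_command(base, "input_alignment.fasta", tree)
loud = reconstruction_command(base, "input_alignment.fasta", tree, verbose=True)
```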
diff --git a/python/gubbins/tests/test_string_construction.py b/python/gubbins/tests/test_string_construction.py
index 9fbbcd2..3520eea 100644
--- a/python/gubbins/tests/test_string_construction.py
+++ b/python/gubbins/tests/test_string_construction.py
@@ -98,14 +98,5 @@ class TestStringConstruction(unittest.TestCase):
def test_fasttree_gubbins_command(self):
assert common.GubbinsCommon.fasttree_gubbins_command('AAA','BBB', 5,'CCC','DDD',3,'EEE', 10,100) == 'DDD -r -v BBB.vcf -a 10 -b 100 -f EEE -t AAA.iteration_5 -m 3 BBB.snp_sites.aln'
- def test_fasttree_fastml_command(self):
- assert common.GubbinsCommon.fasttree_fastml_command('AAA', 'BBB', 'CCC',2) == 'AAA -s BBB -t CCC.iteration_2 -x CCC.iteration_2.output_tree -y CCC.iteration_2.ancestor.tre -j CCC.iteration_2.seq.joint.txt -k CCC.iteration_2.seq.marginal.txt -d CCC.iteration_2.prob.joint.txt -e CCC.iteration_2.prob.marginal.txt'
-
- def test_raxml_fastml_command(self):
- assert common.GubbinsCommon.raxml_fastml_command('AAA', 'BBB', 'CCC',1234, 5) == 'AAA -s BBB -t RAxML_result.CCC.1234iteration_5 -x RAxML_result.CCC.1234iteration_5.output_tree -y RAxML_result.CCC.1234iteration_5.ancestor.tre -j RAxML_result.CCC.1234iteration_5.seq.joint.txt -k RAxML_result.CCC.1234iteration_5.seq.marginal.txt -d RAxML_result.CCC.1234iteration_5.prob.joint.txt -e RAxML_result.CCC.1234iteration_5.prob.marginal.txt'
-
- def test_generate_fastml_command(self):
- assert common.GubbinsCommon.generate_fastml_command('AAA', 'BBB', 'CCC') == 'AAA -s BBB -t CCC -x CCC.output_tree -y CCC.ancestor.tre -j CCC.seq.joint.txt -k CCC.seq.marginal.txt -d CCC.prob.joint.txt -e CCC.prob.marginal.txt'
-
if __name__ == "__main__":
unittest.main()
\ No newline at end of file
diff --git a/python/gubbins/tests/test_tree_python_methods.py b/python/gubbins/tests/test_tree_python_methods.py
index 8673138..166f197 100644
--- a/python/gubbins/tests/test_tree_python_methods.py
+++ b/python/gubbins/tests/test_tree_python_methods.py
@@ -110,11 +110,6 @@ class TestTreePythonMethods(unittest.TestCase):
os.remove(temp_working_dir + '/tree_with_internal_nodes.tre')
os.removedirs(temp_working_dir)
- def test_create_pairwise_newick_tree(self):
- common.GubbinsCommon.create_pairwise_newick_tree(['sequence_2','sequence_3'], 'gubbins/tests/data/pairwise_newick_tree.actual')
- assert os.path.exists('gubbins/tests/data/pairwise_newick_tree.actual')
- os.remove('gubbins/tests/data/pairwise_newick_tree.actual')
-
def test_remove_internal_node_labels(self):
common.GubbinsCommon.remove_internal_node_labels_from_tree('gubbins/tests/data/final_tree_with_internal_labels.tre', 'final_tree_with_internal_labels.tre')
assert os.path.exists('final_tree_with_internal_labels.tre')
diff --git a/python/gubbins/tests/test_validate_starting_tree.py b/python/gubbins/tests/test_validate_starting_tree.py
index dccf638..4ac5efc 100644
--- a/python/gubbins/tests/test_validate_starting_tree.py
+++ b/python/gubbins/tests/test_validate_starting_tree.py
@@ -22,7 +22,6 @@ class TestValidationOfStartingTree(unittest.TestCase):
def test_do_the_names_match_the_fasta_file(self):
assert common.GubbinsCommon.do_the_names_match_the_fasta_file('gubbins/tests/data/valid_newick_tree.tre', 'gubbins/tests/data/valid_newick_tree.aln' ) == 1
-
if __name__ == "__main__":
unittest.main()
diff --git a/release/manifests/trustyvm.pp b/release/manifests/trustyvm.pp
index 341453c..72c8dba 100644
--- a/release/manifests/trustyvm.pp
+++ b/release/manifests/trustyvm.pp
@@ -64,11 +64,3 @@ package {"python-dev":
include apt
-# we need to pull in a packaged version of fastml for building.
-# Supplied by Aidan Delaney <aidan at ontologyengineering.org>, so blame him.
-apt::ppa { 'ppa:ap13/gubbins': }
-
-package {"fastml2":
- ensure => "installed",
- require => Apt::Ppa['ppa:ap13/gubbins']
-}
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/gubbins.git