[med-svn] [examl] 01/02: New upstream version 3.0.18
Andreas Tille
tille at debian.org
Wed Feb 15 16:08:47 UTC 2017
This is an automated email from the git hooks/post-receive script.
tille pushed a commit to branch master
in repository examl.
commit 2c674372298b9869a7c3d985b81a8372939a0266
Author: Andreas Tille <tille at debian.org>
Date: Wed Feb 15 17:07:52 2017 +0100
New upstream version 3.0.18
---
README.md | 29 +
README_MIC.txt | 80 +
codeDocumentation/PSR.txt | 2 +
codeDocumentation/startupIllustration.pdf | Bin 0 -> 75681 bytes
codeDocumentation/startupIllustration.svg | 818 ++++
examl/Makefile.AVX.gcc | 54 +
examl/Makefile.MIC.icc | 56 +
examl/Makefile.OMP.AVX.gcc | 54 +
examl/Makefile.OMP.SSE3.gcc | 51 +
examl/Makefile.SSE3.gcc | 52 +
examl/avxLikelihood.c | 4052 +++++++++++++++++++
examl/axml.c | 2782 +++++++++++++
examl/axml.h | 1418 +++++++
examl/bipartitionList.c | 592 +++
examl/byteFile.c | 435 ++
examl/byteFile.h | 60 +
examl/communication.c | 182 +
examl/evaluateGenericSpecial.c | 2083 ++++++++++
examl/evaluatePartialGenericSpecial.c | 1058 +++++
examl/globalVariables.h | 180 +
examl/makenewzGenericSpecial.c | 2747 +++++++++++++
examl/mic_native.h | 96 +
examl/mic_native_aa.c | 1323 ++++++
examl/mic_native_dna.c | 661 +++
examl/models.c | 4243 ++++++++++++++++++++
examl/newviewGenericSpecial.c | 6218 +++++++++++++++++++++++++++++
examl/optimizeModel.c | 3134 +++++++++++++++
examl/partitionAssignment.c | 693 ++++
examl/partitionAssignment.h | 64 +
examl/quartets.c | 615 +++
examl/restartHashTable.c | 357 ++
examl/searchAlgo.c | 2651 ++++++++++++
examl/topologies.c | 653 +++
examl/trash.c | 78 +
examl/treeIO.c | 1184 ++++++
gpl-3.0.txt | 674 ++++
manual/ExaML.backup.odt | Bin 0 -> 87753 bytes
manual/ExaML.odt | Bin 0 -> 102973 bytes
manual/ExaML.pdf | Bin 0 -> 412169 bytes
parser/Makefile.SSE3.gcc | 29 +
parser/Makefile.check.warnings | 26 +
parser/USAGE | 1 +
parser/axml.c | 2895 ++++++++++++++
parser/axml.h | 1295 ++++++
parser/globalVariables.h | 195 +
parser/parsePartitions.c | 1427 +++++++
testData/140 | 142 +
testData/140.model | 3 +
testData/140.tree | 1 +
testData/354.tree | 1 +
testData/49 | 50 +
testData/49.model | 4 +
testData/49.tree | 1 +
versionHeader/version.h | 4 +
54 files changed, 45503 insertions(+)
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..02092f2
--- /dev/null
+++ b/README.md
@@ -0,0 +1,29 @@
+ExaML
+=====
+
+Exascale Maximum Likelihood (ExaML) code for phylogenetic inference using MPI.
+
+This code implements the popular RAxML search algorithm for maximum likelihood based inference
+of phylogenetic trees.
+
+It uses a radically new MPI parallelization approach that yields improved parallel efficiency,
+in particular on partitioned multi-gene or whole-genome datasets.
+
+When using ExaML please cite the following paper:
+
+Alexey M. Kozlov, Andre J. Aberer, Alexandros Stamatakis: "ExaML Version 3: A Tool for Phylogenomic Analyses on Supercomputers." Bioinformatics (2015) 31 (15): 2577-2579.
+
+It is up to 4 times faster than RAxML-Light [1].
+
+Like RAxML-Light, ExaML also implements checkpointing, SSE3 and AVX vectorization, and
+memory saving techniques.
+
+[1] A. Stamatakis, A.J. Aberer, C. Goll, S.A. Smith, S.A. Berger, F. Izquierdo-Carrasco:
+ "RAxML-Light: A Tool for computing TeraByte Phylogenies",
+ Bioinformatics 2012; doi: 10.1093/bioinformatics/bts309.
+
+
+Intel Xeon Phi
+--------------
+
+For details on running ExaML on Intel MIC (aka Xeon Phi), please refer to README_MIC.txt.
\ No newline at end of file
diff --git a/README_MIC.txt b/README_MIC.txt
new file mode 100644
index 0000000..d6e7683
--- /dev/null
+++ b/README_MIC.txt
@@ -0,0 +1,80 @@
+Using ExaML on the Intel MIC/Intel Xeon Phi coprocessors
+
+Compiling under Linux
+---------------------
+
+Please set up your MPI/MIC environment (ask your sysadmin if unsure) and then run:
+
+ make -f Makefile.AVX.gcc
+ make -f Makefile.MIC.icc clean
+ make -f Makefile.MIC.icc
+
+This will create two executables, one for the host (CPU) and one for the MIC; they are
+named examl-AVX and examl-MIC, respectively.
+
+
+Running
+----------------------
+
+1. Use parse-examl to generate a binary alignment file as usual.
+
+2. You might want to allocate MPI ranks on both host CPUs and MICs (hybrid mode)
+or just on the MICs, depending on your configuration.
+
+Sample command line for running ExaML in hybrid mode (16 CPU cores + 2 MIC cards):
+
+ mpiexec -host myhost-ib -n 16 /scratch/examl-AVX -n mictest -s /scratch/mictest.binary -t /scratch/start.tre -m GAMMA -w /scratch : \
+ -host myhost-mic0 -n 30 -env OMP_NUM_THREADS 4 -env KMP_AFFINITY "granularity=fine,balanced" /scratch/examl-MIC -n mictest \
+ -s /scratch/mictest.binary -t /scratch/start.tre -m GAMMA -w /scratch : \
+ -host myhost-mic1 -n 30 -env OMP_NUM_THREADS 4 -env KMP_AFFINITY "granularity=fine,balanced" /scratch/examl-MIC -n mictest \
+ -s /scratch/mictest.binary -t /scratch/start.tre -m GAMMA -w /scratch
+
+Here, we use 1 MPI rank per core on the host CPUs. On each MIC, we start 30 ranks x 4 OpenMP threads,
+which gives 120 threads in total, or 2 threads per MIC core. Changing the ratio of CPU:MIC ranks lets
+you fine-tune the load balance for the specific hardware configuration at hand.
+
+
+Limitations & caveats
+---------------------
+
+1. Supported on the MIC:
+
+ + DNA and AA alignments
+ + GAMMA model of rate heterogeneity
+ + multiple partitions
+ + all AA substitution matrices supported by ExaML, including LG4
+
+2. Currently NOT supported:
+
+ - binary and generic multi-state alignments
+ - PSR model
+ - memory saving for gappy alignments (-S option)
+
+3. Memory
+
+ Compared to traditional CPUs, MIC cards have significantly less memory per core,
+ which poses a problem for memory-intensive ML computations. Thus you should plan carefully
+ and split your run over multiple cards, if needed.
+
+ To estimate memory requirements for your dataset, you can use the web-calculator here:
+
+ http://sco.h-its.org/exelixis/web/software/raxml/index.html#memcalc
+
+ A similar tool tailored for MICs is coming soon, stay tuned :)
+
+4. Performance
+
+ ExaML-MIC performs best on alignments with a large number of sites and few taxa.
+ The latter is due to the limited on-card memory of the MICs (see above), so you
+ might need to use multiple cards if the number of taxa is large.
+
+ For details, please refer to: http://www.hicomb.org/papers/HICOMB2014-04.pdf
+
+
+Contact & Support
+--------------------
+
+Please use the RAxML Google group to ask questions:
+
+https://groups.google.com/forum/?hl=en#!forum/raxml
+
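The rank and thread arithmetic in the "Running" section of README_MIC.txt can be checked with a short C sketch. It is not part of the upstream sources: the function names are hypothetical, and the figure of 60 cores per card is only inferred from the README's "120 threads in total or 2 threads per MIC core", not stated there.

```c
#include <assert.h>

/* Hypothetical helper: threads started on one MIC card,
 * i.e. MPI ranks on the card times OpenMP threads per rank. */
int mic_total_threads(int ranks, int omp_threads_per_rank)
{
    assert(ranks > 0 && omp_threads_per_rank > 0);
    return ranks * omp_threads_per_rank;
}

/* Hypothetical helper: threads per physical MIC core.
 * The core count is supplied by the caller; it is an assumption
 * about the hardware, not something ExaML reports. */
double mic_threads_per_core(int ranks, int omp_threads_per_rank, int cores)
{
    assert(cores > 0);
    return (double) mic_total_threads(ranks, omp_threads_per_rank) / cores;
}
```

With the README's values (`-n 30` and `OMP_NUM_THREADS 4`), `mic_total_threads(30, 4)` gives the 120 threads quoted in the text. Changing either argument shows how a different rank/thread split affects per-core oversubscription before a run is launched.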
diff --git a/codeDocumentation/PSR.txt b/codeDocumentation/PSR.txt
new file mode 100644
index 0000000..4e0b81b
--- /dev/null
+++ b/codeDocumentation/PSR.txt
@@ -0,0 +1,2 @@
+To disable per-site rate category scaling in ExaML, it suffices to comment out the invocations of
+updatePerSiteRates() and checkPerSiteRates() in the source code.
diff --git a/codeDocumentation/startupIllustration.pdf b/codeDocumentation/startupIllustration.pdf
new file mode 100644
index 0000000..42683d3
Binary files /dev/null and b/codeDocumentation/startupIllustration.pdf differ
diff --git a/codeDocumentation/startupIllustration.svg b/codeDocumentation/startupIllustration.svg
new file mode 100644
index 0000000..b4cfc2f
--- /dev/null
+++ b/codeDocumentation/startupIllustration.svg
@@ -0,0 +1,818 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!-- Created with Inkscape (http://www.inkscape.org/) -->
+
+<svg
+ xmlns:dc="http://purl.org/dc/elements/1.1/"
+ xmlns:cc="http://creativecommons.org/ns#"
+ xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+ xmlns:svg="http://www.w3.org/2000/svg"
+ xmlns="http://www.w3.org/2000/svg"
+ xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
+ xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
+ width="2023.8151"
+ height="1020.7793"
+ id="svg2"
+ version="1.1"
+ inkscape:version="0.48.4 r9939"
+ sodipodi:docname="img-5.pdf">
+ <defs
+ id="defs4">
+ <marker
+ inkscape:stockid="Arrow1Mend"
+ orient="auto"
+ refY="0"
+ refX="0"
+ id="Arrow1Mend"
+ style="overflow:visible">
+ <path
+ id="path4196"
+ d="M 0,0 5,-5 -12.5,0 5,5 0,0 z"
+ style="fill-rule:evenodd;stroke:#000000;stroke-width:1pt"
+ transform="matrix(-0.4,0,0,-0.4,-4,0)"
+ inkscape:connector-curvature="0" />
+ </marker>
+ <marker
+ inkscape:stockid="Arrow1Mend"
+ orient="auto"
+ refY="0"
+ refX="0"
+ id="Arrow1Mend-8"
+ style="overflow:visible">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4196-6"
+ d="M 0,0 5,-5 -12.5,0 5,5 0,0 z"
+ style="fill-rule:evenodd;stroke:#000000;stroke-width:1pt"
+ transform="matrix(-0.4,0,0,-0.4,-4,0)" />
+ </marker>
+ <marker
+ inkscape:stockid="Arrow1Mend"
+ orient="auto"
+ refY="0"
+ refX="0"
+ id="Arrow1Mend-4"
+ style="overflow:visible">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4196-66"
+ d="M 0,0 5,-5 -12.5,0 5,5 0,0 z"
+ style="fill-rule:evenodd;stroke:#000000;stroke-width:1pt"
+ transform="matrix(-0.4,0,0,-0.4,-4,0)" />
+ </marker>
+ <marker
+ inkscape:stockid="Arrow1Mend"
+ orient="auto"
+ refY="0"
+ refX="0"
+ id="Arrow1Mend-6"
+ style="overflow:visible">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4196-4"
+ d="M 0,0 5,-5 -12.5,0 5,5 0,0 z"
+ style="fill-rule:evenodd;stroke:#000000;stroke-width:1pt"
+ transform="matrix(-0.4,0,0,-0.4,-4,0)" />
+ </marker>
+ <marker
+ inkscape:stockid="Arrow1Mend"
+ orient="auto"
+ refY="0"
+ refX="0"
+ id="Arrow1Mend-5"
+ style="overflow:visible">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4196-69"
+ d="M 0,0 5,-5 -12.5,0 5,5 0,0 z"
+ style="fill-rule:evenodd;stroke:#000000;stroke-width:1pt"
+ transform="matrix(-0.4,0,0,-0.4,-4,0)" />
+ </marker>
+ <marker
+ inkscape:stockid="Arrow1Mend"
+ orient="auto"
+ refY="0"
+ refX="0"
+ id="Arrow1Mend-1"
+ style="overflow:visible">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4196-8"
+ d="M 0,0 5,-5 -12.5,0 5,5 0,0 z"
+ style="fill-rule:evenodd;stroke:#000000;stroke-width:1pt"
+ transform="matrix(-0.4,0,0,-0.4,-4,0)" />
+ </marker>
+ <marker
+ inkscape:stockid="Arrow1Mend"
+ orient="auto"
+ refY="0"
+ refX="0"
+ id="Arrow1Mend-46"
+ style="overflow:visible">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4196-7"
+ d="M 0,0 5,-5 -12.5,0 5,5 0,0 z"
+ style="fill-rule:evenodd;stroke:#000000;stroke-width:1pt"
+ transform="matrix(-0.4,0,0,-0.4,-4,0)" />
+ </marker>
+ <marker
+ inkscape:stockid="Arrow1Mend"
+ orient="auto"
+ refY="0"
+ refX="0"
+ id="Arrow1Mend-65"
+ style="overflow:visible">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4196-2"
+ d="M 0,0 5,-5 -12.5,0 5,5 0,0 z"
+ style="fill-rule:evenodd;stroke:#000000;stroke-width:1pt"
+ transform="matrix(-0.4,0,0,-0.4,-4,0)" />
+ </marker>
+ <marker
+ inkscape:stockid="Arrow1Mend"
+ orient="auto"
+ refY="0"
+ refX="0"
+ id="Arrow1Mend-40"
+ style="overflow:visible">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4196-25"
+ d="M 0,0 5,-5 -12.5,0 5,5 0,0 z"
+ style="fill-rule:evenodd;stroke:#000000;stroke-width:1pt"
+ transform="matrix(-0.4,0,0,-0.4,-4,0)" />
+ </marker>
+ <marker
+ inkscape:stockid="Arrow1Mend"
+ orient="auto"
+ refY="0"
+ refX="0"
+ id="Arrow1Mend-82"
+ style="overflow:visible">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4196-44"
+ d="M 0,0 5,-5 -12.5,0 5,5 0,0 z"
+ style="fill-rule:evenodd;stroke:#000000;stroke-width:1pt"
+ transform="matrix(-0.4,0,0,-0.4,-4,0)" />
+ </marker>
+ </defs>
+ <sodipodi:namedview
+ id="base"
+ pagecolor="#ffffff"
+ bordercolor="#666666"
+ borderopacity="1.0"
+ inkscape:pageopacity="0.0"
+ inkscape:pageshadow="2"
+ inkscape:zoom="0.35"
+ inkscape:cx="678.71845"
+ inkscape:cy="456.86792"
+ inkscape:document-units="px"
+ inkscape:current-layer="svg2"
+ showgrid="false"
+ inkscape:window-width="1916"
+ inkscape:window-height="1057"
+ inkscape:window-x="0"
+ inkscape:window-y="19"
+ inkscape:window-maximized="0"
+ fit-margin-top="0"
+ fit-margin-left="0"
+ fit-margin-right="0"
+ fit-margin-bottom="0" />
+ <metadata
+ id="metadata7">
+ <rdf:RDF>
+ <cc:Work
+ rdf:about="">
+ <dc:format>image/svg+xml</dc:format>
+ <dc:type
+ rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
+ <dc:title></dc:title>
+ </cc:Work>
+ </rdf:RDF>
+ </metadata>
+ <g
+ inkscape:label="&lt;1-&gt;"
+ inkscape:groupmode="layer"
+ id="layer1"
+ transform="translate(-30,-13.400127)">
+ <text
+ xml:space="preserve"
+ style="font-size:40px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+ x="34.285713"
+ y="85.219322"
+ id="text3753"
+ sodipodi:linespacing="125%"><tspan
+ sodipodi:role="line"
+ id="tspan3755"
+ x="34.285713"
+ y="85.219322">bytefile layout</tspan></text>
+ <rect
+ style="fill:#cccccc;fill-opacity:1;stroke:none"
+ id="rect3757"
+ width="692.85718"
+ height="918.57141"
+ x="30"
+ y="103.79076" />
+ <g
+ id="g3948">
+ <text
+ sodipodi:linespacing="125%"
+ id="text3759"
+ y="136.64789"
+ x="54.285717"
+ style="font-size:10px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+ xml:space="preserve"><tspan
+ y="136.64789"
+ x="54.285717"
+ id="tspan3761"
+ sodipodi:role="line">int sizeof(size_t),</tspan><tspan
+ id="tspan3763"
+ y="149.14789"
+ x="54.285717"
+ sodipodi:role="line">int numTax,</tspan><tspan
+ id="tspan3765"
+ y="161.64789"
+ x="54.285717"
+ sodipodi:role="line">size_t numPattern, </tspan><tspan
+ id="tspan3767"
+ y="174.14789"
+ x="54.285717"
+ sodipodi:role="line">int numPartitions, </tspan><tspan
+ id="tspan3769"
+ y="186.64789"
+ x="54.285717"
+ sodipodi:role="line">double gappyness </tspan></text>
+ <text
+ transform="scale(-0.86513234,1.1558925)"
+ sodipodi:linespacing="125%"
+ id="text3771"
+ y="155.88176"
+ x="-304.42612"
+ style="font-size:45.68457413px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+ xml:space="preserve"><tspan
+ y="155.88176"
+ x="-304.42612"
+ id="tspan3773"
+ sodipodi:role="line">{</tspan></text>
+ <text
+ sodipodi:linespacing="125%"
+ id="text3775"
+ y="172.61049"
+ x="264.66064"
+ style="font-size:22.46417236px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+ xml:space="preserve"><tspan
+ style="font-size:20px"
+ y="172.61049"
+ x="264.66064"
+ id="tspan3777"
+ sodipodi:role="line">header</tspan></text>
+ </g>
+ <g
+ id="g3908"
+ transform="translate(-7.0710681,154.55334)">
+ <text
+ sodipodi:linespacing="125%"
+ id="text3779"
+ y="275.93359"
+ x="55"
+ style="font-size:10px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+ xml:space="preserve"><tspan
+ y="275.93359"
+ x="55"
+ id="tspan3781"
+ sodipodi:role="line">int len1, </tspan><tspan
+ id="tspan3783"
+ y="288.43359"
+ x="55"
+ sodipodi:role="line">char taxonName[len1], </tspan><tspan
+ id="tspan3785"
+ y="300.93359"
+ x="55"
+ sodipodi:role="line">int len2, </tspan><tspan
+ id="tspan3787"
+ y="313.43359"
+ x="55"
+ sodipodi:role="line">char taxonName[len2],</tspan><tspan
+ id="tspan3789"
+ y="325.93359"
+ x="55"
+ sodipodi:role="line">...</tspan></text>
+ <text
+ transform="scale(-0.75650437,1.3218694)"
+ sodipodi:linespacing="125%"
+ id="text3771-8"
+ y="237.30026"
+ x="-348.03134"
+ style="font-size:52.24451065px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+ xml:space="preserve"><tspan
+ y="237.30026"
+ x="-348.03134"
+ id="tspan3773-9"
+ sodipodi:role="line">{</tspan></text>
+ <text
+ sodipodi:linespacing="125%"
+ id="text3812"
+ y="300.21933"
+ x="270"
+ style="font-size:20px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+ xml:space="preserve"><tspan
+ y="300.21933"
+ x="270"
+ id="tspan3814"
+ sodipodi:role="line">taxon names</tspan></text>
+ </g>
+ <g
+ id="g3920"
+ transform="translate(-6.0609153,95.964492)">
+ <text
+ sodipodi:linespacing="125%"
+ id="text3816"
+ y="424.50504"
+ x="56.42857"
+ style="font-size:10px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+ xml:space="preserve"><tspan
+ y="424.50504"
+ x="56.42857"
+ id="tspan3818"
+ sodipodi:role="line">partition1{ </tspan><tspan
+ id="tspan3852"
+ y="437.00504"
+ x="56.42857"
+ sodipodi:role="line">int states, </tspan><tspan
+ id="tspan3820"
+ y="449.50504"
+ x="56.42857"
+ sodipodi:role="line">int maxTipStates,</tspan><tspan
+ id="tspan3822"
+ y="462.00504"
+ x="56.42857"
+ sodipodi:role="line">size_t lower,</tspan><tspan
+ id="tspan3824"
+ y="474.50504"
+ x="56.42857"
+ sodipodi:role="line">size_t upper,</tspan><tspan
+ id="tspan3826"
+ y="487.00504"
+ x="56.42857"
+ sodipodi:role="line">size_t width, (unused)</tspan><tspan
+ id="tspan3828"
+ y="499.50504"
+ x="56.42857"
+ sodipodi:role="line">int dataType,</tspan><tspan
+ id="tspan3830"
+ y="512.005"
+ x="56.42857"
+ sodipodi:role="line">int protModels,</tspan><tspan
+ id="tspan3832"
+ y="524.505"
+ x="56.42857"
+ sodipodi:role="line">int autoProtModels,</tspan><tspan
+ id="tspan3836"
+ y="537.005"
+ x="56.42857"
+ sodipodi:role="line">int protFreqs,</tspan><tspan
+ id="tspan3840"
+ y="549.505"
+ x="56.42857"
+ sodipodi:role="line">boolean nonGTR,</tspan><tspan
+ id="tspan3842"
+ y="562.005"
+ x="56.42857"
+ sodipodi:role="line">boolean optimizeBaseFrequencies,</tspan><tspan
+ id="tspan3844"
+ y="574.505"
+ x="56.42857"
+ sodipodi:role="line">int numberOfCategories,</tspan><tspan
+ id="tspan3846"
+ y="587.005"
+ x="56.42857"
+ sodipodi:role="line">int len,</tspan><tspan
+ id="tspan3848"
+ y="599.505"
+ x="56.42857"
+ sodipodi:role="line">char partitionName[len],</tspan><tspan
+ id="tspan3850"
+ y="612.005"
+ x="56.42857"
+ sodipodi:role="line">double frequencies[states]</tspan><tspan
+ id="tspan3854"
+ y="624.505"
+ x="56.42857"
+ sodipodi:role="line">}</tspan><tspan
+ id="tspan3856"
+ y="637.005"
+ x="56.42857"
+ sodipodi:role="line">partition 2{</tspan><tspan
+ id="tspan3858"
+ y="649.505"
+ x="56.42857"
+ sodipodi:role="line">....</tspan><tspan
+ id="tspan3862"
+ y="662.005"
+ x="56.42857"
+ sodipodi:role="line">}</tspan><tspan
+ id="tspan3860"
+ y="674.505"
+ x="56.42857"
+ sodipodi:role="line">....</tspan></text>
+ <text
+ transform="scale(-0.2978097,3.357849)"
+ sodipodi:linespacing="125%"
+ id="text3771-8-8"
+ y="185.44205"
+ x="-839.49341"
+ style="font-size:83.03351593px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+ xml:space="preserve"><tspan
+ y="185.44205"
+ x="-839.49341"
+ id="tspan3773-9-3"
+ sodipodi:role="line">{</tspan></text>
+ <text
+ sodipodi:linespacing="125%"
+ id="text3885"
+ y="544.50507"
+ x="260.71429"
+ style="font-size:20px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+ xml:space="preserve"><tspan
+ y="544.50507"
+ x="260.71429"
+ id="tspan3887"
+ sodipodi:role="line">partition infos</tspan></text>
+ </g>
+ <rect
+ style="fill:#808080;fill-opacity:1;stroke:none"
+ id="rect3960"
+ width="403.05087"
+ height="159.6041"
+ x="54.548237"
+ y="227.06755" />
+ <text
+ xml:space="preserve"
+ style="font-size:20px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+ x="57.823799"
+ y="220.41492"
+ id="text3962"
+ sodipodi:linespacing="125%"><tspan
+ sodipodi:role="line"
+ id="tspan3964"
+ x="57.823799"
+ y="220.41492">int weights[numPattern]</tspan></text>
+ <rect
+ style="fill:#808080;fill-opacity:1;stroke:none"
+ id="rect3960-1"
+ width="403.05087"
+ height="159.6041"
+ x="44.951782"
+ y="811.81976" />
+ <text
+ xml:space="preserve"
+ style="font-size:20px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+ x="45.714287"
+ y="802.36218"
+ id="text3984"
+ sodipodi:linespacing="125%"><tspan
+ sodipodi:role="line"
+ id="tspan3986"
+ x="45.714287"
+ y="802.36218">char yVector[numPattern]</tspan></text>
+ </g>
+ <g
+ inkscape:groupmode="layer"
+ id="layer2"
+ inkscape:label="&lt;2-&gt;"
+ transform="translate(-30,-13.400127)"
+ style="display:inline">
+ <g
+ id="g4683">
+ <rect
+ y="213.79076"
+ x="785.71429"
+ height="185.71428"
+ width="320"
+ id="rect3995"
+ style="fill:#00ffff;fill-opacity:1;stroke:none" />
+ <text
+ sodipodi:linespacing="125%"
+ id="text3997"
+ y="203.79076"
+ x="790"
+ style="font-size:40px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+ xml:space="preserve"><tspan
+ y="203.79076"
+ x="790"
+ id="tspan3999"
+ sodipodi:role="line">ByteFile *bFile</tspan></text>
+ <text
+ sodipodi:linespacing="125%"
+ id="text4651"
+ y="276.64789"
+ x="801.42859"
+ style="font-size:20px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+ xml:space="preserve"><tspan
+ y="276.64789"
+ x="801.42859"
+ id="tspan4653"
+ sodipodi:role="line">....</tspan><tspan
+ id="tspan4655"
+ y="301.64789"
+ x="801.42859"
+ sodipodi:role="line">pInfo* partitions</tspan><tspan
+ id="tspan4657"
+ y="326.64789"
+ x="801.42859"
+ sodipodi:role="line">....</tspan></text>
+ </g>
+ </g>
+ <g
+ inkscape:groupmode="layer"
+ id="layer3"
+ inkscape:label="&lt;2&gt;"
+ style="display:inline"
+ transform="translate(-30,-13.400127)">
+ <text
+ xml:space="preserve"
+ style="font-size:40px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+ x="825.71429"
+ y="43.790752"
+ id="text3988"
+ sodipodi:linespacing="125%"><tspan
+ sodipodi:role="line"
+ id="tspan3990"
+ x="825.71429"
+ y="43.790752">1. read header, taxa, partitions into ByteFile struct</tspan><tspan
+ sodipodi:role="line"
+ x="825.71429"
+ y="93.790756"
+ id="tspan3992">(use seekPos() to navigate in bytefile)</tspan></text>
+ <path
+ style="fill:none;stroke:#ff0000;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-end:url(#Arrow1Mend)"
+ d="m 368.57143,160.93361 c 104.28571,57.14286 407.14286,80 407.14286,80"
+ id="path4001"
+ inkscape:connector-curvature="0" />
+ <path
+ style="fill:none;stroke:#ff0000;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-end:url(#Arrow1Mend)"
+ d="M 417.79138,448.34805 C 524.93423,432.63377 774.93424,331.2052 774.93424,331.2052"
+ id="path4001-4"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ <path
+ style="fill:none;stroke:#ff0000;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-end:url(#Arrow1Mend)"
+ d="M 405.78958,605.47122 C 512.93243,589.75694 778.64673,381.18552 778.64673,381.18552"
+ id="path4001-4-5"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ </g>
+ <g
+ inkscape:groupmode="layer"
+ id="layer5"
+ inkscape:label="&lt;3-&gt;"
+ style="display:inline"
+ transform="translate(-30,-13.400127)">
+ <rect
+ style="fill:#00ffff;stroke:none;display:inline"
+ id="rect4718"
+ width="370"
+ height="195.71428"
+ x="1298.5714"
+ y="203.79076" />
+ <text
+ xml:space="preserve"
+ style="font-size:40px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;display:inline;font-family:Sans"
+ x="1299.9999"
+ y="192.36218"
+ id="text4720"
+ sodipodi:linespacing="125%"><tspan
+ sodipodi:role="line"
+ id="tspan4722"
+ x="1299.9999"
+ y="192.36218">PartitionAssignment *pAss</tspan></text>
+ </g>
+ <g
+ inkscape:groupmode="layer"
+ id="layer4"
+ inkscape:label="&lt;3&gt;"
+ style="display:inline"
+ transform="translate(-30,-13.400127)">
+ <text
+ xml:space="preserve"
+ style="font-size:40px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+ x="777.14288"
+ y="59.505039"
+ id="text4714"
+ sodipodi:linespacing="125%"><tspan
+ sodipodi:role="line"
+ id="tspan4716"
+ x="777.14288"
+ y="59.505039">2. every process computes partition assignment</tspan></text>
+ <path
+ style="fill:none;stroke:#ff0000;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-end:url(#Arrow1Mend);display:inline"
+ d="m 1062.9324,318.32837 c 108.5715,-55.71428 251.4286,-8.57142 251.4286,-8.57142"
+ id="path4001-4-9"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ </g>
+ <g
+ inkscape:groupmode="layer"
+ id="layer6"
+ inkscape:label="&lt;4&gt;"
+ style="display:inline"
+ transform="translate(-30,-13.400127)">
+ <rect
+ style="fill:#ff0000;fill-opacity:1;stroke:none"
+ id="rect4767"
+ width="215.71428"
+ height="37.142857"
+ x="151.42857"
+ y="226.6479" />
+ <rect
+ style="fill:#ff0000;fill-opacity:1;stroke:none"
+ id="rect4769"
+ width="205.71428"
+ height="35.714287"
+ x="251.28572"
+ y="351.07645" />
+ <rect
+ style="fill:#ff0000;fill-opacity:1;stroke:none"
+ id="rect4767-5"
+ width="215.71428"
+ height="37.142857"
+ x="141.42857"
+ y="811.64789" />
+ <rect
+ style="fill:#ff0000;fill-opacity:1;stroke:none"
+ id="rect4769-3"
+ width="205.71428"
+ height="35.714287"
+ x="241.42859"
+ y="936.64789" />
+ <text
+ xml:space="preserve"
+ style="font-size:27.47451591px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+ x="812.2901"
+ y="435.74408"
+ id="text4792"
+ sodipodi:linespacing="125%"><tspan
+ sodipodi:role="line"
+ id="tspan4794"
+ x="812.2901"
+ y="435.74408">partitions[0].yVector</tspan><tspan
+ sodipodi:role="line"
+ x="812.2901"
+ y="470.08722"
+ id="tspan4796">partitions[0].wgt</tspan><tspan
+ sodipodi:role="line"
+ x="812.2901"
+ y="504.43036"
+ id="tspan4800">partitions[4].yVector</tspan><tspan
+ sodipodi:role="line"
+ x="812.2901"
+ y="538.7735"
+ id="tspan4804">partitions[4].wgt</tspan></text>
+ <path
+ style="fill:none;stroke:#ff0000;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-end:url(#Arrow1Mend);display:inline"
+ d="m 365.84088,245.71715 c 258.07408,-7.22696 437.29667,183.35756 437.29667,183.35756"
+ id="path4001-4-9-2"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ <text
+ xml:space="preserve"
+ style="font-size:40px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+ x="791.95959"
+ y="70.493912"
+ id="text4831"
+ sodipodi:linespacing="125%"><tspan
+ sodipodi:role="line"
+ x="791.95959"
+ y="70.493912"
+ id="tspan4839">3. process only reads data assigned to it (exa_fread/exa_fseek)</tspan></text>
+ <path
+ style="fill:none;stroke:#ff0000;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-end:url(#Arrow1Mend);display:inline"
+ d="m 446.65309,368.95576 c 120.69333,19.03701 358.50477,130.82963 358.50477,130.82963"
+ id="path4001-4-9-2-2"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ <path
+ style="fill:none;stroke:#ff0000;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-end:url(#Arrow1Mend);display:inline"
+ d="M 438.57187,958.88485 C 642.09771,927.41423 795.05633,534.13058 795.05633,534.13058"
+ id="path4001-4-9-2-2-5"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ <path
+ style="fill:none;stroke:#ff0000;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-end:url(#Arrow1Mend);display:inline"
+ d="M 337.55662,833.62593 C 423.7393,651.2704 759.25848,484.19643 799.09694,465.4402"
+ id="path4001-4-9-2-2-5-7"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ </g>
+ <g
+ inkscape:groupmode="layer"
+ id="layer7"
+ inkscape:label="&lt;5&gt;"
+ transform="translate(-30,-13.400127)"
+ style="display:inline">
+ <text
+ xml:space="preserve"
+ style="font-size:40px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+ x="783.87836"
+ y="62.412689"
+ id="text4916"
+ sodipodi:linespacing="125%"><tspan
+ sodipodi:role="line"
+ id="tspan4918"
+ x="783.87836"
+ y="62.412689">4. tree struct is initialized; bFile and pAss are deleted</tspan></text>
+ <flowRoot
+ xml:space="preserve"
+ id="flowRoot4920"
+ style="font-size:40px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"><flowRegion
+ id="flowRegion4922"><rect
+ id="rect4924"
+ width="359.61432"
+ height="436.38589"
+ x="1004.0916"
+ y="599.81384" /></flowRegion><flowPara
+ id="flowPara4926"></flowPara></flowRoot> <rect
+ style="fill:#00ffff;fill-opacity:1;stroke:none"
+ id="rect4928"
+ width="393.9595"
+ height="307.08636"
+ x="1016.2134"
+ y="727.09308" />
+ <text
+ xml:space="preserve"
+ style="font-size:40px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+ x="1018.2338"
+ y="706.89001"
+ id="text4930"
+ sodipodi:linespacing="125%"><tspan
+ sodipodi:role="line"
+ id="tspan4932"
+ x="1018.2338"
+ y="706.89001">tree *tr</tspan></text>
+ <text
+ xml:space="preserve"
+ style="font-size:20px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+ x="1038.4368"
+ y="832.14893"
+ id="text4934"
+ sodipodi:linespacing="125%"><tspan
+ sodipodi:role="line"
+ id="tspan4936"
+ x="1038.4368"
+ y="832.14893">....</tspan><tspan
+ sodipodi:role="line"
+ x="1038.4368"
+ y="857.14893"
+ id="tspan4938">pInfo *partitionData</tspan><tspan
+ sodipodi:role="line"
+ x="1038.4368"
+ y="882.14893"
+ id="tspan4940">....</tspan><tspan
+ sodipodi:role="line"
+ x="1038.4368"
+ y="907.14893"
+ id="tspan4942" /><tspan
+ sodipodi:role="line"
+ x="1038.4368"
+ y="932.14893"
+ id="tspan4944">Assign* assignments</tspan></text>
+ <path
+ style="fill:none;stroke:#ff0000;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-end:url(#Arrow1Mend);display:inline"
+ d="m 909.83212,402.22245 c -105.58083,249.35179 89.80411,439.93631 89.80411,439.93631"
+ id="path4001-4-9-2-8"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ <path
+ style="fill:none;stroke:#ff0000;stroke-width:4;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-end:url(#Arrow1Mend);display:inline"
+ d="m 1428.4762,395.11161 c 37.8607,285.71728 -158.6935,528.82973 -158.6935,528.82973"
+ id="path4001-4-9-2-8-8"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ <text
+ xml:space="preserve"
+ style="font-size:186.88011169px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+ x="1409.8387"
+ y="373.53967"
+ id="text4995"
+ sodipodi:linespacing="125%"><tspan
+ sodipodi:role="line"
+ id="tspan4997"
+ x="1409.8387"
+ y="373.53967"
+ style="fill:#ff0000">X</tspan></text>
+ <text
+ xml:space="preserve"
+ style="font-size:186.88011169px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+ x="861.42468"
+ y="368.92688"
+ id="text4995-0"
+ sodipodi:linespacing="125%"><tspan
+ sodipodi:role="line"
+ id="tspan4997-2"
+ x="861.42468"
+ y="368.92688"
+ style="fill:#ff0000">X</tspan></text>
+ </g>
+</svg>
diff --git a/examl/Makefile.AVX.gcc b/examl/Makefile.AVX.gcc
new file mode 100644
index 0000000..08e20aa
--- /dev/null
+++ b/examl/Makefile.AVX.gcc
@@ -0,0 +1,54 @@
+# Makefile August 2006 by Alexandros Stamatakis
+# Makefile cleanup October 2006, Courtesy of Peter Cordes <peter at cordes.ca>
+
+CC = mpicc
+
+COMMON_FLAGS = -D__SIM_SSE3 -D__AVX -D_OPTIMIZED_FUNCTIONS -msse3 -D_GNU_SOURCE -fomit-frame-pointer -funroll-loops -D_USE_ALLREDUCE #-Wall -Wredundant-decls -Wreturn-type -Wswitch-default -Wunused-value -Wimplicit -Wimplicit-function-declaration -Wimplicit-int -Wimport -Wunused -Wunused-function -Wunused-label -Wno-int-to-pointer-cast -Wbad-function-cast -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wold-style-definition -Wstrict-prototypes -Wpointer-sign -We [...]
+
+OPT_FLAG_1 = -O1
+OPT_FLAG_2 = -O2
+
+CFLAGS = $(COMMON_FLAGS) $(OPT_FLAG_2)
+
+LIBRARIES = -lm -mavx
+
+RM = rm -f
+
+objs = axml.o optimizeModel.o trash.o searchAlgo.o topologies.o treeIO.o models.o evaluatePartialGenericSpecial.o evaluateGenericSpecial.o newviewGenericSpecial.o makenewzGenericSpecial.o bipartitionList.o restartHashTable.o avxLikelihood.o byteFile.o partitionAssignment.o communication.o quartets.o
+
+all : clean examl-AVX
+
+GLOBAL_DEPS = axml.h globalVariables.h ../versionHeader/version.h
+
+examl-AVX : $(objs)
+ $(CC) -o examl-AVX $(objs) $(LIBRARIES)
+
+avxLikelihood.o : avxLikelihood.c $(GLOBAL_DEPS)
+ $(CC) $(CFLAGS) -mavx -c -o avxLikelihood.o avxLikelihood.c
+
+models.o : models.c $(GLOBAL_DEPS)
+ $(CC) $(COMMON_FLAGS) $(OPT_FLAG_1) -c -o models.o models.c
+
+bipartitionList.o : bipartitionList.c $(GLOBAL_DEPS)
+evaluatePartialSpecialGeneric.o : evaluatePartialSpecialGeneric.c $(GLOBAL_DEPS)
+optimizeModel.o : optimizeModel.c $(GLOBAL_DEPS)
+trash.o : trash.c $(GLOBAL_DEPS)
+axml.o : axml.c $(GLOBAL_DEPS)
+searchAlgo.o : searchAlgo.c $(GLOBAL_DEPS)
+topologies.o : topologies.c $(GLOBAL_DEPS)
+treeIO.o : treeIO.c $(GLOBAL_DEPS)
+quartets.o : quartets.c $(GLOBAL_DEPS)
+evaluatePartialGenericSpecial.o : evaluatePartialGenericSpecial.c $(GLOBAL_DEPS)
+evaluateGenericSpecial.o : evaluateGenericSpecial.c $(GLOBAL_DEPS)
+newviewGenericSpecial.o : newviewGenericSpecial.c $(GLOBAL_DEPS)
+makenewzGenericSpecial.o : makenewzGenericSpecial.c $(GLOBAL_DEPS)
+restartHashTable.o : restartHashTable.c $(GLOBAL_DEPS)
+byteFile.o : byteFile.c
+partitionAssignment.o : partitionAssignment.c $(GLOBAL_DEPS)
+communication.o : communication.c $(GLOBAL_DEPS)
+
+
+clean :
+ $(RM) *.o examl-AVX
+
+dev : examl-AVX
\ No newline at end of file
diff --git a/examl/Makefile.MIC.icc b/examl/Makefile.MIC.icc
new file mode 100644
index 0000000..1cd6e9a
--- /dev/null
+++ b/examl/Makefile.MIC.icc
@@ -0,0 +1,56 @@
+# Makefile August 2006 by Alexandros Stamatakis
+# Makefile cleanup October 2006, Courtesy of Peter Cordes <peter at cordes.ca>
+
+CC = mpicc
+
+MICFLAGS = -D__MIC_NATIVE -mmic -opt-streaming-cache-evict=0 -openmp -D_USE_OMP #-D_PROFILE_MPI
+COMMON_FLAGS = -std=c99 -D__SIM_SSE3 -D_OPTIMIZED_FUNCTIONS -D_GNU_SOURCE -fomit-frame-pointer -funroll-loops -D_USE_ALLREDUCE $(MICFLAGS) # -Wall -Wredundant-decls -Wreturn-type -Wswitch-default -Wunused-value -Wimplicit -Wimplicit-function-declaration -Wimplicit-int -Wimport -Wunused -Wunused-function -Wunused-label -Wno-int-to-pointer-cast -Wbad-function-cast -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wold-style-definition -Wstrict-prototypes -Wpointer- [...]
+
+OPT_FLAG_1 = -O1
+OPT_FLAG_2 = -O2
+
+CFLAGS = $(COMMON_FLAGS) $(OPT_FLAG_2)
+
+LIBRARIES = -lm -mmic -openmp
+
+RM = rm -f
+
+objs = axml.o optimizeModel.o trash.o searchAlgo.o topologies.o treeIO.o models.o evaluatePartialGenericSpecial.o evaluateGenericSpecial.o newviewGenericSpecial.o makenewzGenericSpecial.o bipartitionList.o restartHashTable.o byteFile.o partitionAssignment.o communication.o mic_native_dna.o mic_native_aa.o quartets.o
+
+all : clean examl-MIC
+
+GLOBAL_DEPS = axml.h globalVariables.h
+
+examl-MIC : $(objs)
+ $(CC) -o examl-MIC $(objs) $(LIBRARIES)
+
+models.o : models.c $(GLOBAL_DEPS)
+ $(CC) $(COMMON_FLAGS) $(OPT_FLAG_1) -c -o models.o models.c
+
+partitionAssignment.o: partitionAssignment.c $(GLOBAL_DEPS)
+ $(CC) $(COMMON_FLAGS) $(OPT_FLAG_1) -c -o partitionAssignment.o partitionAssignment.c
+
+bipartitionList.o : bipartitionList.c $(GLOBAL_DEPS)
+evaluatePartialSpecialGeneric.o : evaluatePartialSpecialGeneric.c $(GLOBAL_DEPS)
+optimizeModel.o : optimizeModel.c $(GLOBAL_DEPS)
+trash.o : trash.c $(GLOBAL_DEPS)
+axml.o : axml.c $(GLOBAL_DEPS)
+searchAlgo.o : searchAlgo.c $(GLOBAL_DEPS)
+topologies.o : topologies.c $(GLOBAL_DEPS)
+treeIO.o : treeIO.c $(GLOBAL_DEPS)
+
+evaluatePartialGenericSpecial.o : evaluatePartialGenericSpecial.c $(GLOBAL_DEPS)
+evaluateGenericSpecial.o : evaluateGenericSpecial.c $(GLOBAL_DEPS)
+newviewGenericSpecial.o : newviewGenericSpecial.c $(GLOBAL_DEPS)
+makenewzGenericSpecial.o : makenewzGenericSpecial.c $(GLOBAL_DEPS)
+restartHashTable.o : restartHashTable.c $(GLOBAL_DEPS)
+byteFile.o : byteFile.c
+communication.o : communication.c $(GLOBAL_DEPS)
+mic_native_dna.o : mic_native_dna.c $(GLOBAL_DEPS)
+mic_native_aa.o : mic_native_aa.c $(GLOBAL_DEPS)
+quartets.o : quartets.c $(GLOBAL_DEPS)
+
+clean :
+ $(RM) *.o examl-MIC
+
+dev : examl-MIC
diff --git a/examl/Makefile.OMP.AVX.gcc b/examl/Makefile.OMP.AVX.gcc
new file mode 100644
index 0000000..13d21e0
--- /dev/null
+++ b/examl/Makefile.OMP.AVX.gcc
@@ -0,0 +1,54 @@
+# Makefile August 2006 by Alexandros Stamatakis
+# Makefile cleanup October 2006, Courtesy of Peter Cordes <peter at cordes.ca>
+
+CC = mpicc
+
+COMMON_FLAGS = -D__SIM_SSE3 -D__AVX -D_USE_OMP -fopenmp -D_OPTIMIZED_FUNCTIONS -msse3 -D_GNU_SOURCE -fomit-frame-pointer -funroll-loops -D_USE_ALLREDUCE -Wall # -Wredundant-decls -Wreturn-type -Wswitch-default -Wunused-value -Wimplicit -Wimplicit-function-declaration -Wimplicit-int -Wimport -Wunused -Wunused-function -Wunused-label -Wno-int-to-pointer-cast -Wbad-function-cast -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wold-style-definition -Wstrict-prototyp [...]
+
+OPT_FLAG_1 = -O1
+OPT_FLAG_2 = -O2
+
+CFLAGS = $(COMMON_FLAGS) $(OPT_FLAG_2)
+
+LIBRARIES = -lm -mavx -fopenmp
+
+RM = rm -f
+
+objs = axml.o optimizeModel.o trash.o searchAlgo.o topologies.o treeIO.o models.o evaluatePartialGenericSpecial.o evaluateGenericSpecial.o newviewGenericSpecial.o makenewzGenericSpecial.o bipartitionList.o restartHashTable.o avxLikelihood.o byteFile.o partitionAssignment.o communication.o quartets.o
+
+all : clean examl-OMP-AVX
+
+GLOBAL_DEPS = axml.h globalVariables.h ../versionHeader/version.h
+
+examl-OMP-AVX : $(objs)
+ $(CC) -o examl-OMP-AVX $(objs) $(LIBRARIES)
+
+avxLikelihood.o : avxLikelihood.c $(GLOBAL_DEPS)
+ $(CC) $(CFLAGS) -mavx -c -o avxLikelihood.o avxLikelihood.c
+
+models.o : models.c $(GLOBAL_DEPS)
+ $(CC) $(COMMON_FLAGS) $(OPT_FLAG_1) -c -o models.o models.c
+
+bipartitionList.o : bipartitionList.c $(GLOBAL_DEPS)
+evaluatePartialSpecialGeneric.o : evaluatePartialSpecialGeneric.c $(GLOBAL_DEPS)
+optimizeModel.o : optimizeModel.c $(GLOBAL_DEPS)
+trash.o : trash.c $(GLOBAL_DEPS)
+axml.o : axml.c $(GLOBAL_DEPS)
+searchAlgo.o : searchAlgo.c $(GLOBAL_DEPS)
+topologies.o : topologies.c $(GLOBAL_DEPS)
+treeIO.o : treeIO.c $(GLOBAL_DEPS)
+quartets.o : quartets.c $(GLOBAL_DEPS)
+evaluatePartialGenericSpecial.o : evaluatePartialGenericSpecial.c $(GLOBAL_DEPS)
+evaluateGenericSpecial.o : evaluateGenericSpecial.c $(GLOBAL_DEPS)
+newviewGenericSpecial.o : newviewGenericSpecial.c $(GLOBAL_DEPS)
+makenewzGenericSpecial.o : makenewzGenericSpecial.c $(GLOBAL_DEPS)
+restartHashTable.o : restartHashTable.c $(GLOBAL_DEPS)
+byteFile.o : byteFile.c
+partitionAssignment.o : partitionAssignment.c $(GLOBAL_DEPS)
+communication.o : communication.c $(GLOBAL_DEPS)
+
+
+clean :
+ $(RM) *.o examl-OMP-AVX
+
+dev : examl-OMP-AVX
diff --git a/examl/Makefile.OMP.SSE3.gcc b/examl/Makefile.OMP.SSE3.gcc
new file mode 100644
index 0000000..5ee3bc2
--- /dev/null
+++ b/examl/Makefile.OMP.SSE3.gcc
@@ -0,0 +1,51 @@
+# Makefile August 2006 by Alexandros Stamatakis
+# Makefile cleanup October 2006, Courtesy of Peter Cordes <peter at cordes.ca>
+
+CC = mpicc
+
+COMMON_FLAGS = -D_USE_OMP -fopenmp -D_GNU_SOURCE -D__SIM_SSE3 -msse3 -fomit-frame-pointer -funroll-loops -D_OPTIMIZED_FUNCTIONS -D_USE_ALLREDUCE -Wall #-Wunused-parameter -Wredundant-decls -Wreturn-type -Wswitch-default -Wunused-value -Wimplicit -Wimplicit-function-declaration -Wimplicit-int -Wimport -Wunused -Wunused-function -Wunused-label -Wno-int-to-pointer-cast -Wbad-function-cast -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wold-style-definition -Wstrict [...]
+
+OPT_FLAG_1 = -O1
+OPT_FLAG_2 = -O2
+
+CFLAGS = $(COMMON_FLAGS) $(OPT_FLAG_2)
+
+LIBRARIES = -lm -fopenmp
+
+RM = rm -f
+
+objs = axml.o optimizeModel.o trash.o searchAlgo.o topologies.o treeIO.o models.o evaluatePartialGenericSpecial.o evaluateGenericSpecial.o newviewGenericSpecial.o makenewzGenericSpecial.o bipartitionList.o restartHashTable.o byteFile.o partitionAssignment.o communication.o quartets.o
+
+all : clean examl-OMP
+
+GLOBAL_DEPS = axml.h globalVariables.h ../versionHeader/version.h
+
+examl-OMP : $(objs)
+ $(CC) -o examl-OMP $(objs) $(LIBRARIES)
+
+models.o : models.c $(GLOBAL_DEPS)
+ $(CC) $(COMMON_FLAGS) $(OPT_FLAG_1) -c -o models.o models.c
+
+bipartitionList.o : bipartitionList.c $(GLOBAL_DEPS)
+evaluatePartialSpecialGeneric.o : evaluatePartialSpecialGeneric.c $(GLOBAL_DEPS)
+optimizeModel.o : optimizeModel.c $(GLOBAL_DEPS)
+trash.o : trash.c $(GLOBAL_DEPS)
+axml.o : axml.c $(GLOBAL_DEPS)
+searchAlgo.o : searchAlgo.c $(GLOBAL_DEPS)
+topologies.o : topologies.c $(GLOBAL_DEPS)
+treeIO.o : treeIO.c $(GLOBAL_DEPS)
+models.o : models.c $(GLOBAL_DEPS)
+evaluatePartialGenericSpecial.o : evaluatePartialGenericSpecial.c $(GLOBAL_DEPS)
+evaluateGenericSpecial.o : evaluateGenericSpecial.c $(GLOBAL_DEPS)
+newviewGenericSpecial.o : newviewGenericSpecial.c $(GLOBAL_DEPS)
+makenewzGenericSpecial.o : makenewzGenericSpecial.c $(GLOBAL_DEPS)
+restartHashTable.o : restartHashTable.c $(GLOBAL_DEPS)
+byteFile.o : byteFile.c
+partitionAssignment.o : partitionAssignment.c $(GLOBAL_DEPS)
+communication.o : communication.c $(GLOBAL_DEPS)
+quartets.o : quartets.c $(GLOBAL_DEPS)
+
+clean :
+ $(RM) *.o examl-OMP
+
+dev : examl-OMP
diff --git a/examl/Makefile.SSE3.gcc b/examl/Makefile.SSE3.gcc
new file mode 100644
index 0000000..c15f0fc
--- /dev/null
+++ b/examl/Makefile.SSE3.gcc
@@ -0,0 +1,52 @@
+# Makefile August 2006 by Alexandros Stamatakis
+# Makefile cleanup October 2006, Courtesy of Peter Cordes <peter at cordes.ca>
+
+CC = mpicc
+
+
+COMMON_FLAGS = -D_GNU_SOURCE -D__SIM_SSE3 -msse3 -fomit-frame-pointer -funroll-loops -D_OPTIMIZED_FUNCTIONS -D_USE_ALLREDUCE #-Wall -Wunused-parameter -Wredundant-decls -Wreturn-type -Wswitch-default -Wunused-value -Wimplicit -Wimplicit-function-declaration -Wimplicit-int -Wimport -Wunused -Wunused-function -Wunused-label -Wno-int-to-pointer-cast -Wbad-function-cast -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wold-style-definition -Wstrict-prototypes -Wpointe [...]
+
+OPT_FLAG_1 = -O1
+OPT_FLAG_2 = -O2
+
+CFLAGS = $(COMMON_FLAGS) $(OPT_FLAG_2)
+
+LIBRARIES = -lm
+
+RM = rm -f
+
+objs = axml.o optimizeModel.o trash.o searchAlgo.o topologies.o treeIO.o models.o evaluatePartialGenericSpecial.o evaluateGenericSpecial.o newviewGenericSpecial.o makenewzGenericSpecial.o bipartitionList.o restartHashTable.o byteFile.o partitionAssignment.o communication.o quartets.o
+
+all : clean examl
+
+GLOBAL_DEPS = axml.h globalVariables.h ../versionHeader/version.h
+
+examl : $(objs)
+ $(CC) -o examl $(objs) $(LIBRARIES)
+
+models.o : models.c $(GLOBAL_DEPS)
+ $(CC) $(COMMON_FLAGS) $(OPT_FLAG_1) -c -o models.o models.c
+
+bipartitionList.o : bipartitionList.c $(GLOBAL_DEPS)
+evaluatePartialSpecialGeneric.o : evaluatePartialSpecialGeneric.c $(GLOBAL_DEPS)
+optimizeModel.o : optimizeModel.c $(GLOBAL_DEPS)
+trash.o : trash.c $(GLOBAL_DEPS)
+axml.o : axml.c $(GLOBAL_DEPS)
+searchAlgo.o : searchAlgo.c $(GLOBAL_DEPS)
+topologies.o : topologies.c $(GLOBAL_DEPS)
+treeIO.o : treeIO.c $(GLOBAL_DEPS)
+models.o : models.c $(GLOBAL_DEPS)
+evaluatePartialGenericSpecial.o : evaluatePartialGenericSpecial.c $(GLOBAL_DEPS)
+evaluateGenericSpecial.o : evaluateGenericSpecial.c $(GLOBAL_DEPS)
+newviewGenericSpecial.o : newviewGenericSpecial.c $(GLOBAL_DEPS)
+makenewzGenericSpecial.o : makenewzGenericSpecial.c $(GLOBAL_DEPS)
+restartHashTable.o : restartHashTable.c $(GLOBAL_DEPS)
+byteFile.o : byteFile.c
+partitionAssignment.o : partitionAssignment.c $(GLOBAL_DEPS)
+communication.o : communication.c $(GLOBAL_DEPS)
+quartets.o : quartets.c $(GLOBAL_DEPS)
+
+clean :
+ $(RM) *.o examl
+
+dev : examl
\ No newline at end of file
diff --git a/examl/avxLikelihood.c b/examl/avxLikelihood.c
new file mode 100644
index 0000000..f4438f3
--- /dev/null
+++ b/examl/avxLikelihood.c
@@ -0,0 +1,4052 @@
+#include <unistd.h>
+
+#include <math.h>
+#include <time.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+#include <stdint.h>
+#include <limits.h>
+#include "axml.h"
+#include <assert.h>
+#include <xmmintrin.h>
+#include <pmmintrin.h>
+#include <immintrin.h>
+
+#ifdef _FMA
+#include <x86intrin.h>
+#define FMAMACC(a,b,c) _mm256_macc_pd(b,c,a) /* a + b*c per lane; _mm256_macc_pd is the FMA4 intrinsic */
+#endif
+
+extern const unsigned int mask32[32];
+
+const union __attribute__ ((aligned (BYTE_ALIGNMENT))) /* mask with the sign bit cleared: AND with it gives |x| per lane */
+{
+ uint64_t i[4];
+ __m256d m;
+
+} absMask_AVX = {{0x7fffffffffffffffULL, 0x7fffffffffffffffULL, 0x7fffffffffffffffULL, 0x7fffffffffffffffULL}};
+
+
+
+static inline __m256d hadd4(__m256d v, __m256d u) /* every lane = (v0+v1+v2+v3) * (u0+u1+u2+u3) */
+{
+ __m256d
+ a, b;
+
+ v = _mm256_hadd_pd(v, v);
+ a = _mm256_permute2f128_pd(v, v, 1);
+ v = _mm256_add_pd(a, v);
+
+ u = _mm256_hadd_pd(u, u);
+ b = _mm256_permute2f128_pd(u, u, 1);
+ u = _mm256_add_pd(b, u);
+
+ v = _mm256_mul_pd(v, u);
+
+ return v;
+}
+
+static inline __m256d hadd3(__m256d v) /* every lane = v0+v1+v2+v3 (horizontal sum, broadcast) */
+{
+ __m256d
+ a;
+
+ v = _mm256_hadd_pd(v, v);
+ a = _mm256_permute2f128_pd(v, v, 1);
+ v = _mm256_add_pd(a, v);
+
+ return v;
+}
+
+
+void newviewGTRGAMMA_AVX(int tipCase,
+ double *x1, double *x2, double *x3,
+ double *extEV, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ const int n, double *left, double *right, int *wgt, int *scalerIncrement
+ )
+{
+
+ int
+ i,
+ k,
+ scale,
+ addScale = 0;
+
+ __m256d
+ minlikelihood_avx = _mm256_set1_pd( minlikelihood ),
+ twoto = _mm256_set1_pd(twotothe256);
+
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ {
+ double
+ *uX1,
+ umpX1[1024] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+ *uX2,
+ umpX2[1024] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+ for (i = 1; i < 16; i++)
+ {
+ __m256d
+ tv = _mm256_load_pd(&(tipVector[i * 4]));
+
+ int
+ j;
+
+ for (j = 0; j < 4; j++)
+ for (k = 0; k < 4; k++)
+ {
+ __m256d
+ left1 = _mm256_load_pd(&left[j * 16 + k * 4]);
+
+ left1 = _mm256_mul_pd(left1, tv);
+ left1 = hadd3(left1);
+
+ _mm256_store_pd(&umpX1[i * 64 + j * 16 + k * 4], left1);
+ }
+
+ for (j = 0; j < 4; j++)
+ for (k = 0; k < 4; k++)
+ {
+ __m256d
+ left1 = _mm256_load_pd(&right[j * 16 + k * 4]);
+
+ left1 = _mm256_mul_pd(left1, tv);
+ left1 = hadd3(left1);
+
+ _mm256_store_pd(&umpX2[i * 64 + j * 16 + k * 4], left1);
+ }
+ }
+
+
+ for(i = 0; i < n; i++)
+ {
+ uX1 = &umpX1[64 * tipX1[i]];
+ uX2 = &umpX2[64 * tipX2[i]];
+
+ for(k = 0; k < 4; k++)
+ {
+ __m256d
+ xv = _mm256_setzero_pd();
+
+ int
+ l;
+
+ for(l = 0; l < 4; l++)
+ {
+ __m256d
+ x1v = _mm256_mul_pd(_mm256_load_pd(&uX1[k * 16 + l * 4]), _mm256_load_pd(&uX2[k * 16 + l * 4]));
+
+ __m256d
+ evv = _mm256_load_pd(&extEV[l * 4]);
+#ifdef _FMA
+ xv = FMAMACC(xv,x1v,evv);
+#else
+ xv = _mm256_add_pd(xv, _mm256_mul_pd(x1v, evv));
+#endif
+ }
+
+ _mm256_store_pd(&x3[16 * i + 4 * k], xv);
+ }
+ }
+ }
+ break;
+ case TIP_INNER:
+ {
+ double
+ *uX1,
+ umpX1[1024] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+ for (i = 1; i < 16; i++)
+ {
+ __m256d
+ tv = _mm256_load_pd(&(tipVector[i*4]));
+
+ int
+ j;
+
+ for (j = 0; j < 4; j++)
+ for (k = 0; k < 4; k++)
+ {
+ __m256d
+ left1 = _mm256_load_pd(&left[j * 16 + k * 4]);
+
+ left1 = _mm256_mul_pd(left1, tv);
+ left1 = hadd3(left1);
+
+ _mm256_store_pd(&umpX1[i * 64 + j * 16 + k * 4], left1);
+ }
+ }
+
+ for(i = 0; i < n; i++)
+ {
+ __m256d
+ xv[4];
+
+ scale = 1;
+ uX1 = &umpX1[64 * tipX1[i]];
+
+ for(k = 0; k < 4; k++)
+ {
+ __m256d
+ xvr = _mm256_load_pd(&(x2[i * 16 + k * 4]));
+
+ int
+ l;
+
+ xv[k] = _mm256_setzero_pd();
+
+ for(l = 0; l < 4; l++)
+ {
+ __m256d
+ x1v = _mm256_load_pd(&uX1[k * 16 + l * 4]),
+ x2v = _mm256_mul_pd(xvr, _mm256_load_pd(&right[k * 16 + l * 4]));
+
+ x2v = hadd3(x2v);
+ x1v = _mm256_mul_pd(x1v, x2v);
+
+ __m256d
+ evv = _mm256_load_pd(&extEV[l * 4]);
+
+#ifdef _FMA
+ xv[k] = FMAMACC(xv[k],x1v,evv);
+#else
+ xv[k] = _mm256_add_pd(xv[k], _mm256_mul_pd(x1v, evv));
+#endif
+ }
+
+ if(scale)
+ {
+ __m256d
+ v1 = _mm256_and_pd(xv[k], absMask_AVX.m);
+
+ v1 = _mm256_cmp_pd(v1, minlikelihood_avx, _CMP_LT_OS);
+
+ if(_mm256_movemask_pd( v1 ) != 15) /* at least one lane is not below minlikelihood: no rescaling */
+ scale = 0;
+ }
+ }
+
+ if(scale)
+ {
+ xv[0] = _mm256_mul_pd(xv[0], twoto);
+ xv[1] = _mm256_mul_pd(xv[1], twoto);
+ xv[2] = _mm256_mul_pd(xv[2], twoto);
+ xv[3] = _mm256_mul_pd(xv[3], twoto);
+ addScale += wgt[i];
+ }
+
+ _mm256_store_pd(&x3[16 * i], xv[0]);
+ _mm256_store_pd(&x3[16 * i + 4], xv[1]);
+ _mm256_store_pd(&x3[16 * i + 8], xv[2]);
+ _mm256_store_pd(&x3[16 * i + 12], xv[3]);
+ }
+ }
+ break;
+ case INNER_INNER:
+ {
+ for(i = 0; i < n; i++)
+ {
+ __m256d
+ xv[4];
+
+ scale = 1;
+
+ for(k = 0; k < 4; k++)
+ {
+ __m256d
+
+ xvl = _mm256_load_pd(&(x1[i * 16 + k * 4])),
+ xvr = _mm256_load_pd(&(x2[i * 16 + k * 4]));
+
+ int
+ l;
+
+ xv[k] = _mm256_setzero_pd();
+
+ for(l = 0; l < 4; l++)
+ {
+ __m256d
+ x1v = _mm256_mul_pd(xvl, _mm256_load_pd(&left[k * 16 + l * 4])),
+ x2v = _mm256_mul_pd(xvr, _mm256_load_pd(&right[k * 16 + l * 4]));
+
+ x1v = hadd4(x1v, x2v);
+
+ __m256d
+ evv = _mm256_load_pd(&extEV[l * 4]);
+
+ xv[k] = _mm256_add_pd(xv[k], _mm256_mul_pd(x1v, evv));
+ }
+
+ if(scale)
+ {
+ __m256d
+ v1 = _mm256_and_pd(xv[k], absMask_AVX.m);
+
+ v1 = _mm256_cmp_pd(v1, minlikelihood_avx, _CMP_LT_OS);
+
+ if(_mm256_movemask_pd( v1 ) != 15)
+ scale = 0;
+ }
+ }
+
+ if(scale)
+ {
+ xv[0] = _mm256_mul_pd(xv[0], twoto);
+ xv[1] = _mm256_mul_pd(xv[1], twoto);
+ xv[2] = _mm256_mul_pd(xv[2], twoto);
+ xv[3] = _mm256_mul_pd(xv[3], twoto);
+ addScale += wgt[i];
+ }
+
+ _mm256_store_pd(&x3[16 * i], xv[0]);
+ _mm256_store_pd(&x3[16 * i + 4], xv[1]);
+ _mm256_store_pd(&x3[16 * i + 8], xv[2]);
+ _mm256_store_pd(&x3[16 * i + 12], xv[3]);
+ }
+ }
+ break;
+ default:
+ assert(0);
+ }
+
+
+ *scalerIncrement = addScale;
+
+}
+
+
+
+
+void newviewGTRCAT_AVX(int tipCase, double *EV, int *cptr,
+ double *x1_start, double *x2_start, double *x3_start, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement)
+{
+ double
+ *le,
+ *ri,
+ *x1,
+ *x2;
+
+ int
+ i,
+ addScale = 0;
+
+ __m256d
+ minlikelihood_avx = _mm256_set1_pd( minlikelihood ),
+ twoto = _mm256_set1_pd(twotothe256);
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ for (i = 0; i < n; i++)
+ {
+ int
+ l;
+
+ le = &left[cptr[i] * 16];
+ ri = &right[cptr[i] * 16];
+
+ x1 = &(tipVector[4 * tipX1[i]]);
+ x2 = &(tipVector[4 * tipX2[i]]);
+
+ __m256d
+ vv = _mm256_setzero_pd();
+
+ for(l = 0; l < 4; l++)
+ {
+ __m256d
+ x1v = _mm256_mul_pd(_mm256_load_pd(x1), _mm256_load_pd(&le[l * 4])),
+ x2v = _mm256_mul_pd(_mm256_load_pd(x2), _mm256_load_pd(&ri[l * 4]));
+
+ x1v = hadd4(x1v, x2v);
+
+ __m256d
+ evv = _mm256_load_pd(&EV[l * 4]);
+#ifdef _FMA
+ vv = FMAMACC(vv,x1v,evv);
+#else
+ vv = _mm256_add_pd(vv, _mm256_mul_pd(x1v, evv));
+#endif
+ }
+
+ _mm256_store_pd(&x3_start[4 * i], vv);
+ }
+ break;
+ case TIP_INNER:
+ for (i = 0; i < n; i++)
+ {
+ int
+ l;
+
+ x1 = &(tipVector[4 * tipX1[i]]);
+ x2 = &x2_start[4 * i];
+
+ le = &left[cptr[i] * 16];
+ ri = &right[cptr[i] * 16];
+
+ __m256d
+ vv = _mm256_setzero_pd();
+
+ for(l = 0; l < 4; l++)
+ {
+ __m256d
+ x1v = _mm256_mul_pd(_mm256_load_pd(x1), _mm256_load_pd(&le[l * 4])),
+ x2v = _mm256_mul_pd(_mm256_load_pd(x2), _mm256_load_pd(&ri[l * 4]));
+
+ x1v = hadd4(x1v, x2v);
+
+ __m256d
+ evv = _mm256_load_pd(&EV[l * 4]);
+
+#ifdef _FMA
+ vv = FMAMACC(vv,x1v,evv);
+#else
+ vv = _mm256_add_pd(vv, _mm256_mul_pd(x1v, evv));
+#endif
+ }
+
+
+ __m256d
+ v1 = _mm256_and_pd(vv, absMask_AVX.m);
+
+ v1 = _mm256_cmp_pd(v1, minlikelihood_avx, _CMP_LT_OS);
+
+ if(_mm256_movemask_pd( v1 ) == 15)
+ {
+ vv = _mm256_mul_pd(vv, twoto);
+ addScale += wgt[i];
+ }
+
+ _mm256_store_pd(&x3_start[4 * i], vv);
+ }
+ break;
+ case INNER_INNER:
+ for (i = 0; i < n; i++)
+ {
+ int
+ l;
+
+ x1 = &x1_start[4 * i];
+ x2 = &x2_start[4 * i];
+
+
+ le = &left[cptr[i] * 16];
+ ri = &right[cptr[i] * 16];
+
+ __m256d
+ vv = _mm256_setzero_pd();
+
+ for(l = 0; l < 4; l++)
+ {
+ __m256d
+ x1v = _mm256_mul_pd(_mm256_load_pd(x1), _mm256_load_pd(&le[l * 4])),
+ x2v = _mm256_mul_pd(_mm256_load_pd(x2), _mm256_load_pd(&ri[l * 4]));
+
+ x1v = hadd4(x1v, x2v);
+
+ __m256d
+ evv = _mm256_load_pd(&EV[l * 4]);
+#ifdef _FMA
+ vv = FMAMACC(vv,x1v,evv);
+#else
+ vv = _mm256_add_pd(vv, _mm256_mul_pd(x1v, evv));
+#endif
+ }
+
+
+ __m256d
+ v1 = _mm256_and_pd(vv, absMask_AVX.m);
+
+ v1 = _mm256_cmp_pd(v1, minlikelihood_avx, _CMP_LT_OS);
+
+ if(_mm256_movemask_pd( v1 ) == 15)
+ {
+ vv = _mm256_mul_pd(vv, twoto);
+ addScale += wgt[i];
+ }
+
+ _mm256_store_pd(&x3_start[4 * i], vv);
+
+ }
+ break;
+ default:
+ assert(0);
+ }
+
+
+ *scalerIncrement = addScale;
+}
+
+void newviewGTRCATPROT_AVX(int tipCase, double *extEV,
+ int *cptr,
+ double *x1, double *x2, double *x3, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement)
+{
+ double
+ *le, *ri, *v, *vl, *vr;
+
+ int i, l, scale, addScale = 0;
+
+#ifdef _FMA
+ int k;
+#endif
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ {
+ for (i = 0; i < n; i++)
+ {
+ le = &left[cptr[i] * 400];
+ ri = &right[cptr[i] * 400];
+
+ vl = &(tipVector[20 * tipX1[i]]);
+ vr = &(tipVector[20 * tipX2[i]]);
+ v = &x3[20 * i];
+
+ __m256d vv[5];
+
+ vv[0] = _mm256_setzero_pd();
+ vv[1] = _mm256_setzero_pd();
+ vv[2] = _mm256_setzero_pd();
+ vv[3] = _mm256_setzero_pd();
+ vv[4] = _mm256_setzero_pd();
+
+ for(l = 0; l < 20; l++)
+ {
+ __m256d
+ x1v = _mm256_setzero_pd(),
+ x2v = _mm256_setzero_pd();
+
+ double
+ *ev = &extEV[l * 20],
+ *lv = &le[l * 20],
+ *rv = &ri[l * 20];
+
+#ifdef _FMA
+ for(k = 0; k < 20; k += 4)
+ {
+ __m256d vlv = _mm256_load_pd(&vl[k]);
+ __m256d lvv = _mm256_load_pd(&lv[k]);
+ x1v = FMAMACC(x1v,vlv,lvv);
+ __m256d vrv = _mm256_load_pd(&vr[k]);
+ __m256d rvv = _mm256_load_pd(&rv[k]);
+ x2v = FMAMACC(x2v,vrv,rvv);
+ }
+#else
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[0]), _mm256_load_pd(&lv[0])));
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[4]), _mm256_load_pd(&lv[4])));
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[8]), _mm256_load_pd(&lv[8])));
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[12]), _mm256_load_pd(&lv[12])));
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[16]), _mm256_load_pd(&lv[16])));
+
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[0]), _mm256_load_pd(&rv[0])));
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[4]), _mm256_load_pd(&rv[4])));
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[8]), _mm256_load_pd(&rv[8])));
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[12]), _mm256_load_pd(&rv[12])));
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[16]), _mm256_load_pd(&rv[16])));
+#endif
+
+ x1v = hadd4(x1v, x2v);
+#ifdef _FMA
+ for(k = 0; k < 5; k++)
+ {
+ __m256d evv = _mm256_load_pd(&ev[k*4]);
+ vv[k] = FMAMACC(vv[k],x1v,evv);
+ }
+#else
+ __m256d
+ evv[5];
+
+ evv[0] = _mm256_load_pd(&ev[0]);
+ evv[1] = _mm256_load_pd(&ev[4]);
+ evv[2] = _mm256_load_pd(&ev[8]);
+ evv[3] = _mm256_load_pd(&ev[12]);
+ evv[4] = _mm256_load_pd(&ev[16]);
+
+ vv[0] = _mm256_add_pd(vv[0], _mm256_mul_pd(x1v, evv[0]));
+ vv[1] = _mm256_add_pd(vv[1], _mm256_mul_pd(x1v, evv[1]));
+ vv[2] = _mm256_add_pd(vv[2], _mm256_mul_pd(x1v, evv[2]));
+ vv[3] = _mm256_add_pd(vv[3], _mm256_mul_pd(x1v, evv[3]));
+ vv[4] = _mm256_add_pd(vv[4], _mm256_mul_pd(x1v, evv[4]));
+#endif
+ }
+ _mm256_store_pd(&v[0], vv[0]);
+ _mm256_store_pd(&v[4], vv[1]);
+ _mm256_store_pd(&v[8], vv[2]);
+ _mm256_store_pd(&v[12], vv[3]);
+ _mm256_store_pd(&v[16], vv[4]);
+ }
+ }
+ break;
+ case TIP_INNER:
+ for (i = 0; i < n; i++)
+ {
+ le = &left[cptr[i] * 400];
+ ri = &right[cptr[i] * 400];
+
+ vl = &(tipVector[20 * tipX1[i]]);
+ vr = &x2[20 * i];
+ v = &x3[20 * i];
+
+ __m256d vv[5];
+
+ vv[0] = _mm256_setzero_pd();
+ vv[1] = _mm256_setzero_pd();
+ vv[2] = _mm256_setzero_pd();
+ vv[3] = _mm256_setzero_pd();
+ vv[4] = _mm256_setzero_pd();
+
+
+
+ for(l = 0; l < 20; l++)
+ {
+ __m256d
+ x1v = _mm256_setzero_pd(),
+ x2v = _mm256_setzero_pd();
+
+ double
+ *ev = &extEV[l * 20],
+ *lv = &le[l * 20],
+ *rv = &ri[l * 20];
+#ifdef _FMA
+ for(k = 0; k < 20; k += 4)
+ {
+ __m256d vlv = _mm256_load_pd(&vl[k]);
+ __m256d lvv = _mm256_load_pd(&lv[k]);
+ x1v = FMAMACC(x1v,vlv,lvv);
+ __m256d vrv = _mm256_load_pd(&vr[k]);
+ __m256d rvv = _mm256_load_pd(&rv[k]);
+ x2v = FMAMACC(x2v,vrv,rvv);
+ }
+#else
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[0]), _mm256_load_pd(&lv[0])));
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[4]), _mm256_load_pd(&lv[4])));
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[8]), _mm256_load_pd(&lv[8])));
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[12]), _mm256_load_pd(&lv[12])));
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[16]), _mm256_load_pd(&lv[16])));
+
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[0]), _mm256_load_pd(&rv[0])));
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[4]), _mm256_load_pd(&rv[4])));
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[8]), _mm256_load_pd(&rv[8])));
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[12]), _mm256_load_pd(&rv[12])));
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[16]), _mm256_load_pd(&rv[16])));
+#endif
+
+ x1v = hadd4(x1v, x2v);
+
+ __m256d
+ evv[5];
+
+ evv[0] = _mm256_load_pd(&ev[0]);
+ evv[1] = _mm256_load_pd(&ev[4]);
+ evv[2] = _mm256_load_pd(&ev[8]);
+ evv[3] = _mm256_load_pd(&ev[12]);
+ evv[4] = _mm256_load_pd(&ev[16]);
+
+#ifdef _FMA
+ for(k = 0; k < 5; k++)
+ vv[k] = FMAMACC(vv[k],x1v,evv[k]);
+#else
+ vv[0] = _mm256_add_pd(vv[0], _mm256_mul_pd(x1v, evv[0]));
+ vv[1] = _mm256_add_pd(vv[1], _mm256_mul_pd(x1v, evv[1]));
+ vv[2] = _mm256_add_pd(vv[2], _mm256_mul_pd(x1v, evv[2]));
+ vv[3] = _mm256_add_pd(vv[3], _mm256_mul_pd(x1v, evv[3]));
+ vv[4] = _mm256_add_pd(vv[4], _mm256_mul_pd(x1v, evv[4]));
+#endif
+ }
+
+
+ __m256d minlikelihood_avx = _mm256_set1_pd( minlikelihood );
+
+ scale = 1;
+
+ for(l = 0; scale && (l < 20); l += 4)
+ {
+ __m256d
+ v1 = _mm256_and_pd(vv[l / 4], absMask_AVX.m);
+ v1 = _mm256_cmp_pd(v1, minlikelihood_avx, _CMP_LT_OS);
+
+ if(_mm256_movemask_pd( v1 ) != 15)
+ scale = 0;
+ }
+
+
+ if(scale)
+ {
+ __m256d
+ twoto = _mm256_set1_pd(twotothe256);
+
+ for(l = 0; l < 20; l += 4)
+ vv[l / 4] = _mm256_mul_pd(vv[l / 4] , twoto);
+
+
+ addScale += wgt[i];
+
+ }
+
+ _mm256_store_pd(&v[0], vv[0]);
+ _mm256_store_pd(&v[4], vv[1]);
+ _mm256_store_pd(&v[8], vv[2]);
+ _mm256_store_pd(&v[12], vv[3]);
+ _mm256_store_pd(&v[16], vv[4]);
+ }
+ break;
+ case INNER_INNER:
+ for(i = 0; i < n; i++)
+ {
+ le = &left[cptr[i] * 400];
+ ri = &right[cptr[i] * 400];
+
+ vl = &x1[20 * i];
+ vr = &x2[20 * i];
+ v = &x3[20 * i];
+
+ __m256d vv[5];
+
+ vv[0] = _mm256_setzero_pd();
+ vv[1] = _mm256_setzero_pd();
+ vv[2] = _mm256_setzero_pd();
+ vv[3] = _mm256_setzero_pd();
+ vv[4] = _mm256_setzero_pd();
+
+ for(l = 0; l < 20; l++)
+ {
+ __m256d
+ x1v = _mm256_setzero_pd(),
+ x2v = _mm256_setzero_pd();
+
+ double
+ *ev = &extEV[l * 20],
+ *lv = &le[l * 20],
+ *rv = &ri[l * 20];
+
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[0]), _mm256_load_pd(&lv[0])));
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[4]), _mm256_load_pd(&lv[4])));
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[8]), _mm256_load_pd(&lv[8])));
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[12]), _mm256_load_pd(&lv[12])));
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[16]), _mm256_load_pd(&lv[16])));
+
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[0]), _mm256_load_pd(&rv[0])));
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[4]), _mm256_load_pd(&rv[4])));
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[8]), _mm256_load_pd(&rv[8])));
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[12]), _mm256_load_pd(&rv[12])));
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[16]), _mm256_load_pd(&rv[16])));
+
+ x1v = hadd4(x1v, x2v);
+#ifdef _FMA
+ for(k = 0; k < 5; k++)
+ {
+ __m256d evv = _mm256_load_pd(&ev[k*4]);
+ vv[k] = FMAMACC(vv[k],x1v,evv);
+ }
+#else
+ __m256d
+ evv[5];
+
+ evv[0] = _mm256_load_pd(&ev[0]);
+ evv[1] = _mm256_load_pd(&ev[4]);
+ evv[2] = _mm256_load_pd(&ev[8]);
+ evv[3] = _mm256_load_pd(&ev[12]);
+ evv[4] = _mm256_load_pd(&ev[16]);
+
+ vv[0] = _mm256_add_pd(vv[0], _mm256_mul_pd(x1v, evv[0]));
+ vv[1] = _mm256_add_pd(vv[1], _mm256_mul_pd(x1v, evv[1]));
+ vv[2] = _mm256_add_pd(vv[2], _mm256_mul_pd(x1v, evv[2]));
+ vv[3] = _mm256_add_pd(vv[3], _mm256_mul_pd(x1v, evv[3]));
+ vv[4] = _mm256_add_pd(vv[4], _mm256_mul_pd(x1v, evv[4]));
+#endif
+ }
+
+
+ __m256d minlikelihood_avx = _mm256_set1_pd( minlikelihood );
+
+ scale = 1;
+
+ for(l = 0; scale && (l < 20); l += 4)
+ {
+ __m256d
+ v1 = _mm256_and_pd(vv[l / 4], absMask_AVX.m);
+ v1 = _mm256_cmp_pd(v1, minlikelihood_avx, _CMP_LT_OS);
+
+ if(_mm256_movemask_pd( v1 ) != 15)
+ scale = 0;
+ }
+
+ if(scale)
+ {
+ __m256d
+ twoto = _mm256_set1_pd(twotothe256);
+
+ for(l = 0; l < 20; l += 4)
+ vv[l / 4] = _mm256_mul_pd(vv[l / 4] , twoto);
+
+
+ addScale += wgt[i];
+ }
+
+ _mm256_store_pd(&v[0], vv[0]);
+ _mm256_store_pd(&v[4], vv[1]);
+ _mm256_store_pd(&v[8], vv[2]);
+ _mm256_store_pd(&v[12], vv[3]);
+ _mm256_store_pd(&v[16], vv[4]);
+
+ }
+ break;
+ default:
+ assert(0);
+ }
+
+
+ *scalerIncrement = addScale;
+}
+
+
+
+void newviewGTRGAMMAPROT_AVX_LG4(int tipCase,
+ double *x1, double *x2, double *x3, double *extEV[4], double *tipVector[4],
+ int *ex3, unsigned char *tipX1, unsigned char *tipX2, int n,
+ double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling)
+{
+ double
+ *uX1,
+ *uX2,
+ *v,
+ x1px2,
+ *vl,
+ *vr;
+
+ int
+ i,
+ j,
+ l,
+ k,
+ scale,
+ addScale = 0;
+
+
+#ifndef GCC_VERSION
+#define GCC_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
+#endif
+
+
+#if GCC_VERSION < 40500
+ __m256d
+ bitmask = _mm256_set_pd(0,0,0,-1);
+#else
+ __m256i
+ bitmask = _mm256_set_epi32(0, 0, 0, 0, 0, 0, -1, -1);
+#endif
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ {
+
+ double
+ umpX1[1840] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+ umpX2[1840] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+
+ for(i = 0; i < 23; i++)
+ {
+ for(k = 0; k < 80; k++)
+ {
+ double
+ *ll = &left[k * 20],
+ *rr = &right[k * 20];
+
+ __m256d
+ umpX1v = _mm256_setzero_pd(),
+ umpX2v = _mm256_setzero_pd();
+
+ v = &(tipVector[k / 20][20 * i]);
+
+ for(l = 0; l < 20; l+=4)
+ {
+ __m256d vv = _mm256_load_pd(&v[l]);
+#ifdef _FMA
+ __m256d llv = _mm256_load_pd(&ll[l]);
+ umpX1v = FMAMACC(umpX1v,vv,llv);
+ __m256d rrv = _mm256_load_pd(&rr[l]);
+ umpX2v = FMAMACC(umpX2v,vv,rrv);
+#else
+ umpX1v = _mm256_add_pd(umpX1v,_mm256_mul_pd(vv,_mm256_load_pd(&ll[l])));
+ umpX2v = _mm256_add_pd(umpX2v,_mm256_mul_pd(vv,_mm256_load_pd(&rr[l])));
+#endif
+ }
+
+ umpX1v = hadd3(umpX1v);
+ umpX2v = hadd3(umpX2v);
+ _mm256_maskstore_pd(&umpX1[80 * i + k], bitmask, umpX1v);
+ _mm256_maskstore_pd(&umpX2[80 * i + k], bitmask, umpX2v);
+ }
+ }
+
+ for(i = 0; i < n; i++)
+ {
+ uX1 = &umpX1[80 * tipX1[i]];
+ uX2 = &umpX2[80 * tipX2[i]];
+
+ for(j = 0; j < 4; j++)
+ {
+ __m256d vv[5];
+
+ v = &x3[i * 80 + j * 20];
+
+ vv[0] = _mm256_setzero_pd();
+ vv[1] = _mm256_setzero_pd();
+ vv[2] = _mm256_setzero_pd();
+ vv[3] = _mm256_setzero_pd();
+ vv[4] = _mm256_setzero_pd();
+
+ for(k = 0; k < 20; k++)
+ {
+ x1px2 = uX1[j * 20 + k] * uX2[j * 20 + k];
+
+ __m256d x1px2v = _mm256_set1_pd(x1px2);
+
+ __m256d extEvv = _mm256_load_pd(&extEV[j][20 * k]);
+#ifdef _FMA
+ vv[0] = FMAMACC(vv[0],x1px2v,extEvv);
+#else
+ vv[0] = _mm256_add_pd(vv[0],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+ _mm256_store_pd(&v[0],vv[0]);
+
+ extEvv = _mm256_load_pd(&extEV[j][20 * k + 4]);
+#ifdef _FMA
+ vv[1] = FMAMACC(vv[1],x1px2v,extEvv);
+#else
+ vv[1] = _mm256_add_pd(vv[1],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+ _mm256_store_pd(&v[4],vv[1]);
+
+ extEvv = _mm256_load_pd(&extEV[j][20 * k + 8]);
+#ifdef _FMA
+ vv[2] = FMAMACC(vv[2],x1px2v,extEvv);
+#else
+ vv[2] = _mm256_add_pd(vv[2],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+ _mm256_store_pd(&v[8],vv[2]);
+
+ extEvv = _mm256_load_pd(&extEV[j][20 * k + 12]);
+#ifdef _FMA
+ vv[3] = FMAMACC(vv[3],x1px2v,extEvv);
+#else
+ vv[3] = _mm256_add_pd(vv[3],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+ _mm256_store_pd(&v[12],vv[3]);
+
+ extEvv = _mm256_load_pd(&extEV[j][20 * k + 16]);
+#ifdef _FMA
+ vv[4] = FMAMACC(vv[4],x1px2v,extEvv);
+#else
+ vv[4] = _mm256_add_pd(vv[4],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+ _mm256_store_pd(&v[16],vv[4]);
+ }
+ }
+ }
+ }
+ break;
+ case TIP_INNER:
+ {
+
+ double
+ umpX1[1840] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+ ump_x2[20] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+ for(i = 0; i < 23; i++)
+ {
+ for(k = 0; k < 80; k++)
+ {
+ __m256d umpX1v = _mm256_setzero_pd();
+
+ v = &(tipVector[k / 20][20 * i]);
+
+ for(l = 0; l < 20; l+=4)
+ {
+ __m256d vv = _mm256_load_pd(&v[l]);
+ __m256d leftv = _mm256_load_pd(&left[k * 20 + l]);
+#ifdef _FMA
+
+ umpX1v = FMAMACC(umpX1v, vv, leftv);
+#else
+ umpX1v = _mm256_add_pd(umpX1v, _mm256_mul_pd(vv, leftv));
+#endif
+ }
+ umpX1v = hadd3(umpX1v);
+ _mm256_maskstore_pd(&umpX1[80 * i + k], bitmask, umpX1v);
+ }
+ }
+
+ for (i = 0; i < n; i++)
+ {
+ uX1 = &umpX1[80 * tipX1[i]];
+
+ for(k = 0; k < 4; k++)
+ {
+ v = &(x2[80 * i + k * 20]);
+
+ for(l = 0; l < 20; l++)
+ {
+ __m256d ump_x2v = _mm256_setzero_pd();
+
+ __m256d vv = _mm256_load_pd(&v[0]);
+ __m256d rightv = _mm256_load_pd(&right[k*400+l*20+0]);
+#ifdef _FMA
+ ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+ ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+ vv = _mm256_load_pd(&v[4]);
+ rightv = _mm256_load_pd(&right[k*400+l*20+4]);
+#ifdef _FMA
+ ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+ ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+ vv = _mm256_load_pd(&v[8]);
+ rightv = _mm256_load_pd(&right[k*400+l*20+8]);
+#ifdef _FMA
+ ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+ ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+ vv = _mm256_load_pd(&v[12]);
+ rightv = _mm256_load_pd(&right[k*400+l*20+12]);
+#ifdef _FMA
+ ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+ ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+ vv = _mm256_load_pd(&v[16]);
+ rightv = _mm256_load_pd(&right[k*400+l*20+16]);
+#ifdef _FMA
+ ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+ ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+ ump_x2v = hadd3(ump_x2v);
+ _mm256_maskstore_pd(&ump_x2[l], bitmask, ump_x2v);
+ }
+
+ v = &(x3[80 * i + 20 * k]);
+
+
+ __m256d vv[5];
+
+ vv[0] = _mm256_setzero_pd();
+ vv[1] = _mm256_setzero_pd();
+ vv[2] = _mm256_setzero_pd();
+ vv[3] = _mm256_setzero_pd();
+ vv[4] = _mm256_setzero_pd();
+
+ for(l = 0; l < 20; l++)
+ {
+ x1px2 = uX1[k * 20 + l] * ump_x2[l];
+ __m256d x1px2v = _mm256_set1_pd(x1px2);
+
+#ifdef _FMA
+ __m256d ev = _mm256_load_pd(&extEV[k][l * 20 + 0]);
+ vv[0] = FMAMACC(vv[0],x1px2v, ev);
+#else
+ vv[0] = _mm256_add_pd(vv[0],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[k][l * 20 + 0])));
+#endif
+ _mm256_store_pd(&v[0],vv[0]);
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[k][l * 20 + 4]);
+ vv[1] = FMAMACC(vv[1],x1px2v, ev);
+#else
+ vv[1] = _mm256_add_pd(vv[1],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[k][l * 20 + 4])));
+#endif
+ _mm256_store_pd(&v[4],vv[1]);
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[k][l * 20 + 8]);
+ vv[2] = FMAMACC(vv[2],x1px2v, ev);
+#else
+ vv[2] = _mm256_add_pd(vv[2],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[k][l * 20 + 8])));
+#endif
+ _mm256_store_pd(&v[8],vv[2]);
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[k][l * 20 + 12]);
+ vv[3] = FMAMACC(vv[3],x1px2v, ev);
+#else
+ vv[3] = _mm256_add_pd(vv[3],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[k][l * 20 + 12])));
+#endif
+ _mm256_store_pd(&v[12],vv[3]);
+
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[k][l * 20 + 16]);
+ vv[4] = FMAMACC(vv[4],x1px2v, ev);
+#else
+ vv[4] = _mm256_add_pd(vv[4],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[k][l * 20 + 16])));
+#endif
+ _mm256_store_pd(&v[16],vv[4]);
+
+ }
+ }
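+ /* multiply all 80 entries of site i by twotothe256 if every entry is below minlikelihood in absolute value */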
+ v = &x3[80 * i];
+ __m256d minlikelihood_avx = _mm256_set1_pd(minlikelihood);
+ scale = 1;
+ for(l = 0; scale && (l < 80); l += 4)
+ {
+ __m256d vv = _mm256_load_pd(&v[l]);
+ __m256d vv_abs = _mm256_and_pd(vv,absMask_AVX.m);
+ vv_abs = _mm256_cmp_pd(vv_abs,minlikelihood_avx,_CMP_LT_OS);
+ if(_mm256_movemask_pd(vv_abs) != 15)
+ scale = 0;
+ }
+
+ if(scale)
+ {
+ __m256d twotothe256v = _mm256_set1_pd(twotothe256);
+ for(l = 0; l < 80; l += 4)
+ {
+ __m256d vv = _mm256_load_pd(&v[l]);
+ _mm256_store_pd(&v[l],_mm256_mul_pd(vv,twotothe256v));
+ }
+ if(useFastScaling)
+ addScale += wgt[i];
+ else
+ ex3[i] += 1;
+ }
+ }
+ }
+ break;
+ case INNER_INNER:
+ for(i = 0; i < n; i++)
+ {
+ scale = 1;
+
+ for(k = 0; k < 4; k++)
+ {
+ vl = &(x1[80 * i + 20 * k]);
+ vr = &(x2[80 * i + 20 * k]);
+ v = &(x3[80 * i + 20 * k]);
+
+ __m256d vv[5];
+
+ vv[0] = _mm256_setzero_pd();
+ vv[1] = _mm256_setzero_pd();
+ vv[2] = _mm256_setzero_pd();
+ vv[3] = _mm256_setzero_pd();
+ vv[4] = _mm256_setzero_pd();
+
+ for(l = 0; l < 20; l++)
+ {
+ __m256d al = _mm256_setzero_pd();
+ __m256d ar = _mm256_setzero_pd();
+
+ __m256d leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 0]);
+ __m256d rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 0]);
+ __m256d vlv = _mm256_load_pd(&vl[0]);
+ __m256d vrv = _mm256_load_pd(&vr[0]);
+
+#ifdef _FMA
+
+ al = FMAMACC(al, vlv, leftv);
+ ar = FMAMACC(ar, vrv, rightv);
+#else
+ al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+ ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+
+ leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 4]);
+ rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 4]);
+ vlv = _mm256_load_pd(&vl[4]);
+ vrv = _mm256_load_pd(&vr[4]);
+#ifdef _FMA
+
+ al = FMAMACC(al, vlv, leftv);
+ ar = FMAMACC(ar, vrv, rightv);
+#else
+ al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+ ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+
+ leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 8]);
+ rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 8]);
+ vlv = _mm256_load_pd(&vl[8]);
+ vrv = _mm256_load_pd(&vr[8]);
+#ifdef _FMA
+
+ al = FMAMACC(al, vlv, leftv);
+ ar = FMAMACC(ar, vrv, rightv);
+#else
+ al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+ ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+
+ leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 12]);
+ rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 12]);
+ vlv = _mm256_load_pd(&vl[12]);
+ vrv = _mm256_load_pd(&vr[12]);
+#ifdef _FMA
+
+ al = FMAMACC(al, vlv, leftv);
+ ar = FMAMACC(ar, vrv, rightv);
+#else
+ al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+ ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+
+ leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 16]);
+ rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 16]);
+ vlv = _mm256_load_pd(&vl[16]);
+ vrv = _mm256_load_pd(&vr[16]);
+
+#ifdef _FMA
+ al = FMAMACC(al, vlv, leftv);
+ ar = FMAMACC(ar, vrv, rightv);
+#else
+ al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+ ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+
+ /**************************************************************************************************************/
+
+ al = hadd3(al);
+ ar = hadd3(ar);
+ al = _mm256_mul_pd(ar,al);
+
+ /************************************************************************************************************/
+#ifdef _FMA
+ __m256d ev = _mm256_load_pd(&extEV[k][20 * l + 0]);
+ vv[0] = FMAMACC(vv[0], al, ev);
+#else
+ vv[0] = _mm256_add_pd(vv[0],_mm256_mul_pd(al, _mm256_load_pd(&extEV[k][20 * l + 0])));
+#endif
+ _mm256_store_pd(&v[0],vv[0]);
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[k][20 * l + 4]);
+ vv[1] = FMAMACC(vv[1], al, ev);
+#else
+ vv[1] = _mm256_add_pd(vv[1],_mm256_mul_pd(al, _mm256_load_pd(&extEV[k][20 * l + 4])));
+#endif
+ _mm256_store_pd(&v[4],vv[1]);
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[k][20 * l + 8]);
+ vv[2] = FMAMACC(vv[2], al, ev);
+#else
+ vv[2] = _mm256_add_pd(vv[2],_mm256_mul_pd(al, _mm256_load_pd(&extEV[k][20 * l + 8])));
+#endif
+ _mm256_store_pd(&v[8],vv[2]);
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[k][20 * l + 12]);
+ vv[3] = FMAMACC(vv[3], al, ev);
+#else
+ vv[3] = _mm256_add_pd(vv[3],_mm256_mul_pd(al, _mm256_load_pd(&extEV[k][20 * l + 12])));
+#endif
+ _mm256_store_pd(&v[12],vv[3]);
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[k][20 * l + 16]);
+ vv[4] = FMAMACC(vv[4], al, ev);
+#else
+ vv[4] = _mm256_add_pd(vv[4],_mm256_mul_pd(al, _mm256_load_pd(&extEV[k][20 * l + 16])));
+#endif
+ _mm256_store_pd(&v[16],vv[4]);
+ }
+ }
+ v = &(x3[80 * i]);
+ scale = 1;
+ __m256d minlikelihood_avx = _mm256_set1_pd(minlikelihood);
+
+ for(l = 0; scale && (l < 80); l += 4)
+ {
+ __m256d vv = _mm256_load_pd(&v[l]);
+ __m256d vv_abs = _mm256_and_pd(vv,absMask_AVX.m);
+ vv_abs = _mm256_cmp_pd(vv_abs,minlikelihood_avx,_CMP_LT_OS);
+ if(_mm256_movemask_pd(vv_abs) != 15)
+ scale = 0;
+ }
+
+ if(scale)
+ {
+ __m256d twotothe256v = _mm256_set1_pd(twotothe256);
+ for(l = 0; l < 80; l += 4)
+ {
+ __m256d vv = _mm256_load_pd(&v[l]);
+ _mm256_store_pd(&v[l],_mm256_mul_pd(vv,twotothe256v));
+ }
+ if(useFastScaling)
+ addScale += wgt[i];
+ else
+ ex3[i] += 1;
+ }
+ }
+ break;
+ default:
+ assert(0);
+ }
+
+ if(useFastScaling)
+ *scalerIncrement = addScale;
+}
+
+
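+/* Computes x3 for protein data (20 states x 4 rate categories, 80 doubles per site) from the two children selected by tipCase, using a single extEV and tipVector */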
+void newviewGTRGAMMAPROT_AVX(int tipCase,
+ double *x1, double *x2, double *x3, double *extEV, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n,
+ double *left, double *right, int *wgt, int *scalerIncrement)
+{
+ double
+ *uX1,
+ *uX2,
+ *v,
+ x1px2,
+ *vl,
+ *vr;
+
+ int
+ i,
+ j,
+ l,
+ k,
+ scale,
+ addScale = 0;
+
+
+#ifndef GCC_VERSION
+#define GCC_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
+#endif
+
+
+#if GCC_VERSION < 40500
+ __m256d
+ bitmask = _mm256_set_pd(0,0,0,-1);
+#else
+ __m256i
+ bitmask = _mm256_set_epi32(0, 0, 0, 0, 0, 0, -1, -1);
+#endif
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ {
+
+ double
+ umpX1[1840] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+ umpX2[1840] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+ for(i = 0; i < 23; i++)
+ {
+ v = &(tipVector[20 * i]);
+
+ for(k = 0; k < 80; k++)
+ {
+ double
+ *ll = &left[k * 20],
+ *rr = &right[k * 20];
+
+ __m256d
+ umpX1v = _mm256_setzero_pd(),
+ umpX2v = _mm256_setzero_pd();
+
+ for(l = 0; l < 20; l+=4)
+ {
+ __m256d vv = _mm256_load_pd(&v[l]);
+#ifdef _FMA
+ __m256d llv = _mm256_load_pd(&ll[l]);
+ umpX1v = FMAMACC(umpX1v,vv,llv);
+ __m256d rrv = _mm256_load_pd(&rr[l]);
+ umpX2v = FMAMACC(umpX2v,vv,rrv);
+#else
+ umpX1v = _mm256_add_pd(umpX1v,_mm256_mul_pd(vv,_mm256_load_pd(&ll[l])));
+ umpX2v = _mm256_add_pd(umpX2v,_mm256_mul_pd(vv,_mm256_load_pd(&rr[l])));
+#endif
+ }
+
+ umpX1v = hadd3(umpX1v);
+ umpX2v = hadd3(umpX2v);
+ _mm256_maskstore_pd(&umpX1[80 * i + k], bitmask, umpX1v);
+ _mm256_maskstore_pd(&umpX2[80 * i + k], bitmask, umpX2v);
+ }
+ }
+
+ for(i = 0; i < n; i++)
+ {
+ uX1 = &umpX1[80 * tipX1[i]];
+ uX2 = &umpX2[80 * tipX2[i]];
+
+ for(j = 0; j < 4; j++)
+ {
+ __m256d vv[5];
+
+ v = &x3[i * 80 + j * 20];
+
+ vv[0] = _mm256_setzero_pd();
+ vv[1] = _mm256_setzero_pd();
+ vv[2] = _mm256_setzero_pd();
+ vv[3] = _mm256_setzero_pd();
+ vv[4] = _mm256_setzero_pd();
+
+ for(k = 0; k < 20; k++)
+ {
+ x1px2 = uX1[j * 20 + k] * uX2[j * 20 + k];
+
+ __m256d x1px2v = _mm256_set1_pd(x1px2);
+
+ __m256d extEvv = _mm256_load_pd(&extEV[20 * k]);
+#ifdef _FMA
+ vv[0] = FMAMACC(vv[0],x1px2v,extEvv);
+#else
+ vv[0] = _mm256_add_pd(vv[0],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+ _mm256_store_pd(&v[0],vv[0]);
+
+ extEvv = _mm256_load_pd(&extEV[20 * k + 4]);
+#ifdef _FMA
+ vv[1] = FMAMACC(vv[1],x1px2v,extEvv);
+#else
+ vv[1] = _mm256_add_pd(vv[1],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+ _mm256_store_pd(&v[4],vv[1]);
+
+ extEvv = _mm256_load_pd(&extEV[20 * k + 8]);
+#ifdef _FMA
+ vv[2] = FMAMACC(vv[2],x1px2v,extEvv);
+#else
+ vv[2] = _mm256_add_pd(vv[2],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+ _mm256_store_pd(&v[8],vv[2]);
+
+ extEvv = _mm256_load_pd(&extEV[20 * k + 12]);
+#ifdef _FMA
+ vv[3] = FMAMACC(vv[3],x1px2v,extEvv);
+#else
+ vv[3] = _mm256_add_pd(vv[3],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+ _mm256_store_pd(&v[12],vv[3]);
+
+ extEvv = _mm256_load_pd(&extEV[20 * k + 16]);
+#ifdef _FMA
+ vv[4] = FMAMACC(vv[4],x1px2v,extEvv);
+#else
+ vv[4] = _mm256_add_pd(vv[4],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+ _mm256_store_pd(&v[16],vv[4]);
+ }
+ }
+ }
+ }
+ break;
+ case TIP_INNER:
+ {
+
+ double
+ umpX1[1840] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+ ump_x2[20] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+ for(i = 0; i < 23; i++)
+ {
+ v = &(tipVector[20 * i]);
+
+ for(k = 0; k < 80; k++)
+ {
+ __m256d umpX1v = _mm256_setzero_pd();
+ for(l = 0; l < 20; l+=4)
+ {
+ __m256d vv = _mm256_load_pd(&v[l]);
+ __m256d leftv = _mm256_load_pd(&left[k * 20 + l]);
+#ifdef _FMA
+
+ umpX1v = FMAMACC(umpX1v, vv, leftv);
+#else
+ umpX1v = _mm256_add_pd(umpX1v, _mm256_mul_pd(vv, leftv));
+#endif
+ }
+ umpX1v = hadd3(umpX1v);
+ _mm256_maskstore_pd(&umpX1[80 * i + k], bitmask, umpX1v);
+ }
+ }
+
+ for (i = 0; i < n; i++)
+ {
+ uX1 = &umpX1[80 * tipX1[i]];
+
+ for(k = 0; k < 4; k++)
+ {
+ v = &(x2[80 * i + k * 20]);
+
+ for(l = 0; l < 20; l++)
+ {
+ __m256d ump_x2v = _mm256_setzero_pd();
+
+ __m256d vv = _mm256_load_pd(&v[0]);
+ __m256d rightv = _mm256_load_pd(&right[k*400+l*20+0]);
+#ifdef _FMA
+ ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+ ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+ vv = _mm256_load_pd(&v[4]);
+ rightv = _mm256_load_pd(&right[k*400+l*20+4]);
+#ifdef _FMA
+ ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+ ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+ vv = _mm256_load_pd(&v[8]);
+ rightv = _mm256_load_pd(&right[k*400+l*20+8]);
+#ifdef _FMA
+ ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+ ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+ vv = _mm256_load_pd(&v[12]);
+ rightv = _mm256_load_pd(&right[k*400+l*20+12]);
+#ifdef _FMA
+ ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+ ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+ vv = _mm256_load_pd(&v[16]);
+ rightv = _mm256_load_pd(&right[k*400+l*20+16]);
+#ifdef _FMA
+ ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+ ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+ ump_x2v = hadd3(ump_x2v);
+ _mm256_maskstore_pd(&ump_x2[l], bitmask, ump_x2v);
+ }
+
+ v = &(x3[80 * i + 20 * k]);
+
+
+ __m256d vv[5];
+
+ vv[0] = _mm256_setzero_pd();
+ vv[1] = _mm256_setzero_pd();
+ vv[2] = _mm256_setzero_pd();
+ vv[3] = _mm256_setzero_pd();
+ vv[4] = _mm256_setzero_pd();
+
+ for(l = 0; l < 20; l++)
+ {
+ x1px2 = uX1[k * 20 + l] * ump_x2[l];
+ __m256d x1px2v = _mm256_set1_pd(x1px2);
+
+#ifdef _FMA
+ __m256d ev = _mm256_load_pd(&extEV[l * 20 + 0]);
+ vv[0] = FMAMACC(vv[0],x1px2v, ev);
+#else
+ vv[0] = _mm256_add_pd(vv[0],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 0])));
+#endif
+ _mm256_store_pd(&v[0],vv[0]);
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[l * 20 + 4]);
+ vv[1] = FMAMACC(vv[1],x1px2v, ev);
+#else
+ vv[1] = _mm256_add_pd(vv[1],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 4])));
+#endif
+ _mm256_store_pd(&v[4],vv[1]);
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[l * 20 + 8]);
+ vv[2] = FMAMACC(vv[2],x1px2v, ev);
+#else
+ vv[2] = _mm256_add_pd(vv[2],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 8])));
+#endif
+ _mm256_store_pd(&v[8],vv[2]);
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[l * 20 + 12]);
+ vv[3] = FMAMACC(vv[3],x1px2v, ev);
+#else
+ vv[3] = _mm256_add_pd(vv[3],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 12])));
+#endif
+ _mm256_store_pd(&v[12],vv[3]);
+
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[l * 20 + 16]);
+ vv[4] = FMAMACC(vv[4],x1px2v, ev);
+#else
+ vv[4] = _mm256_add_pd(vv[4],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 16])));
+#endif
+ _mm256_store_pd(&v[16],vv[4]);
+
+ }
+ }
+
+ v = &x3[80 * i];
+ __m256d minlikelihood_avx = _mm256_set1_pd(minlikelihood);
+ scale = 1;
+ for(l = 0; scale && (l < 80); l += 4)
+ {
+ __m256d vv = _mm256_load_pd(&v[l]);
+ __m256d vv_abs = _mm256_and_pd(vv,absMask_AVX.m);
+ vv_abs = _mm256_cmp_pd(vv_abs,minlikelihood_avx,_CMP_LT_OS);
+ if(_mm256_movemask_pd(vv_abs) != 15)
+ scale = 0;
+ }
+
+ if(scale)
+ {
+ __m256d twotothe256v = _mm256_set1_pd(twotothe256);
+ for(l = 0; l < 80; l += 4)
+ {
+ __m256d vv = _mm256_load_pd(&v[l]);
+ _mm256_store_pd(&v[l],_mm256_mul_pd(vv,twotothe256v));
+ }
+
+ addScale += wgt[i];
+
+ }
+ }
+ }
+ break;
+ case INNER_INNER:
+ for(i = 0; i < n; i++)
+ {
+ scale = 1;
+
+ for(k = 0; k < 4; k++)
+ {
+ vl = &(x1[80 * i + 20 * k]);
+ vr = &(x2[80 * i + 20 * k]);
+ v = &(x3[80 * i + 20 * k]);
+
+ __m256d vv[5];
+
+ vv[0] = _mm256_setzero_pd();
+ vv[1] = _mm256_setzero_pd();
+ vv[2] = _mm256_setzero_pd();
+ vv[3] = _mm256_setzero_pd();
+ vv[4] = _mm256_setzero_pd();
+
+ for(l = 0; l < 20; l++)
+ {
+ __m256d al = _mm256_setzero_pd();
+ __m256d ar = _mm256_setzero_pd();
+
+ __m256d leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 0]);
+ __m256d rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 0]);
+ __m256d vlv = _mm256_load_pd(&vl[0]);
+ __m256d vrv = _mm256_load_pd(&vr[0]);
+
+#ifdef _FMA
+
+ al = FMAMACC(al, vlv, leftv);
+ ar = FMAMACC(ar, vrv, rightv);
+#else
+ al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+ ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+
+ leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 4]);
+ rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 4]);
+ vlv = _mm256_load_pd(&vl[4]);
+ vrv = _mm256_load_pd(&vr[4]);
+#ifdef _FMA
+
+ al = FMAMACC(al, vlv, leftv);
+ ar = FMAMACC(ar, vrv, rightv);
+#else
+ al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+ ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+
+ leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 8]);
+ rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 8]);
+ vlv = _mm256_load_pd(&vl[8]);
+ vrv = _mm256_load_pd(&vr[8]);
+#ifdef _FMA
+
+ al = FMAMACC(al, vlv, leftv);
+ ar = FMAMACC(ar, vrv, rightv);
+#else
+ al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+ ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+
+ leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 12]);
+ rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 12]);
+ vlv = _mm256_load_pd(&vl[12]);
+ vrv = _mm256_load_pd(&vr[12]);
+#ifdef _FMA
+
+ al = FMAMACC(al, vlv, leftv);
+ ar = FMAMACC(ar, vrv, rightv);
+#else
+ al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+ ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+
+ leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 16]);
+ rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 16]);
+ vlv = _mm256_load_pd(&vl[16]);
+ vrv = _mm256_load_pd(&vr[16]);
+
+#ifdef _FMA
+ al = FMAMACC(al, vlv, leftv);
+ ar = FMAMACC(ar, vrv, rightv);
+#else
+ al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+ ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+
+ /**************************************************************************************************************/
+
+ al = hadd3(al);
+ ar = hadd3(ar);
+ al = _mm256_mul_pd(ar,al);
+
+ /************************************************************************************************************/
+#ifdef _FMA
+ __m256d ev = _mm256_load_pd(&extEV[20 * l + 0]);
+ vv[0] = FMAMACC(vv[0], al, ev);
+#else
+ vv[0] = _mm256_add_pd(vv[0],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 0])));
+#endif
+ _mm256_store_pd(&v[0],vv[0]);
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[20 * l + 4]);
+ vv[1] = FMAMACC(vv[1], al, ev);
+#else
+ vv[1] = _mm256_add_pd(vv[1],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 4])));
+#endif
+ _mm256_store_pd(&v[4],vv[1]);
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[20 * l + 8]);
+ vv[2] = FMAMACC(vv[2], al, ev);
+#else
+ vv[2] = _mm256_add_pd(vv[2],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 8])));
+#endif
+ _mm256_store_pd(&v[8],vv[2]);
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[20 * l + 12]);
+ vv[3] = FMAMACC(vv[3], al, ev);
+#else
+ vv[3] = _mm256_add_pd(vv[3],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 12])));
+#endif
+ _mm256_store_pd(&v[12],vv[3]);
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[20 * l + 16]);
+ vv[4] = FMAMACC(vv[4], al, ev);
+#else
+ vv[4] = _mm256_add_pd(vv[4],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 16])));
+#endif
+ _mm256_store_pd(&v[16],vv[4]);
+ }
+ }
+ v = &(x3[80 * i]);
+ scale = 1;
+ __m256d minlikelihood_avx = _mm256_set1_pd(minlikelihood);
+
+ for(l = 0; scale && (l < 80); l += 4)
+ {
+ __m256d vv = _mm256_load_pd(&v[l]);
+ __m256d vv_abs = _mm256_and_pd(vv,absMask_AVX.m);
+ vv_abs = _mm256_cmp_pd(vv_abs,minlikelihood_avx,_CMP_LT_OS);
+ if(_mm256_movemask_pd(vv_abs) != 15)
+ scale = 0;
+ }
+
+ if(scale)
+ {
+ __m256d twotothe256v = _mm256_set1_pd(twotothe256);
+ for(l = 0; l < 80; l += 4)
+ {
+ __m256d vv = _mm256_load_pd(&v[l]);
+ _mm256_store_pd(&v[l],_mm256_mul_pd(vv,twotothe256v));
+ }
+
+ addScale += wgt[i];
+ }
+ }
+ break;
+ default:
+ assert(0);
+ }
+
+
+ *scalerIncrement = addScale;
+}
+
+
+/***** functions with memory saving ******************************/
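+/* GAMMA kernel with 4 states x 4 rate categories (16 doubles per site): sites flagged in x3_gap are not stored individually and share x3_gapColumn */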
+void newviewGTRGAMMA_AVX_GAPPED_SAVE(int tipCase,
+ double *x1_start, double *x2_start, double *x3_start,
+ double *extEV, double *tipVector,
+ int *ex3, unsigned char *tipX1, unsigned char *tipX2,
+ const int n, double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling,
+ unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap,
+ double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn
+ )
+{
+
+ int
+ i,
+ k,
+ scale,
+ scaleGap,
+ addScale = 0;
+
+ __m256d
+ minlikelihood_avx = _mm256_set1_pd( minlikelihood ),
+ twoto = _mm256_set1_pd(twotothe256);
+
+ double
+ *x1,
+ *x2,
+ *x3,
+ *x1_ptr = x1_start,
+ *x2_ptr = x2_start;
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ {
+ double
+ *uX1,
+ umpX1[1024] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+ *uX2,
+ umpX2[1024] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+ for (i = 1; i < 16; i++)
+ {
+ __m256d
+ tv = _mm256_load_pd(&(tipVector[i * 4]));
+
+ int
+ j;
+
+ for (j = 0; j < 4; j++)
+ for (k = 0; k < 4; k++)
+ {
+ __m256d
+ left1 = _mm256_load_pd(&left[j * 16 + k * 4]);
+
+ left1 = _mm256_mul_pd(left1, tv);
+ left1 = hadd3(left1);
+
+ _mm256_store_pd(&umpX1[i * 64 + j * 16 + k * 4], left1);
+ }
+
+ for (j = 0; j < 4; j++)
+ for (k = 0; k < 4; k++)
+ {
+ __m256d
+ left1 = _mm256_load_pd(&right[j * 16 + k * 4]);
+
+ left1 = _mm256_mul_pd(left1, tv);
+ left1 = hadd3(left1);
+
+ _mm256_store_pd(&umpX2[i * 64 + j * 16 + k * 4], left1);
+ }
+ }
+
+ x3 = x3_gapColumn;
+
+ {
+ uX1 = &umpX1[960];
+ uX2 = &umpX2[960];
+
+ for(k = 0; k < 4; k++)
+ {
+ __m256d
+ xv = _mm256_setzero_pd();
+
+ int
+ l;
+
+ for(l = 0; l < 4; l++)
+ {
+ __m256d
+ x1v = _mm256_mul_pd(_mm256_load_pd(&uX1[k * 16 + l * 4]), _mm256_load_pd(&uX2[k * 16 + l * 4]));
+
+ __m256d
+ evv = _mm256_load_pd(&extEV[l * 4]);
+#ifdef _FMA
+ xv = FMAMACC(xv,x1v,evv);
+#else
+ xv = _mm256_add_pd(xv, _mm256_mul_pd(x1v, evv));
+#endif
+ }
+
+ _mm256_store_pd(&x3[4 * k], xv);
+ }
+ }
+
+ x3 = x3_start;
+
+ for(i = 0; i < n; i++)
+ {
+ if(!(x3_gap[i / 32] & mask32[i % 32]))
+ {
+ uX1 = &umpX1[64 * tipX1[i]];
+ uX2 = &umpX2[64 * tipX2[i]];
+
+ for(k = 0; k < 4; k++)
+ {
+ __m256d
+ xv = _mm256_setzero_pd();
+
+ int
+ l;
+
+ for(l = 0; l < 4; l++)
+ {
+ __m256d
+ x1v = _mm256_mul_pd(_mm256_load_pd(&uX1[k * 16 + l * 4]), _mm256_load_pd(&uX2[k * 16 + l * 4]));
+
+ __m256d
+ evv = _mm256_load_pd(&extEV[l * 4]);
+#ifdef _FMA
+ xv = FMAMACC(xv,x1v,evv);
+#else
+ xv = _mm256_add_pd(xv, _mm256_mul_pd(x1v, evv));
+#endif
+ }
+
+ _mm256_store_pd(&x3[4 * k], xv);
+ }
+
+ x3 += 16;
+ }
+ }
+ }
+ break;
+ case TIP_INNER:
+ {
+ double
+ *uX1,
+ umpX1[1024] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+ for (i = 1; i < 16; i++)
+ {
+ __m256d
+ tv = _mm256_load_pd(&(tipVector[i*4]));
+
+ int
+ j;
+
+ for (j = 0; j < 4; j++)
+ for (k = 0; k < 4; k++)
+ {
+ __m256d
+ left1 = _mm256_load_pd(&left[j * 16 + k * 4]);
+
+ left1 = _mm256_mul_pd(left1, tv);
+ left1 = hadd3(left1);
+
+ _mm256_store_pd(&umpX1[i * 64 + j * 16 + k * 4], left1);
+ }
+ }
+
+ {
+ __m256d
+ xv[4];
+
+ scaleGap = 1;
+ uX1 = &umpX1[960];
+
+ x2 = x2_gapColumn;
+ x3 = x3_gapColumn;
+
+ for(k = 0; k < 4; k++)
+ {
+ __m256d
+ xvr = _mm256_load_pd(&(x2[k * 4]));
+
+ int
+ l;
+
+ xv[k] = _mm256_setzero_pd();
+
+ for(l = 0; l < 4; l++)
+ {
+ __m256d
+ x1v = _mm256_load_pd(&uX1[k * 16 + l * 4]),
+ x2v = _mm256_mul_pd(xvr, _mm256_load_pd(&right[k * 16 + l * 4]));
+
+ x2v = hadd3(x2v);
+ x1v = _mm256_mul_pd(x1v, x2v);
+
+ __m256d
+ evv = _mm256_load_pd(&extEV[l * 4]);
+
+#ifdef _FMA
+ xv[k] = FMAMACC(xv[k],x1v,evv);
+#else
+ xv[k] = _mm256_add_pd(xv[k], _mm256_mul_pd(x1v, evv));
+#endif
+ }
+
+ if(scaleGap)
+ {
+ __m256d
+ v1 = _mm256_and_pd(xv[k], absMask_AVX.m);
+
+ v1 = _mm256_cmp_pd(v1, minlikelihood_avx, _CMP_LT_OS);
+
+ if(_mm256_movemask_pd( v1 ) != 15)
+ scaleGap = 0;
+ }
+ }
+
+ if(scaleGap)
+ {
+ xv[0] = _mm256_mul_pd(xv[0], twoto);
+ xv[1] = _mm256_mul_pd(xv[1], twoto);
+ xv[2] = _mm256_mul_pd(xv[2], twoto);
+ xv[3] = _mm256_mul_pd(xv[3], twoto);
+ }
+
+ _mm256_store_pd(&x3[0], xv[0]);
+ _mm256_store_pd(&x3[4], xv[1]);
+ _mm256_store_pd(&x3[8], xv[2]);
+ _mm256_store_pd(&x3[12], xv[3]);
+ }
+
+ x3 = x3_start;
+
+ for(i = 0; i < n; i++)
+ {
+ if((x3_gap[i / 32] & mask32[i % 32]))
+ {
+ if(scaleGap)
+ {
+ if(useFastScaling)
+ addScale += wgt[i];
+ else
+ ex3[i] += 1;
+ }
+ }
+ else
+ {
+ if(x2_gap[i / 32] & mask32[i % 32])
+ x2 = x2_gapColumn;
+ else
+ {
+ x2 = x2_ptr;
+ x2_ptr += 16;
+ }
+
+ __m256d
+ xv[4];
+
+ scale = 1;
+ uX1 = &umpX1[64 * tipX1[i]];
+
+ for(k = 0; k < 4; k++)
+ {
+ __m256d
+ xvr = _mm256_load_pd(&(x2[k * 4]));
+
+ int
+ l;
+
+ xv[k] = _mm256_setzero_pd();
+
+ for(l = 0; l < 4; l++)
+ {
+ __m256d
+ x1v = _mm256_load_pd(&uX1[k * 16 + l * 4]),
+ x2v = _mm256_mul_pd(xvr, _mm256_load_pd(&right[k * 16 + l * 4]));
+
+ x2v = hadd3(x2v);
+ x1v = _mm256_mul_pd(x1v, x2v);
+
+ __m256d
+ evv = _mm256_load_pd(&extEV[l * 4]);
+
+#ifdef _FMA
+ xv[k] = FMAMACC(xv[k],x1v,evv);
+#else
+ xv[k] = _mm256_add_pd(xv[k], _mm256_mul_pd(x1v, evv));
+#endif
+ }
+
+ if(scale)
+ {
+ __m256d
+ v1 = _mm256_and_pd(xv[k], absMask_AVX.m);
+
+ v1 = _mm256_cmp_pd(v1, minlikelihood_avx, _CMP_LT_OS);
+
+ if(_mm256_movemask_pd( v1 ) != 15)
+ scale = 0;
+ }
+ }
+
+ if(scale)
+ {
+ xv[0] = _mm256_mul_pd(xv[0], twoto);
+ xv[1] = _mm256_mul_pd(xv[1], twoto);
+ xv[2] = _mm256_mul_pd(xv[2], twoto);
+ xv[3] = _mm256_mul_pd(xv[3], twoto);
+
+ if(useFastScaling)
+ addScale += wgt[i];
+ else
+ ex3[i] += 1;
+ }
+
+ _mm256_store_pd(&x3[0], xv[0]);
+ _mm256_store_pd(&x3[4], xv[1]);
+ _mm256_store_pd(&x3[8], xv[2]);
+ _mm256_store_pd(&x3[12], xv[3]);
+
+ x3 += 16;
+ }
+ }
+ }
+ break;
+ case INNER_INNER:
+ {
+ {
+ x1 = x1_gapColumn;
+ x2 = x2_gapColumn;
+ x3 = x3_gapColumn;
+
+ __m256d
+ xv[4];
+
+ scaleGap = 1;
+
+ for(k = 0; k < 4; k++)
+ {
+ __m256d
+
+ xvl = _mm256_load_pd(&(x1[k * 4])),
+ xvr = _mm256_load_pd(&(x2[k * 4]));
+
+ int
+ l;
+
+ xv[k] = _mm256_setzero_pd();
+
+ for(l = 0; l < 4; l++)
+ {
+ __m256d
+ x1v = _mm256_mul_pd(xvl, _mm256_load_pd(&left[k * 16 + l * 4])),
+ x2v = _mm256_mul_pd(xvr, _mm256_load_pd(&right[k * 16 + l * 4]));
+
+ x1v = hadd4(x1v, x2v);
+
+ __m256d
+ evv = _mm256_load_pd(&extEV[l * 4]);
+
+ xv[k] = _mm256_add_pd(xv[k], _mm256_mul_pd(x1v, evv));
+ }
+
+ if(scaleGap)
+ {
+ __m256d
+ v1 = _mm256_and_pd(xv[k], absMask_AVX.m);
+
+ v1 = _mm256_cmp_pd(v1, minlikelihood_avx, _CMP_LT_OS);
+
+ if(_mm256_movemask_pd( v1 ) != 15)
+ scaleGap = 0;
+ }
+ }
+
+ if(scaleGap)
+ {
+ xv[0] = _mm256_mul_pd(xv[0], twoto);
+ xv[1] = _mm256_mul_pd(xv[1], twoto);
+ xv[2] = _mm256_mul_pd(xv[2], twoto);
+ xv[3] = _mm256_mul_pd(xv[3], twoto);
+ }
+
+ _mm256_store_pd(&x3[0], xv[0]);
+ _mm256_store_pd(&x3[4], xv[1]);
+ _mm256_store_pd(&x3[8], xv[2]);
+ _mm256_store_pd(&x3[12], xv[3]);
+ }
+
+ x3 = x3_start;
+
+ for(i = 0; i < n; i++)
+ {
+ if(x3_gap[i / 32] & mask32[i % 32])
+ {
+ if(scaleGap)
+ {
+ if(useFastScaling)
+ addScale += wgt[i];
+ else
+ ex3[i] += 1;
+ }
+ }
+ else
+ {
+ if(x1_gap[i / 32] & mask32[i % 32])
+ x1 = x1_gapColumn;
+ else
+ {
+ x1 = x1_ptr;
+ x1_ptr += 16;
+ }
+
+ if(x2_gap[i / 32] & mask32[i % 32])
+ x2 = x2_gapColumn;
+ else
+ {
+ x2 = x2_ptr;
+ x2_ptr += 16;
+ }
+
+ __m256d
+ xv[4];
+
+ scale = 1;
+
+ for(k = 0; k < 4; k++)
+ {
+ __m256d
+
+ xvl = _mm256_load_pd(&(x1[k * 4])),
+ xvr = _mm256_load_pd(&(x2[k * 4]));
+
+ int
+ l;
+
+ xv[k] = _mm256_setzero_pd();
+
+ for(l = 0; l < 4; l++)
+ {
+ __m256d
+ x1v = _mm256_mul_pd(xvl, _mm256_load_pd(&left[k * 16 + l * 4])),
+ x2v = _mm256_mul_pd(xvr, _mm256_load_pd(&right[k * 16 + l * 4]));
+
+ x1v = hadd4(x1v, x2v);
+
+ __m256d
+ evv = _mm256_load_pd(&extEV[l * 4]);
+
+ xv[k] = _mm256_add_pd(xv[k], _mm256_mul_pd(x1v, evv));
+ }
+
+ if(scale)
+ {
+ __m256d
+ v1 = _mm256_and_pd(xv[k], absMask_AVX.m);
+
+ v1 = _mm256_cmp_pd(v1, minlikelihood_avx, _CMP_LT_OS);
+
+ if(_mm256_movemask_pd( v1 ) != 15)
+ scale = 0;
+ }
+ }
+
+ if(scale)
+ {
+ xv[0] = _mm256_mul_pd(xv[0], twoto);
+ xv[1] = _mm256_mul_pd(xv[1], twoto);
+ xv[2] = _mm256_mul_pd(xv[2], twoto);
+ xv[3] = _mm256_mul_pd(xv[3], twoto);
+
+ if(useFastScaling)
+ addScale += wgt[i];
+ else
+ ex3[i] += 1;
+ }
+
+ _mm256_store_pd(&x3[0], xv[0]);
+ _mm256_store_pd(&x3[4], xv[1]);
+ _mm256_store_pd(&x3[8], xv[2]);
+ _mm256_store_pd(&x3[12], xv[3]);
+
+ x3 += 16;
+ }
+ }
+ }
+ break;
+ default:
+ assert(0);
+ }
+
+ if(useFastScaling)
+ *scalerIncrement = addScale;
+
+}
+
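+/* CAT counterpart of the gapped kernel: 4 doubles per site, per-site rate category cptr[i], gap-column matrices read from index maxCats of left and right */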
+void newviewGTRCAT_AVX_GAPPED_SAVE(int tipCase, double *EV, int *cptr,
+ double *x1_start, double *x2_start, double *x3_start, double *tipVector,
+ int *ex3, unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling,
+ unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap,
+ double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn, const int maxCats)
+{
+ double
+ *le,
+ *ri,
+ *x1,
+ *x2,
+ *x3,
+ *x1_ptr = x1_start,
+ *x2_ptr = x2_start,
+ *x3_ptr = x3_start;
+
+ int
+ i,
+ scaleGap = 0,
+ addScale = 0;
+
+ __m256d
+ minlikelihood_avx = _mm256_set1_pd( minlikelihood ),
+ twoto = _mm256_set1_pd(twotothe256);
+
+
+ {
+ int
+ l;
+
+ x1 = x1_gapColumn;
+ x2 = x2_gapColumn;
+ x3 = x3_gapColumn;
+
+ le = &left[maxCats * 16]; /* the matrix slot at index maxCats is the one used for gap columns */
+ ri = &right[maxCats * 16];
+
+ __m256d
+ vv = _mm256_setzero_pd();
+
+ for(l = 0; l < 4; l++)
+ {
+ __m256d
+ x1v = _mm256_mul_pd(_mm256_load_pd(x1), _mm256_load_pd(&le[l * 4])),
+ x2v = _mm256_mul_pd(_mm256_load_pd(x2), _mm256_load_pd(&ri[l * 4]));
+
+ x1v = hadd4(x1v, x2v);
+
+ __m256d
+ evv = _mm256_load_pd(&EV[l * 4]);
+#ifdef _FMA
+ vv = FMAMACC(vv,x1v,evv);
+#else
+ vv = _mm256_add_pd(vv, _mm256_mul_pd(x1v, evv));
+#endif
+ }
+
+ if(tipCase != TIP_TIP)
+ {
+ __m256d
+ v1 = _mm256_and_pd(vv, absMask_AVX.m);
+
+ v1 = _mm256_cmp_pd(v1, minlikelihood_avx, _CMP_LT_OS);
+
+ if(_mm256_movemask_pd( v1 ) == 15) /* all four lanes are below minlikelihood in magnitude: rescale */
+ {
+ vv = _mm256_mul_pd(vv, twoto);
+ scaleGap = 1;
+ }
+ }
+
+ _mm256_store_pd(x3, vv);
+ }
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ for (i = 0; i < n; i++)
+ {
+ if(noGap(x3_gap, i))
+ {
+ int
+ l;
+
+ x1 = &(tipVector[4 * tipX1[i]]);
+ x2 = &(tipVector[4 * tipX2[i]]);
+
+ x3 = x3_ptr;
+
+ if(isGap(x1_gap, i))
+ le = &left[maxCats * 16];
+ else
+ le = &left[cptr[i] * 16];
+
+ if(isGap(x2_gap, i))
+ ri = &right[maxCats * 16];
+ else
+ ri = &right[cptr[i] * 16];
+
+ __m256d
+ vv = _mm256_setzero_pd();
+
+ for(l = 0; l < 4; l++)
+ {
+ __m256d
+ x1v = _mm256_mul_pd(_mm256_load_pd(x1), _mm256_load_pd(&le[l * 4])),
+ x2v = _mm256_mul_pd(_mm256_load_pd(x2), _mm256_load_pd(&ri[l * 4]));
+
+ x1v = hadd4(x1v, x2v);
+
+ __m256d
+ evv = _mm256_load_pd(&EV[l * 4]);
+#ifdef _FMA
+ vv = FMAMACC(vv,x1v,evv);
+#else
+ vv = _mm256_add_pd(vv, _mm256_mul_pd(x1v, evv));
+#endif
+ }
+
+ _mm256_store_pd(x3, vv);
+
+ x3_ptr += 4;
+ }
+ }
+ break;
+ case TIP_INNER:
+ for (i = 0; i < n; i++)
+ {
+ if(isGap(x3_gap, i))
+ {
+ if(scaleGap)
+ {
+ if(useFastScaling)
+ addScale += wgt[i];
+ else
+ ex3[i] += 1;
+ }
+ }
+ else
+ {
+ int
+ l;
+
+ x1 = &(tipVector[4 * tipX1[i]]);
+ x3 = x3_ptr;
+
+ if(isGap(x1_gap, i))
+ le = &left[maxCats * 16];
+ else
+ le = &left[cptr[i] * 16];
+
+ if(isGap(x2_gap, i))
+ {
+ ri = &right[maxCats * 16];
+ x2 = x2_gapColumn;
+ }
+ else
+ {
+ ri = &right[cptr[i] * 16];
+ x2 = x2_ptr;
+ x2_ptr += 4;
+ }
+
+ __m256d
+ vv = _mm256_setzero_pd();
+
+ for(l = 0; l < 4; l++)
+ {
+ __m256d
+ x1v = _mm256_mul_pd(_mm256_load_pd(x1), _mm256_load_pd(&le[l * 4])),
+ x2v = _mm256_mul_pd(_mm256_load_pd(x2), _mm256_load_pd(&ri[l * 4]));
+
+ x1v = hadd4(x1v, x2v);
+
+ __m256d
+ evv = _mm256_load_pd(&EV[l * 4]);
+
+#ifdef _FMA
+ vv = FMAMACC(vv,x1v,evv);
+#else
+ vv = _mm256_add_pd(vv, _mm256_mul_pd(x1v, evv));
+#endif
+ }
+
+
+ __m256d
+ v1 = _mm256_and_pd(vv, absMask_AVX.m);
+
+ v1 = _mm256_cmp_pd(v1, minlikelihood_avx, _CMP_LT_OS);
+
+ if(_mm256_movemask_pd( v1 ) == 15)
+ {
+ vv = _mm256_mul_pd(vv, twoto);
+
+ if(useFastScaling)
+ addScale += wgt[i];
+ else
+ ex3[i] += 1;
+ }
+
+ _mm256_store_pd(x3, vv);
+
+ x3_ptr += 4;
+ }
+ }
+ break;
+ case INNER_INNER:
+ for (i = 0; i < n; i++)
+ {
+ if(isGap(x3_gap, i))
+ {
+ if(scaleGap)
+ {
+ if(useFastScaling)
+ addScale += wgt[i];
+ else
+ ex3[i] += 1;
+ }
+ }
+ else
+ {
+ int
+ l;
+
+ x3 = x3_ptr;
+
+ if(isGap(x1_gap, i))
+ {
+ x1 = x1_gapColumn;
+ le = &left[maxCats * 16];
+ }
+ else
+ {
+ le = &left[cptr[i] * 16];
+ x1 = x1_ptr;
+ x1_ptr += 4;
+ }
+
+ if(isGap(x2_gap, i))
+ {
+ x2 = x2_gapColumn;
+ ri = &right[maxCats * 16];
+ }
+ else
+ {
+ ri = &right[cptr[i] * 16];
+ x2 = x2_ptr;
+ x2_ptr += 4;
+ }
+
+ __m256d
+ vv = _mm256_setzero_pd();
+
+ for(l = 0; l < 4; l++)
+ {
+ __m256d
+ x1v = _mm256_mul_pd(_mm256_load_pd(x1), _mm256_load_pd(&le[l * 4])),
+ x2v = _mm256_mul_pd(_mm256_load_pd(x2), _mm256_load_pd(&ri[l * 4]));
+
+ x1v = hadd4(x1v, x2v);
+
+ __m256d
+ evv = _mm256_load_pd(&EV[l * 4]);
+#ifdef _FMA
+ vv = FMAMACC(vv,x1v,evv);
+#else
+ vv = _mm256_add_pd(vv, _mm256_mul_pd(x1v, evv));
+#endif
+ }
+
+
+ __m256d
+ v1 = _mm256_and_pd(vv, absMask_AVX.m);
+
+ v1 = _mm256_cmp_pd(v1, minlikelihood_avx, _CMP_LT_OS);
+
+ if(_mm256_movemask_pd( v1 ) == 15)
+ {
+ vv = _mm256_mul_pd(vv, twoto);
+
+ if(useFastScaling)
+ addScale += wgt[i];
+ else
+ ex3[i] += 1;
+ }
+
+ _mm256_store_pd(x3, vv);
+
+ x3_ptr += 4;
+ }
+ }
+ break;
+ default:
+ assert(0);
+ }
+
+ if(useFastScaling)
+ *scalerIncrement = addScale;
+}
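+/* GTRCAT, 20 states per site (20 doubles per site, 400 per matrix), gapped memory-saving variant. */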
+void newviewGTRCATPROT_AVX_GAPPED_SAVE(int tipCase, double *extEV,
+ int *cptr,
+ double *x1, double *x2, double *x3, double *tipVector,
+ int *ex3, unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling,
+ unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap,
+ double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn, const int maxCats)
+{
+ double
+ *le,
+ *ri,
+ *v,
+ *vl,
+ *vr,
+ *x1_ptr = x1,
+ *x2_ptr = x2,
+ *x3_ptr = x3;
+
+ int
+ i,
+ l,
+ scale,
+ addScale = 0,
+ scaleGap = 0;
+
+#ifdef _FMA
+ int k;
+#endif
+
+ {
+ le = &left[maxCats * 400];
+ ri = &right[maxCats * 400];
+
+ vl = x1_gapColumn;
+ vr = x2_gapColumn;
+ v = x3_gapColumn;
+
+ __m256d vv[5];
+
+ vv[0] = _mm256_setzero_pd();
+ vv[1] = _mm256_setzero_pd();
+ vv[2] = _mm256_setzero_pd();
+ vv[3] = _mm256_setzero_pd();
+ vv[4] = _mm256_setzero_pd();
+
+ for(l = 0; l < 20; l++)
+ {
+ __m256d
+ x1v = _mm256_setzero_pd(),
+ x2v = _mm256_setzero_pd();
+
+ double
+ *ev = &extEV[l * 20],
+ *lv = &le[l * 20],
+ *rv = &ri[l * 20];
+
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[0]), _mm256_load_pd(&lv[0])));
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[4]), _mm256_load_pd(&lv[4])));
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[8]), _mm256_load_pd(&lv[8])));
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[12]), _mm256_load_pd(&lv[12])));
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[16]), _mm256_load_pd(&lv[16])));
+
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[0]), _mm256_load_pd(&rv[0])));
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[4]), _mm256_load_pd(&rv[4])));
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[8]), _mm256_load_pd(&rv[8])));
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[12]), _mm256_load_pd(&rv[12])));
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[16]), _mm256_load_pd(&rv[16])));
+
+ x1v = hadd4(x1v, x2v);
+#ifdef _FMA
+ for(k = 0; k < 5; k++)
+ {
+ __m256d evv = _mm256_load_pd(&ev[k*4]);
+ vv[k] = FMAMACC(vv[k],x1v,evv);
+ }
+#else
+ __m256d
+ evv[5];
+
+ evv[0] = _mm256_load_pd(&ev[0]);
+ evv[1] = _mm256_load_pd(&ev[4]);
+ evv[2] = _mm256_load_pd(&ev[8]);
+ evv[3] = _mm256_load_pd(&ev[12]);
+ evv[4] = _mm256_load_pd(&ev[16]);
+
+ vv[0] = _mm256_add_pd(vv[0], _mm256_mul_pd(x1v, evv[0]));
+ vv[1] = _mm256_add_pd(vv[1], _mm256_mul_pd(x1v, evv[1]));
+ vv[2] = _mm256_add_pd(vv[2], _mm256_mul_pd(x1v, evv[2]));
+ vv[3] = _mm256_add_pd(vv[3], _mm256_mul_pd(x1v, evv[3]));
+ vv[4] = _mm256_add_pd(vv[4], _mm256_mul_pd(x1v, evv[4]));
+#endif
+ }
+
+
+ if(tipCase != TIP_TIP)
+ {
+ __m256d minlikelihood_avx = _mm256_set1_pd( minlikelihood );
+
+ scale = 1;
+
+ for(l = 0; scale && (l < 20); l += 4)
+ {
+ __m256d
+ v1 = _mm256_and_pd(vv[l / 4], absMask_AVX.m);
+ v1 = _mm256_cmp_pd(v1, minlikelihood_avx, _CMP_LT_OS);
+
+ if(_mm256_movemask_pd( v1 ) != 15)
+ scale = 0;
+ }
+
+ if(scale)
+ {
+ __m256d
+ twoto = _mm256_set1_pd(twotothe256);
+
+ for(l = 0; l < 20; l += 4)
+ vv[l / 4] = _mm256_mul_pd(vv[l / 4] , twoto);
+
+ scaleGap = 1;
+ }
+ }
+
+ _mm256_store_pd(&v[0], vv[0]);
+ _mm256_store_pd(&v[4], vv[1]);
+ _mm256_store_pd(&v[8], vv[2]);
+ _mm256_store_pd(&v[12], vv[3]);
+ _mm256_store_pd(&v[16], vv[4]);
+ }
+
+
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ {
+ for (i = 0; i < n; i++)
+ {
+ if(noGap(x3_gap, i))
+ {
+ vl = &(tipVector[20 * tipX1[i]]);
+ vr = &(tipVector[20 * tipX2[i]]);
+ v = x3_ptr;
+
+ if(isGap(x1_gap, i))
+ le = &left[maxCats * 400];
+ else
+ le = &left[cptr[i] * 400];
+
+ if(isGap(x2_gap, i))
+ ri = &right[maxCats * 400];
+ else
+ ri = &right[cptr[i] * 400];
+
+ __m256d vv[5];
+
+ vv[0] = _mm256_setzero_pd();
+ vv[1] = _mm256_setzero_pd();
+ vv[2] = _mm256_setzero_pd();
+ vv[3] = _mm256_setzero_pd();
+ vv[4] = _mm256_setzero_pd();
+
+ for(l = 0; l < 20; l++)
+ {
+ __m256d
+ x1v = _mm256_setzero_pd(),
+ x2v = _mm256_setzero_pd();
+
+ double
+ *ev = &extEV[l * 20],
+ *lv = &le[l * 20],
+ *rv = &ri[l * 20];
+
+#ifdef _FMA
+ for(k = 0; k < 20; k += 4)
+ {
+ __m256d vlv = _mm256_load_pd(&vl[k]);
+ __m256d lvv = _mm256_load_pd(&lv[k]);
+ x1v = FMAMACC(x1v,vlv,lvv);
+ __m256d vrv = _mm256_load_pd(&vr[k]);
+ __m256d rvv = _mm256_load_pd(&rv[k]);
+ x2v = FMAMACC(x2v,vrv,rvv);
+ }
+#else
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[0]), _mm256_load_pd(&lv[0])));
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[4]), _mm256_load_pd(&lv[4])));
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[8]), _mm256_load_pd(&lv[8])));
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[12]), _mm256_load_pd(&lv[12])));
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[16]), _mm256_load_pd(&lv[16])));
+
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[0]), _mm256_load_pd(&rv[0])));
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[4]), _mm256_load_pd(&rv[4])));
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[8]), _mm256_load_pd(&rv[8])));
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[12]), _mm256_load_pd(&rv[12])));
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[16]), _mm256_load_pd(&rv[16])));
+#endif
+
+ x1v = hadd4(x1v, x2v);
+#ifdef _FMA
+ for(k = 0; k < 5; k++)
+ {
+ __m256d evv = _mm256_load_pd(&ev[k*4]);
+ vv[k] = FMAMACC(vv[k],x1v,evv);
+ }
+#else
+ __m256d
+ evv[5];
+
+ evv[0] = _mm256_load_pd(&ev[0]);
+ evv[1] = _mm256_load_pd(&ev[4]);
+ evv[2] = _mm256_load_pd(&ev[8]);
+ evv[3] = _mm256_load_pd(&ev[12]);
+ evv[4] = _mm256_load_pd(&ev[16]);
+
+ vv[0] = _mm256_add_pd(vv[0], _mm256_mul_pd(x1v, evv[0]));
+ vv[1] = _mm256_add_pd(vv[1], _mm256_mul_pd(x1v, evv[1]));
+ vv[2] = _mm256_add_pd(vv[2], _mm256_mul_pd(x1v, evv[2]));
+ vv[3] = _mm256_add_pd(vv[3], _mm256_mul_pd(x1v, evv[3]));
+ vv[4] = _mm256_add_pd(vv[4], _mm256_mul_pd(x1v, evv[4]));
+#endif
+ }
+
+ _mm256_store_pd(&v[0], vv[0]);
+ _mm256_store_pd(&v[4], vv[1]);
+ _mm256_store_pd(&v[8], vv[2]);
+ _mm256_store_pd(&v[12], vv[3]);
+ _mm256_store_pd(&v[16], vv[4]);
+
+ x3_ptr += 20;
+ }
+ }
+ }
+ break;
+ case TIP_INNER:
+ for (i = 0; i < n; i++)
+ {
+ if(isGap(x3_gap, i))
+ {
+ if(scaleGap)
+ {
+ if(useFastScaling)
+ addScale += wgt[i];
+ else
+ ex3[i] += 1;
+ }
+ }
+ else
+ {
+ vl = &(tipVector[20 * tipX1[i]]);
+
+ vr = x2_ptr; /* redundant: vr is reassigned in both branches of the x2_gap test below */
+ v = x3_ptr;
+
+ if(isGap(x1_gap, i))
+ le = &left[maxCats * 400];
+ else
+ le = &left[cptr[i] * 400];
+
+ if(isGap(x2_gap, i))
+ {
+ ri = &right[maxCats * 400];
+ vr = x2_gapColumn;
+ }
+ else
+ {
+ ri = &right[cptr[i] * 400];
+ vr = x2_ptr;
+ x2_ptr += 20;
+ }
+
+ __m256d vv[5];
+
+ vv[0] = _mm256_setzero_pd();
+ vv[1] = _mm256_setzero_pd();
+ vv[2] = _mm256_setzero_pd();
+ vv[3] = _mm256_setzero_pd();
+ vv[4] = _mm256_setzero_pd();
+
+ for(l = 0; l < 20; l++)
+ {
+ __m256d
+ x1v = _mm256_setzero_pd(),
+ x2v = _mm256_setzero_pd();
+
+ double
+ *ev = &extEV[l * 20],
+ *lv = &le[l * 20],
+ *rv = &ri[l * 20];
+#ifdef _FMA
+ for(k = 0; k < 20; k += 4)
+ {
+ __m256d vlv = _mm256_load_pd(&vl[k]);
+ __m256d lvv = _mm256_load_pd(&lv[k]);
+ x1v = FMAMACC(x1v,vlv,lvv);
+ __m256d vrv = _mm256_load_pd(&vr[k]);
+ __m256d rvv = _mm256_load_pd(&rv[k]);
+ x2v = FMAMACC(x2v,vrv,rvv);
+ }
+#else
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[0]), _mm256_load_pd(&lv[0])));
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[4]), _mm256_load_pd(&lv[4])));
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[8]), _mm256_load_pd(&lv[8])));
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[12]), _mm256_load_pd(&lv[12])));
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[16]), _mm256_load_pd(&lv[16])));
+
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[0]), _mm256_load_pd(&rv[0])));
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[4]), _mm256_load_pd(&rv[4])));
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[8]), _mm256_load_pd(&rv[8])));
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[12]), _mm256_load_pd(&rv[12])));
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[16]), _mm256_load_pd(&rv[16])));
+#endif
+
+ x1v = hadd4(x1v, x2v);
+
+ __m256d
+ evv[5];
+
+ evv[0] = _mm256_load_pd(&ev[0]);
+ evv[1] = _mm256_load_pd(&ev[4]);
+ evv[2] = _mm256_load_pd(&ev[8]);
+ evv[3] = _mm256_load_pd(&ev[12]);
+ evv[4] = _mm256_load_pd(&ev[16]);
+
+#ifdef _FMA
+ for(k = 0; k < 5; k++)
+ vv[k] = FMAMACC(vv[k],x1v,evv[k]);
+#else
+ vv[0] = _mm256_add_pd(vv[0], _mm256_mul_pd(x1v, evv[0]));
+ vv[1] = _mm256_add_pd(vv[1], _mm256_mul_pd(x1v, evv[1]));
+ vv[2] = _mm256_add_pd(vv[2], _mm256_mul_pd(x1v, evv[2]));
+ vv[3] = _mm256_add_pd(vv[3], _mm256_mul_pd(x1v, evv[3]));
+ vv[4] = _mm256_add_pd(vv[4], _mm256_mul_pd(x1v, evv[4]));
+#endif
+ }
+
+
+ __m256d minlikelihood_avx = _mm256_set1_pd( minlikelihood );
+
+ scale = 1;
+
+ for(l = 0; scale && (l < 20); l += 4)
+ {
+ __m256d
+ v1 = _mm256_and_pd(vv[l / 4], absMask_AVX.m);
+ v1 = _mm256_cmp_pd(v1, minlikelihood_avx, _CMP_LT_OS);
+
+ if(_mm256_movemask_pd( v1 ) != 15)
+ scale = 0;
+ }
+
+ if(scale)
+ {
+ __m256d
+ twoto = _mm256_set1_pd(twotothe256);
+
+ for(l = 0; l < 20; l += 4)
+ vv[l / 4] = _mm256_mul_pd(vv[l / 4] , twoto);
+
+ if(useFastScaling)
+ addScale += wgt[i];
+ else
+ ex3[i] += 1;
+ }
+
+ _mm256_store_pd(&v[0], vv[0]);
+ _mm256_store_pd(&v[4], vv[1]);
+ _mm256_store_pd(&v[8], vv[2]);
+ _mm256_store_pd(&v[12], vv[3]);
+ _mm256_store_pd(&v[16], vv[4]);
+
+ x3_ptr += 20;
+ }
+ }
+ break;
+ case INNER_INNER:
+ for(i = 0; i < n; i++)
+ {
+ if(isGap(x3_gap, i))
+ {
+ if(scaleGap)
+ {
+ if(useFastScaling)
+ addScale += wgt[i];
+ else
+ ex3[i] += 1;
+ }
+ }
+ else
+ {
+
+ v = x3_ptr;
+
+ if(isGap(x1_gap, i))
+ {
+ vl = x1_gapColumn;
+ le = &left[maxCats * 400];
+ }
+ else
+ {
+ le = &left[cptr[i] * 400];
+ vl = x1_ptr;
+ x1_ptr += 20;
+ }
+
+ if(isGap(x2_gap, i))
+ {
+ vr = x2_gapColumn;
+ ri = &right[maxCats * 400];
+ }
+ else
+ {
+ ri = &right[cptr[i] * 400];
+ vr = x2_ptr;
+ x2_ptr += 20;
+ }
+
+ __m256d vv[5];
+
+ vv[0] = _mm256_setzero_pd();
+ vv[1] = _mm256_setzero_pd();
+ vv[2] = _mm256_setzero_pd();
+ vv[3] = _mm256_setzero_pd();
+ vv[4] = _mm256_setzero_pd();
+
+ for(l = 0; l < 20; l++)
+ {
+ __m256d
+ x1v = _mm256_setzero_pd(),
+ x2v = _mm256_setzero_pd();
+
+ double
+ *ev = &extEV[l * 20],
+ *lv = &le[l * 20],
+ *rv = &ri[l * 20];
+
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[0]), _mm256_load_pd(&lv[0])));
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[4]), _mm256_load_pd(&lv[4])));
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[8]), _mm256_load_pd(&lv[8])));
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[12]), _mm256_load_pd(&lv[12])));
+ x1v = _mm256_add_pd(x1v, _mm256_mul_pd(_mm256_load_pd(&vl[16]), _mm256_load_pd(&lv[16])));
+
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[0]), _mm256_load_pd(&rv[0])));
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[4]), _mm256_load_pd(&rv[4])));
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[8]), _mm256_load_pd(&rv[8])));
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[12]), _mm256_load_pd(&rv[12])));
+ x2v = _mm256_add_pd(x2v, _mm256_mul_pd(_mm256_load_pd(&vr[16]), _mm256_load_pd(&rv[16])));
+
+ x1v = hadd4(x1v, x2v);
+#ifdef _FMA
+ for(k = 0; k < 5; k++)
+ {
+ __m256d evv = _mm256_load_pd(&ev[k*4]);
+ vv[k] = FMAMACC(vv[k],x1v,evv);
+ }
+#else
+ __m256d
+ evv[5];
+
+ evv[0] = _mm256_load_pd(&ev[0]);
+ evv[1] = _mm256_load_pd(&ev[4]);
+ evv[2] = _mm256_load_pd(&ev[8]);
+ evv[3] = _mm256_load_pd(&ev[12]);
+ evv[4] = _mm256_load_pd(&ev[16]);
+
+ vv[0] = _mm256_add_pd(vv[0], _mm256_mul_pd(x1v, evv[0]));
+ vv[1] = _mm256_add_pd(vv[1], _mm256_mul_pd(x1v, evv[1]));
+ vv[2] = _mm256_add_pd(vv[2], _mm256_mul_pd(x1v, evv[2]));
+ vv[3] = _mm256_add_pd(vv[3], _mm256_mul_pd(x1v, evv[3]));
+ vv[4] = _mm256_add_pd(vv[4], _mm256_mul_pd(x1v, evv[4]));
+#endif
+ }
+
+
+ __m256d minlikelihood_avx = _mm256_set1_pd( minlikelihood );
+
+ scale = 1;
+
+ for(l = 0; scale && (l < 20); l += 4)
+ {
+ __m256d
+ v1 = _mm256_and_pd(vv[l / 4], absMask_AVX.m);
+ v1 = _mm256_cmp_pd(v1, minlikelihood_avx, _CMP_LT_OS);
+
+ if(_mm256_movemask_pd( v1 ) != 15)
+ scale = 0;
+ }
+
+ if(scale)
+ {
+ __m256d
+ twoto = _mm256_set1_pd(twotothe256);
+
+ for(l = 0; l < 20; l += 4)
+ vv[l / 4] = _mm256_mul_pd(vv[l / 4] , twoto);
+
+ if(useFastScaling)
+ addScale += wgt[i];
+ else
+ ex3[i] += 1;
+ }
+
+ _mm256_store_pd(&v[0], vv[0]);
+ _mm256_store_pd(&v[4], vv[1]);
+ _mm256_store_pd(&v[8], vv[2]);
+ _mm256_store_pd(&v[12], vv[3]);
+ _mm256_store_pd(&v[16], vv[4]);
+
+ x3_ptr += 20;
+ }
+ }
+ break;
+ default:
+ assert(0);
+ }
+
+ if(useFastScaling)
+ *scalerIncrement = addScale;
+}
+/* GTRGAMMA, 20 states x 4 rate categories (80 doubles per site), gapped memory-saving variant. */
+void newviewGTRGAMMAPROT_AVX_GAPPED_SAVE(int tipCase,
+ double *x1_start, double *x2_start, double *x3_start, double *extEV, double *tipVector,
+ int *ex3, unsigned char *tipX1, unsigned char *tipX2, int n,
+ double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling,
+ unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap,
+ double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn)
+{
+ double
+ *x1 = x1_start,
+ *x2 = x2_start,
+ *x3_ptr = x3_start,
+ *x2_ptr = x2_start,
+ *x1_ptr = x1_start,
+ *uX1,
+ *uX2,
+ *v,
+ x1px2,
+ *vl,
+ *vr;
+
+ int
+ i,
+ j,
+ l,
+ k,
+ gapScaling = 0,
+ scale,
+ addScale = 0;
+
+
+#ifndef GCC_VERSION
+#define GCC_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
+#endif
+
+
+#if GCC_VERSION < 40500
+ __m256d
+ bitmask = _mm256_set_pd(0,0,0,-1);
+#else
+ __m256i
+ bitmask = _mm256_set_epi32(0, 0, 0, 0, 0, 0, -1, -1); /* only the lowest 64-bit lane is set, so _mm256_maskstore_pd writes one double */
+#endif
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ {
+ double
+ umpX1[1840] __attribute__ ((aligned (BYTE_ALIGNMENT))), /* 1840 = 23 tip codes x 80 (4 categories x 20 states) */
+ umpX2[1840] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+
+
+ for(i = 0; i < 23; i++)
+ {
+ v = &(tipVector[20 * i]);
+
+ for(k = 0; k < 80; k++)
+ {
+ double
+ *ll = &left[k * 20],
+ *rr = &right[k * 20];
+
+ __m256d
+ umpX1v = _mm256_setzero_pd(),
+ umpX2v = _mm256_setzero_pd();
+
+ for(l = 0; l < 20; l+=4)
+ {
+ __m256d vv = _mm256_load_pd(&v[l]);
+#ifdef _FMA
+ __m256d llv = _mm256_load_pd(&ll[l]);
+ umpX1v = FMAMACC(umpX1v,vv,llv);
+ __m256d rrv = _mm256_load_pd(&rr[l]);
+ umpX2v = FMAMACC(umpX2v,vv,rrv);
+#else
+ umpX1v = _mm256_add_pd(umpX1v,_mm256_mul_pd(vv,_mm256_load_pd(&ll[l])));
+ umpX2v = _mm256_add_pd(umpX2v,_mm256_mul_pd(vv,_mm256_load_pd(&rr[l])));
+#endif
+ }
+
+ umpX1v = hadd3(umpX1v);
+ umpX2v = hadd3(umpX2v);
+ _mm256_maskstore_pd(&umpX1[80 * i + k], bitmask, umpX1v);
+ _mm256_maskstore_pd(&umpX2[80 * i + k], bitmask, umpX2v);
+ }
+ }
+
+
+ {
+ uX1 = &umpX1[1760]; /* 1760 = 80 * 22: precomputed entry for tip code 22, used here for the gap column */
+ uX2 = &umpX2[1760];
+
+ for(j = 0; j < 4; j++)
+ {
+ __m256d vv[5];
+
+ v = &x3_gapColumn[j * 20];
+
+ vv[0] = _mm256_setzero_pd();
+ vv[1] = _mm256_setzero_pd();
+ vv[2] = _mm256_setzero_pd();
+ vv[3] = _mm256_setzero_pd();
+ vv[4] = _mm256_setzero_pd();
+
+ for(k = 0; k < 20; k++)
+ {
+ x1px2 = uX1[j * 20 + k] * uX2[j * 20 + k];
+
+ __m256d x1px2v = _mm256_set1_pd(x1px2);
+
+ __m256d extEvv = _mm256_load_pd(&extEV[20 * k]);
+#ifdef _FMA
+ vv[0] = FMAMACC(vv[0],x1px2v,extEvv);
+#else
+ vv[0] = _mm256_add_pd(vv[0],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+ _mm256_store_pd(&v[0],vv[0]);
+
+ extEvv = _mm256_load_pd(&extEV[20 * k + 4]);
+#ifdef _FMA
+ vv[1] = FMAMACC(vv[1],x1px2v,extEvv);
+#else
+ vv[1] = _mm256_add_pd(vv[1],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+ _mm256_store_pd(&v[4],vv[1]);
+
+ extEvv = _mm256_load_pd(&extEV[20 * k + 8]);
+#ifdef _FMA
+ vv[2] = FMAMACC(vv[2],x1px2v,extEvv);
+#else
+ vv[2] = _mm256_add_pd(vv[2],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+ _mm256_store_pd(&v[8],vv[2]);
+
+ extEvv = _mm256_load_pd(&extEV[20 * k + 12]);
+#ifdef _FMA
+ vv[3] = FMAMACC(vv[3],x1px2v,extEvv);
+#else
+ vv[3] = _mm256_add_pd(vv[3],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+ _mm256_store_pd(&v[12],vv[3]);
+
+ extEvv = _mm256_load_pd(&extEV[20 * k + 16]);
+#ifdef _FMA
+ vv[4] = FMAMACC(vv[4],x1px2v,extEvv);
+#else
+ vv[4] = _mm256_add_pd(vv[4],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+ _mm256_store_pd(&v[16],vv[4]);
+ }
+ }
+ }
+
+
+ for(i = 0; i < n; i++)
+ {
+ if(!(x3_gap[i / 32] & mask32[i % 32]))
+ {
+ uX1 = &umpX1[80 * tipX1[i]];
+ uX2 = &umpX2[80 * tipX2[i]];
+
+ for(j = 0; j < 4; j++)
+ {
+ __m256d vv[5];
+
+ v = &x3_ptr[j * 20];
+
+ vv[0] = _mm256_setzero_pd();
+ vv[1] = _mm256_setzero_pd();
+ vv[2] = _mm256_setzero_pd();
+ vv[3] = _mm256_setzero_pd();
+ vv[4] = _mm256_setzero_pd();
+
+ for(k = 0; k < 20; k++)
+ {
+ x1px2 = uX1[j * 20 + k] * uX2[j * 20 + k];
+
+ __m256d x1px2v = _mm256_set1_pd(x1px2);
+
+ __m256d extEvv = _mm256_load_pd(&extEV[20 * k]);
+#ifdef _FMA
+ vv[0] = FMAMACC(vv[0],x1px2v,extEvv);
+#else
+ vv[0] = _mm256_add_pd(vv[0],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+ _mm256_store_pd(&v[0],vv[0]);
+
+ extEvv = _mm256_load_pd(&extEV[20 * k + 4]);
+#ifdef _FMA
+ vv[1] = FMAMACC(vv[1],x1px2v,extEvv);
+#else
+ vv[1] = _mm256_add_pd(vv[1],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+ _mm256_store_pd(&v[4],vv[1]);
+
+ extEvv = _mm256_load_pd(&extEV[20 * k + 8]);
+#ifdef _FMA
+ vv[2] = FMAMACC(vv[2],x1px2v,extEvv);
+#else
+ vv[2] = _mm256_add_pd(vv[2],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+ _mm256_store_pd(&v[8],vv[2]);
+
+ extEvv = _mm256_load_pd(&extEV[20 * k + 12]);
+#ifdef _FMA
+ vv[3] = FMAMACC(vv[3],x1px2v,extEvv);
+#else
+ vv[3] = _mm256_add_pd(vv[3],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+ _mm256_store_pd(&v[12],vv[3]);
+
+ extEvv = _mm256_load_pd(&extEV[20 * k + 16]);
+#ifdef _FMA
+ vv[4] = FMAMACC(vv[4],x1px2v,extEvv);
+#else
+ vv[4] = _mm256_add_pd(vv[4],_mm256_mul_pd(x1px2v,extEvv));
+#endif
+ _mm256_store_pd(&v[16],vv[4]);
+ }
+ }
+ x3_ptr += 80;
+ }
+ }
+ }
+ break;
+ case TIP_INNER:
+ {
+ double
+ umpX1[1840] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+ ump_x2[20] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+
+
+ for(i = 0; i < 23; i++)
+ {
+ v = &(tipVector[20 * i]);
+
+ for(k = 0; k < 80; k++)
+ {
+ __m256d umpX1v = _mm256_setzero_pd();
+ for(l = 0; l < 20; l+=4)
+ {
+ __m256d vv = _mm256_load_pd(&v[l]);
+ __m256d leftv = _mm256_load_pd(&left[k * 20 + l]);
+#ifdef _FMA
+
+ umpX1v = FMAMACC(umpX1v, vv, leftv);
+#else
+ umpX1v = _mm256_add_pd(umpX1v, _mm256_mul_pd(vv, leftv));
+#endif
+ }
+ umpX1v = hadd3(umpX1v);
+ _mm256_maskstore_pd(&umpX1[80 * i + k], bitmask, umpX1v);
+ }
+ }
+
+ {
+ uX1 = &umpX1[1760];
+
+ for(k = 0; k < 4; k++)
+ {
+ v = &(x2_gapColumn[k * 20]);
+
+ for(l = 0; l < 20; l++)
+ {
+ __m256d ump_x2v = _mm256_setzero_pd();
+
+ __m256d vv = _mm256_load_pd(&v[0]);
+ __m256d rightv = _mm256_load_pd(&right[k*400+l*20+0]);
+#ifdef _FMA
+ ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+ ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+ vv = _mm256_load_pd(&v[4]);
+ rightv = _mm256_load_pd(&right[k*400+l*20+4]);
+#ifdef _FMA
+ ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+ ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+ vv = _mm256_load_pd(&v[8]);
+ rightv = _mm256_load_pd(&right[k*400+l*20+8]);
+#ifdef _FMA
+ ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+ ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+ vv = _mm256_load_pd(&v[12]);
+ rightv = _mm256_load_pd(&right[k*400+l*20+12]);
+#ifdef _FMA
+ ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+ ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+ vv = _mm256_load_pd(&v[16]);
+ rightv = _mm256_load_pd(&right[k*400+l*20+16]);
+#ifdef _FMA
+ ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+ ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+ ump_x2v = hadd3(ump_x2v);
+ _mm256_maskstore_pd(&ump_x2[l], bitmask, ump_x2v);
+ }
+
+ v = &x3_gapColumn[20 * k];
+
+ __m256d vv[5];
+
+ vv[0] = _mm256_setzero_pd();
+ vv[1] = _mm256_setzero_pd();
+ vv[2] = _mm256_setzero_pd();
+ vv[3] = _mm256_setzero_pd();
+ vv[4] = _mm256_setzero_pd();
+
+ for(l = 0; l < 20; l++)
+ {
+ x1px2 = uX1[k * 20 + l] * ump_x2[l];
+ __m256d x1px2v = _mm256_set1_pd(x1px2);
+
+#ifdef _FMA
+ __m256d ev = _mm256_load_pd(&extEV[l * 20 + 0]);
+ vv[0] = FMAMACC(vv[0],x1px2v, ev);
+#else
+ vv[0] = _mm256_add_pd(vv[0],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 0])));
+#endif
+ _mm256_store_pd(&v[0],vv[0]);
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[l * 20 + 4]);
+ vv[1] = FMAMACC(vv[1],x1px2v, ev);
+#else
+ vv[1] = _mm256_add_pd(vv[1],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 4])));
+#endif
+ _mm256_store_pd(&v[4],vv[1]);
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[l * 20 + 8]);
+ vv[2] = FMAMACC(vv[2],x1px2v, ev);
+#else
+ vv[2] = _mm256_add_pd(vv[2],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 8])));
+#endif
+ _mm256_store_pd(&v[8],vv[2]);
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[l * 20 + 12]);
+ vv[3] = FMAMACC(vv[3],x1px2v, ev);
+#else
+ vv[3] = _mm256_add_pd(vv[3],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 12])));
+#endif
+ _mm256_store_pd(&v[12],vv[3]);
+
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[l * 20 + 16]);
+ vv[4] = FMAMACC(vv[4],x1px2v, ev);
+#else
+ vv[4] = _mm256_add_pd(vv[4],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 16])));
+#endif
+ _mm256_store_pd(&v[16],vv[4]);
+
+ }
+ }
+
+ v = x3_gapColumn;
+ __m256d minlikelihood_avx = _mm256_set1_pd(minlikelihood);
+ scale = 1;
+ for(l = 0; scale && (l < 80); l += 4)
+ {
+ __m256d vv = _mm256_load_pd(&v[l]);
+ __m256d vv_abs = _mm256_and_pd(vv,absMask_AVX.m);
+ vv_abs = _mm256_cmp_pd(vv_abs,minlikelihood_avx,_CMP_LT_OS);
+ if(_mm256_movemask_pd(vv_abs) != 15)
+ scale = 0;
+ }
+
+ if(scale)
+ {
+ __m256d twotothe256v = _mm256_set_pd(twotothe256,twotothe256,twotothe256,twotothe256);
+ gapScaling = 1;
+
+ for(l = 0; l < 80; l += 4)
+ {
+ __m256d vv = _mm256_load_pd(&v[l]);
+ _mm256_store_pd(&v[l],_mm256_mul_pd(vv,twotothe256v));
+ }
+ }
+ }
+
+ for (i = 0; i < n; i++)
+ {
+ if((x3_gap[i / 32] & mask32[i % 32]))
+ {
+ if(gapScaling)
+ {
+ if(useFastScaling)
+ addScale += wgt[i];
+ else
+ ex3[i] += 1;
+ }
+ }
+ else
+ {
+ uX1 = &umpX1[80 * tipX1[i]];
+
+ if(x2_gap[i / 32] & mask32[i % 32])
+ x2 = x2_gapColumn;
+ else
+ {
+ x2 = x2_ptr;
+ x2_ptr += 80;
+ }
+
+ for(k = 0; k < 4; k++)
+ {
+ v = &(x2[k * 20]);
+
+ for(l = 0; l < 20; l++)
+ {
+ __m256d ump_x2v = _mm256_setzero_pd();
+
+ __m256d vv = _mm256_load_pd(&v[0]);
+ __m256d rightv = _mm256_load_pd(&right[k*400+l*20+0]);
+#ifdef _FMA
+ ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+ ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+ vv = _mm256_load_pd(&v[4]);
+ rightv = _mm256_load_pd(&right[k*400+l*20+4]);
+#ifdef _FMA
+ ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+ ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+ vv = _mm256_load_pd(&v[8]);
+ rightv = _mm256_load_pd(&right[k*400+l*20+8]);
+#ifdef _FMA
+ ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+ ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+ vv = _mm256_load_pd(&v[12]);
+ rightv = _mm256_load_pd(&right[k*400+l*20+12]);
+#ifdef _FMA
+ ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+ ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+ vv = _mm256_load_pd(&v[16]);
+ rightv = _mm256_load_pd(&right[k*400+l*20+16]);
+#ifdef _FMA
+ ump_x2v = FMAMACC(ump_x2v,vv,rightv);
+#else
+ ump_x2v = _mm256_add_pd(ump_x2v, _mm256_mul_pd(vv, rightv));
+#endif
+
+ ump_x2v = hadd3(ump_x2v);
+ _mm256_maskstore_pd(&ump_x2[l], bitmask, ump_x2v);
+ }
+
+
+ v = &x3_ptr[k * 20];
+
+ __m256d vv[5];
+
+ vv[0] = _mm256_setzero_pd();
+ vv[1] = _mm256_setzero_pd();
+ vv[2] = _mm256_setzero_pd();
+ vv[3] = _mm256_setzero_pd();
+ vv[4] = _mm256_setzero_pd();
+
+ for(l = 0; l < 20; l++)
+ {
+ x1px2 = uX1[k * 20 + l] * ump_x2[l];
+ __m256d x1px2v = _mm256_set1_pd(x1px2);
+
+#ifdef _FMA
+ __m256d ev = _mm256_load_pd(&extEV[l * 20 + 0]);
+ vv[0] = FMAMACC(vv[0],x1px2v, ev);
+#else
+ vv[0] = _mm256_add_pd(vv[0],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 0])));
+#endif
+ _mm256_store_pd(&v[0],vv[0]);
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[l * 20 + 4]);
+ vv[1] = FMAMACC(vv[1],x1px2v, ev);
+#else
+ vv[1] = _mm256_add_pd(vv[1],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 4])));
+#endif
+ _mm256_store_pd(&v[4],vv[1]);
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[l * 20 + 8]);
+ vv[2] = FMAMACC(vv[2],x1px2v, ev);
+#else
+ vv[2] = _mm256_add_pd(vv[2],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 8])));
+#endif
+ _mm256_store_pd(&v[8],vv[2]);
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[l * 20 + 12]);
+ vv[3] = FMAMACC(vv[3],x1px2v, ev);
+#else
+ vv[3] = _mm256_add_pd(vv[3],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 12])));
+#endif
+ _mm256_store_pd(&v[12],vv[3]);
+
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[l * 20 + 16]);
+ vv[4] = FMAMACC(vv[4],x1px2v, ev);
+#else
+ vv[4] = _mm256_add_pd(vv[4],_mm256_mul_pd(x1px2v, _mm256_load_pd(&extEV[l * 20 + 16])));
+#endif
+ _mm256_store_pd(&v[16],vv[4]);
+
+ }
+ }
+
+ v = x3_ptr;
+ __m256d minlikelihood_avx = _mm256_set1_pd(minlikelihood);
+ scale = 1;
+ for(l = 0; scale && (l < 80); l += 4)
+ {
+ __m256d vv = _mm256_load_pd(&v[l]);
+ __m256d vv_abs = _mm256_and_pd(vv,absMask_AVX.m);
+ vv_abs = _mm256_cmp_pd(vv_abs,minlikelihood_avx,_CMP_LT_OS);
+ if(_mm256_movemask_pd(vv_abs) != 15)
+ scale = 0;
+ }
+
+ if(scale)
+ {
+ __m256d twotothe256v = _mm256_set_pd(twotothe256,twotothe256,twotothe256,twotothe256);
+ for(l = 0; l < 80; l += 4)
+ {
+ __m256d vv = _mm256_load_pd(&v[l]);
+ _mm256_store_pd(&v[l],_mm256_mul_pd(vv,twotothe256v));
+ }
+ if(useFastScaling)
+ addScale += wgt[i];
+ else
+ ex3[i] += 1;
+ }
+ x3_ptr += 80;
+ }
+ }
+ }
+ break;
+ case INNER_INNER:
+ for(k = 0; k < 4; k++)
+ {
+ vl = &(x1_gapColumn[20 * k]);
+ vr = &(x2_gapColumn[20 * k]);
+ v = &(x3_gapColumn[20 * k]);
+
+ __m256d vv[5];
+
+ vv[0] = _mm256_setzero_pd();
+ vv[1] = _mm256_setzero_pd();
+ vv[2] = _mm256_setzero_pd();
+ vv[3] = _mm256_setzero_pd();
+ vv[4] = _mm256_setzero_pd();
+
+ for(l = 0; l < 20; l++)
+ {
+ __m256d al = _mm256_setzero_pd();
+ __m256d ar = _mm256_setzero_pd();
+
+ __m256d leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 0]);
+ __m256d rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 0]);
+ __m256d vlv = _mm256_load_pd(&vl[0]);
+ __m256d vrv = _mm256_load_pd(&vr[0]);
+
+#ifdef _FMA
+
+ al = FMAMACC(al, vlv, leftv);
+ ar = FMAMACC(ar, vrv, rightv);
+#else
+ al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+ ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+
+ leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 4]);
+ rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 4]);
+ vlv = _mm256_load_pd(&vl[4]);
+ vrv = _mm256_load_pd(&vr[4]);
+#ifdef _FMA
+
+ al = FMAMACC(al, vlv, leftv);
+ ar = FMAMACC(ar, vrv, rightv);
+#else
+ al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+ ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+
+ leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 8]);
+ rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 8]);
+ vlv = _mm256_load_pd(&vl[8]);
+ vrv = _mm256_load_pd(&vr[8]);
+#ifdef _FMA
+
+ al = FMAMACC(al, vlv, leftv);
+ ar = FMAMACC(ar, vrv, rightv);
+#else
+ al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+ ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+
+ leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 12]);
+ rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 12]);
+ vlv = _mm256_load_pd(&vl[12]);
+ vrv = _mm256_load_pd(&vr[12]);
+#ifdef _FMA
+
+ al = FMAMACC(al, vlv, leftv);
+ ar = FMAMACC(ar, vrv, rightv);
+#else
+ al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+ ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+
+ leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 16]);
+ rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 16]);
+ vlv = _mm256_load_pd(&vl[16]);
+ vrv = _mm256_load_pd(&vr[16]);
+
+#ifdef _FMA
+ al = FMAMACC(al, vlv, leftv);
+ ar = FMAMACC(ar, vrv, rightv);
+#else
+ al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+ ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+
+ /**************************************************************************************************************/
+
+ al = hadd3(al);
+ ar = hadd3(ar);
+ al = _mm256_mul_pd(ar,al);
+
+ /************************************************************************************************************/
+#ifdef _FMA
+ __m256d ev = _mm256_load_pd(&extEV[20 * l + 0]);
+ vv[0] = FMAMACC(vv[0], al, ev);
+#else
+ vv[0] = _mm256_add_pd(vv[0],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 0])));
+#endif
+ _mm256_store_pd(&v[0],vv[0]);
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[20 * l + 4]);
+ vv[1] = FMAMACC(vv[1], al, ev);
+#else
+ vv[1] = _mm256_add_pd(vv[1],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 4])));
+#endif
+ _mm256_store_pd(&v[4],vv[1]);
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[20 * l + 8]);
+ vv[2] = FMAMACC(vv[2], al, ev);
+#else
+ vv[2] = _mm256_add_pd(vv[2],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 8])));
+#endif
+ _mm256_store_pd(&v[8],vv[2]);
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[20 * l + 12]);
+ vv[3] = FMAMACC(vv[3], al, ev);
+#else
+ vv[3] = _mm256_add_pd(vv[3],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 12])));
+#endif
+ _mm256_store_pd(&v[12],vv[3]);
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[20 * l + 16]);
+ vv[4] = FMAMACC(vv[4], al, ev);
+#else
+ vv[4] = _mm256_add_pd(vv[4],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 16])));
+#endif
+ _mm256_store_pd(&v[16],vv[4]);
+ }
+ }
+
+ v = x3_gapColumn;
+ scale = 1;
+ __m256d minlikelihood_avx = _mm256_set1_pd(minlikelihood);
+
+ for(l = 0; scale && (l < 80); l += 4)
+ {
+ __m256d vv = _mm256_load_pd(&v[l]);
+ __m256d vv_abs = _mm256_and_pd(vv,absMask_AVX.m);
+ vv_abs = _mm256_cmp_pd(vv_abs,minlikelihood_avx,_CMP_LT_OS);
+ if(_mm256_movemask_pd(vv_abs) != 15)
+ scale = 0;
+ }
+
+ if(scale)
+ {
+ __m256d twotothe256v = _mm256_set_pd(twotothe256,twotothe256,twotothe256,twotothe256);
+ gapScaling = 1;
+
+ for(l = 0; l < 80; l += 4)
+ {
+ __m256d vv = _mm256_load_pd(&v[l]);
+ _mm256_store_pd(&v[l],_mm256_mul_pd(vv,twotothe256v));
+ }
+
+ }
+
+
+
+ for(i = 0; i < n; i++)
+ {
+
+ if(x3_gap[i / 32] & mask32[i % 32])
+ {
+ if(gapScaling)
+ {
+ if(useFastScaling)
+ addScale += wgt[i];
+ else
+ ex3[i] += 1;
+ }
+ }
+ else
+ {
+ if(x1_gap[i / 32] & mask32[i % 32])
+ x1 = x1_gapColumn;
+ else
+ {
+ x1 = x1_ptr;
+ x1_ptr += 80;
+ }
+
+ if(x2_gap[i / 32] & mask32[i % 32])
+ x2 = x2_gapColumn;
+ else
+ {
+ x2 = x2_ptr;
+ x2_ptr += 80;
+ }
+
+ for(k = 0; k < 4; k++)
+ {
+ vl = &(x1[20 * k]);
+ vr = &(x2[20 * k]);
+ v = &(x3_ptr[20 * k]);
+
+ __m256d vv[5];
+
+ vv[0] = _mm256_setzero_pd();
+ vv[1] = _mm256_setzero_pd();
+ vv[2] = _mm256_setzero_pd();
+ vv[3] = _mm256_setzero_pd();
+ vv[4] = _mm256_setzero_pd();
+
+ for(l = 0; l < 20; l++)
+ {
+ __m256d al = _mm256_setzero_pd();
+ __m256d ar = _mm256_setzero_pd();
+
+ __m256d leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 0]);
+ __m256d rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 0]);
+ __m256d vlv = _mm256_load_pd(&vl[0]);
+ __m256d vrv = _mm256_load_pd(&vr[0]);
+
+#ifdef _FMA
+
+ al = FMAMACC(al, vlv, leftv);
+ ar = FMAMACC(ar, vrv, rightv);
+#else
+ al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+ ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+
+ leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 4]);
+ rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 4]);
+ vlv = _mm256_load_pd(&vl[4]);
+ vrv = _mm256_load_pd(&vr[4]);
+#ifdef _FMA
+
+ al = FMAMACC(al, vlv, leftv);
+ ar = FMAMACC(ar, vrv, rightv);
+#else
+ al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+ ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+
+ leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 8]);
+ rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 8]);
+ vlv = _mm256_load_pd(&vl[8]);
+ vrv = _mm256_load_pd(&vr[8]);
+#ifdef _FMA
+
+ al = FMAMACC(al, vlv, leftv);
+ ar = FMAMACC(ar, vrv, rightv);
+#else
+ al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+ ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+
+ leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 12]);
+ rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 12]);
+ vlv = _mm256_load_pd(&vl[12]);
+ vrv = _mm256_load_pd(&vr[12]);
+#ifdef _FMA
+
+ al = FMAMACC(al, vlv, leftv);
+ ar = FMAMACC(ar, vrv, rightv);
+#else
+ al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+ ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+
+ leftv = _mm256_load_pd(&left[k * 400 + l * 20 + 16]);
+ rightv = _mm256_load_pd(&right[k * 400 + l * 20 + 16]);
+ vlv = _mm256_load_pd(&vl[16]);
+ vrv = _mm256_load_pd(&vr[16]);
+
+#ifdef _FMA
+ al = FMAMACC(al, vlv, leftv);
+ ar = FMAMACC(ar, vrv, rightv);
+#else
+ al = _mm256_add_pd(al,_mm256_mul_pd(vlv,leftv));
+ ar = _mm256_add_pd(ar,_mm256_mul_pd(vrv,rightv));
+#endif
+
+ /**************************************************************************************************************/
+
+ al = hadd3(al);
+ ar = hadd3(ar);
+ al = _mm256_mul_pd(ar,al);
+
+ /************************************************************************************************************/
+#ifdef _FMA
+ __m256d ev = _mm256_load_pd(&extEV[20 * l + 0]);
+ vv[0] = FMAMACC(vv[0], al, ev);
+#else
+ vv[0] = _mm256_add_pd(vv[0],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 0])));
+#endif
+ _mm256_store_pd(&v[0],vv[0]);
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[20 * l + 4]);
+ vv[1] = FMAMACC(vv[1], al, ev);
+#else
+ vv[1] = _mm256_add_pd(vv[1],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 4])));
+#endif
+ _mm256_store_pd(&v[4],vv[1]);
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[20 * l + 8]);
+ vv[2] = FMAMACC(vv[2], al, ev);
+#else
+ vv[2] = _mm256_add_pd(vv[2],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 8])));
+#endif
+ _mm256_store_pd(&v[8],vv[2]);
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[20 * l + 12]);
+ vv[3] = FMAMACC(vv[3], al, ev);
+#else
+ vv[3] = _mm256_add_pd(vv[3],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 12])));
+#endif
+ _mm256_store_pd(&v[12],vv[3]);
+
+#ifdef _FMA
+ ev = _mm256_load_pd(&extEV[20 * l + 16]);
+ vv[4] = FMAMACC(vv[4], al, ev);
+#else
+ vv[4] = _mm256_add_pd(vv[4],_mm256_mul_pd(al, _mm256_load_pd(&extEV[20 * l + 16])));
+#endif
+ _mm256_store_pd(&v[16],vv[4]);
+ }
+ }
+
+ v = x3_ptr;
+ scale = 1;
+
+ __m256d minlikelihood_avx = _mm256_set1_pd(minlikelihood);
+
+ for(l = 0; scale && (l < 80); l += 4)
+ {
+ __m256d vv = _mm256_load_pd(&v[l]);
+ __m256d vv_abs = _mm256_and_pd(vv,absMask_AVX.m);
+ vv_abs = _mm256_cmp_pd(vv_abs,minlikelihood_avx,_CMP_LT_OS);
+ if(_mm256_movemask_pd(vv_abs) != 15)
+ scale = 0;
+ }
+
+ if(scale)
+ {
+ __m256d twotothe256v = _mm256_set_pd(twotothe256,twotothe256,twotothe256,twotothe256);
+ for(l = 0; l < 80; l += 4)
+ {
+ __m256d vv = _mm256_load_pd(&v[l]);
+ _mm256_store_pd(&v[l],_mm256_mul_pd(vv,twotothe256v));
+ }
+ if(useFastScaling)
+ addScale += wgt[i];
+ else
+ ex3[i] += 1;
+ }
+ x3_ptr += 80;
+ }
+ }
+ break;
+ default:
+ assert(0);
+ }
+
+ if(useFastScaling)
+ *scalerIncrement = addScale;
+}
diff --git a/examl/axml.c b/examl/axml.c
new file mode 100644
index 0000000..605d096
--- /dev/null
+++ b/examl/axml.c
@@ -0,0 +1,2782 @@
+/* RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees
+ * Copyright August 2006 by Alexandros Stamatakis
+ *
+ * Partially derived from
+ * fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *
+ * and
+ *
+ * Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ * This program is free software; you may redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ *
+ * For any other enquiries send an Email to Alexandros Stamatakis
+ * Alexandros.Stamatakis at epfl.ch
+ *
+ * When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ * Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models".
+ * Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#ifdef WIN32
+#include <direct.h>
+#endif
+
+#ifndef WIN32
+#include <sys/times.h>
+#include <sys/types.h>
+#include <sys/time.h>
+#include <unistd.h>
+#endif
+
+#include <math.h>
+#include <time.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+#include <stdarg.h>
+#include <limits.h>
+#include <unistd.h>
+#include <getopt.h>
+
+#include <mpi.h>
+
+#if ! (defined(__ppc) || defined(__powerpc__) || defined(PPC))
+#include <xmmintrin.h>
+/*
+ special bug fix: forces denormalized numbers to be flushed to zero;
+ without it the program is a tiny bit faster, though.
+ #include <emmintrin.h>
+ #define MM_DAZ_MASK 0x0040
+ #define MM_DAZ_ON 0x0040
+ #define MM_DAZ_OFF 0x0000
+*/
+#endif
+
+#include "axml.h"
+
+
+#include "globalVariables.h"
+
+#include "byteFile.h"
+#include "partitionAssignment.h"
+
+#ifdef __MIC_NATIVE
+#include "mic_native.h"
+#endif
+
+/***************** UTILITY FUNCTIONS **************************/
+
+/*pInfo *cleanPinfoInit()
+{
+ pInfo *p = (pInfo*)malloc(sizeof(pInfo));
+
+
+ return p;
+ }*/
+
+
+void storeExecuteMaskInTraversalDescriptor(tree *tr)
+{
+ int model;
+
+ for(model = 0; model < tr->NumberOfModels; model++)
+ tr->td[0].executeModel[model] = tr->executeModel[model];
+}
+
+void storeValuesInTraversalDescriptor(tree *tr, double *value)
+{
+ int model;
+
+ for(model = 0; model < tr->NumberOfModels; model++)
+ tr->td[0].parameterValues[model] = value[model];
+}
+
+
+
+void myBinFwrite(void *ptr, size_t size, size_t nmemb, FILE *byteFile)
+{
+ size_t
+ bytes_written;
+
+ bytes_written = fwrite(ptr, size, nmemb, byteFile);
+
+ assert(bytes_written == nmemb);
+}
+
+void myBinFread(void *ptr, size_t size, size_t nmemb, FILE *byteFile)
+{
+ size_t
+ bytes_read;
+
+ bytes_read = fread(ptr, size, nmemb, byteFile);
+
+ assert(bytes_read == nmemb);
+}
+
+
+static void outOfMemory(void)
+{
+ printf("ExaML process %d was not able to allocate enough memory.\n", processID);
+ printf("Please check the approximate memory consumption of your dataset using\n");
+ printf("the memory calculator at http://www.exelixis-lab.org/web/software/raxml/index.html.\n");
+ printf("ExaML will exit now\n");
+
+
+ MPI_Abort(MPI_COMM_WORLD, -1);
+
+ exit(-1);
+ }
+
+void *malloc_aligned(size_t size)
+{
+ void
+ *ptr = (void *)NULL;
+
+ int
+ res;
+
+
+#ifdef WIN32
+ ptr = _aligned_malloc(size, BYTE_ALIGNMENT);
+#else
+ res = posix_memalign( &ptr, BYTE_ALIGNMENT, size );
+
+ if(res != 0)
+ {
+ outOfMemory();
+ assert(0);
+ }
+#endif
+
+ return ptr;
+}
+
+
+
+
+
+
+
+static void printBoth(FILE *f, const char* format, ... )
+{
+ if(processID == 0)
+ {
+ va_list args;
+ va_start(args, format);
+ vfprintf(f, format, args );
+ va_end(args);
+
+ va_start(args, format);
+ vprintf(format, args );
+ va_end(args);
+ }
+}
+
+
+
+
+void printBothOpen(const char* format, ... )
+{
+ if(processID == 0)
+ {
+ FILE *f = myfopen(infoFileName, "ab");
+
+ va_list args;
+ va_start(args, format);
+ vfprintf(f, format, args );
+ va_end(args);
+
+ va_start(args, format);
+ vprintf(format, args );
+ va_end(args);
+
+ fclose(f);
+ }
+}
+
+static void printBothOpenDifferentFile(char *fileName, const char* format, ... )
+{
+ if(processID == 0)
+ {
+ FILE
+ *f = myfopen(fileName, "ab");
+
+ va_list
+ args;
+
+ va_start(args, format);
+ vfprintf(f, format, args );
+ va_end(args);
+
+ fclose(f);
+ }
+}
+
+
+
+boolean getSmoothFreqs(int dataType)
+{
+ assert(MIN_MODEL < dataType && dataType < MAX_MODEL);
+
+ return pLengths[dataType].smoothFrequencies;
+}
+
+const unsigned int *getBitVector(int dataType)
+{
+ assert(MIN_MODEL < dataType && dataType < MAX_MODEL);
+
+ return pLengths[dataType].bitVector;
+}
+
+
+int getStates(int dataType)
+{
+ assert(MIN_MODEL < dataType && dataType < MAX_MODEL);
+
+ return pLengths[dataType].states;
+}
+
+int getUndetermined(int dataType)
+{
+ assert(MIN_MODEL < dataType && dataType < MAX_MODEL);
+
+ return pLengths[dataType].undetermined;
+}
+
+
+
+char getInverseMeaning(int dataType, unsigned char state)
+{
+ assert(MIN_MODEL < dataType && dataType < MAX_MODEL);
+
+ return pLengths[dataType].inverseMeaning[state];
+}
+
+partitionLengths *getPartitionLengths(pInfo *p)
+{
+ int
+ dataType = p->dataType,
+ states = p->states,
+ tipLength = p->maxTipStates;
+
+ assert(states != -1 && tipLength != -1);
+
+ assert(MIN_MODEL < dataType && dataType < MAX_MODEL);
+
+ pLength.leftLength = pLength.rightLength = states * states;
+ pLength.eignLength = states;
+ pLength.evLength = states * states;
+ pLength.eiLength = states * states;
+ pLength.substRatesLength = (states * states - states) / 2;
+ pLength.frequenciesLength = states;
+ pLength.tipVectorLength = tipLength * states;
+ pLength.symmetryVectorLength = (states * states - states) / 2;
+ pLength.frequencyGroupingLength = states;
+ pLength.nonGTR = FALSE;
+
+ return (&pLengths[dataType]);
+}
+
+
+
+
+
+
+
+
+
+
+size_t discreteRateCategories(int rateHetModel)
+{
+ size_t
+ result;
+
+ switch(rateHetModel)
+ {
+ case CAT:
+ result = 1;
+ break;
+ case GAMMA:
+ result = 4;
+ break;
+ default:
+ assert(0);
+ }
+
+ return result;
+}
+
+
+
+double gettime(void)
+{
+#ifdef WIN32
+ time_t tp;
+ struct tm localtm;
+ tp = time(NULL);
+ localtm = *localtime(&tp);
+ return 60.0*localtm.tm_min + localtm.tm_sec;
+#else
+ struct timeval ttime;
+ gettimeofday(&ttime , NULL);
+ return ttime.tv_sec + ttime.tv_usec * 0.000001;
+#endif
+}
+
+int gettimeSrand(void)
+{
+#ifdef WIN32
+ time_t tp;
+ struct tm localtm;
+ tp = time(NULL);
+ localtm = *localtime(&tp);
+ return 24*60*60*localtm.tm_yday + 60*60*localtm.tm_hour + 60*localtm.tm_min + localtm.tm_sec;
+#else
+ struct timeval ttime;
+ gettimeofday(&ttime , NULL);
+ return ttime.tv_sec + ttime.tv_usec;
+#endif
+}
+
+double randum (long *seed)
+{
+ long sum, mult0, mult1, seed0, seed1, seed2, newseed0, newseed1, newseed2;
+ double res;
+
+ mult0 = 1549;
+ seed0 = *seed & 4095;
+ sum = mult0 * seed0;
+ newseed0 = sum & 4095;
+ sum >>= 12;
+ seed1 = (*seed >> 12) & 4095;
+ mult1 = 406;
+ sum += mult0 * seed1 + mult1 * seed0;
+ newseed1 = sum & 4095;
+ sum >>= 12;
+ seed2 = (*seed >> 24) & 255;
+ sum += mult0 * seed2 + mult1 * seed1;
+ newseed2 = sum & 255;
+
+ *seed = newseed2 << 24 | newseed1 << 12 | newseed0;
+ res = 0.00390625 * (newseed2 + 0.000244140625 * (newseed1 + 0.000244140625 * newseed0));
+
+ return res;
+}
+
+static int filexists(char *filename)
+{
+ FILE
+ *fp;
+
+ int
+ res;
+
+ fp = fopen(filename,"rb");
+
+ if(fp)
+ {
+ res = 1;
+ fclose(fp);
+ }
+ else
+ res = 0;
+
+ return res;
+}
+
+
+FILE *myfopen(const char *path, const char *mode)
+{
+ FILE *fp = fopen(path, mode);
+
+ if(strcmp(mode,"r") == 0 || strcmp(mode,"rb") == 0)
+ {
+ if(fp)
+ return fp;
+ else
+ {
+ if(processID == 0)
+ printf("The file %s you want to open for reading does not exist, exiting ...\n", path);
+ errorExit(-1);
+ return (FILE *)NULL;
+ }
+ }
+ else
+ {
+ if(fp)
+ return fp;
+ else
+ {
+ if(processID == 0)
+ printf("The file %s ExaML wants to open for writing or appending can not be opened [mode: %s], exiting ...\n",
+ path, mode);
+ errorExit(-1);
+ return (FILE *)NULL;
+ }
+ }
+
+
+}
+
+
+
+
+
+/********************* END UTILITY FUNCTIONS ********************/
+
+
+/******************************some functions for the likelihood computation ****************************/
+
+
+boolean isTip(int number, int maxTips)
+{
+ assert(number > 0);
+
+ if(number <= maxTips)
+ return TRUE;
+ else
+ return FALSE;
+}
+
+
+
+
+
+
+
+
+
+void getxnode (nodeptr p)
+{
+ nodeptr s;
+
+ if ((s = p->next)->x || (s = s->next)->x)
+ {
+ p->x = s->x;
+ s->x = 0;
+ }
+
+ assert(p->x);
+}
+
+
+
+
+
+void hookup (nodeptr p, nodeptr q, double *z, int numBranches)
+{
+ int i;
+
+ p->back = q;
+ q->back = p;
+
+ for(i = 0; i < numBranches; i++)
+ p->z[i] = q->z[i] = z[i];
+}
+
+void hookupDefault (nodeptr p, nodeptr q, int numBranches)
+{
+ int i;
+
+ p->back = q;
+ q->back = p;
+
+ for(i = 0; i < numBranches; i++)
+ p->z[i] = q->z[i] = defaultz;
+}
+
+
+/***********************reading and initializing input ******************/
+
+
+
+
+
+
+
+boolean whitechar (int ch)
+{
+ return (ch == ' ' || ch == '\n' || ch == '\t' || ch == '\r');
+}
+
+
+
+
+
+
+
+
+
+
+static unsigned int KISS32(void)
+{
+ static unsigned int
+ x = 123456789,
+ y = 362436069,
+ z = 21288629,
+ w = 14921776,
+ c = 0;
+
+ unsigned int t;
+
+ x += 545925293;
+ y ^= (y<<13);
+ y ^= (y>>17);
+ y ^= (y<<5);
+ t = z + w + c;
+ z = w;
+ c = (t>>31);
+ w = t & 2147483647;
+
+ return (x+y+w);
+}
+
+static boolean setupTree (tree *tr)
+{
+ nodeptr
+ p0,
+ p,
+ q;
+
+ int
+ i,
+ j,
+ tips,
+ inter;
+
+ tr->bigCutoff = FALSE;
+
+ tr->maxCategories = MAX(4, tr->categories);
+
+ tr->partitionContributions = (double *)malloc(sizeof(double) * tr->NumberOfModels);
+ tr->partitionWeights = (double *)malloc(sizeof(double) * tr->NumberOfModels);
+
+ for(i = 0; i < tr->NumberOfModels; i++)
+ {
+ tr->partitionContributions[i] = -1.0;
+ tr->partitionWeights[i] = -1.0;
+ }
+
+ tr->perPartitionLH = (double *)malloc(sizeof(double) * tr->NumberOfModels);
+
+ for(i = 0; i < tr->NumberOfModels; i++)
+ tr->perPartitionLH[i] = 0.0;
+
+ tips = tr->mxtips;
+ inter = tr->mxtips - 1;
+
+ /* printf("%d tips\t%d inner\n", tips, inter); */
+
+
+ tr->treeStringLength = tr->mxtips * (nmlngth+128) + 256 + tr->mxtips * 2;
+
+ tr->tree_string = (char*)calloc(tr->treeStringLength, sizeof(char));
+ tr->tree0 = (char*)calloc(tr->treeStringLength, sizeof(char));
+ tr->tree1 = (char*)calloc(tr->treeStringLength, sizeof(char));
+
+
+ /* TODO, must that be so long ? */
+ /* assert(0); */
+
+
+ tr->td[0].count = 0;
+ tr->td[0].ti = (traversalInfo *)malloc(sizeof(traversalInfo) * tr->mxtips);
+ tr->td[0].executeModel = (boolean *)malloc(sizeof(boolean) * tr->NumberOfModels);
+ tr->td[0].parameterValues = (double *)malloc(sizeof(double) * tr->NumberOfModels);
+
+ tr->constraintVector = (int *)malloc((2 * tr->mxtips) * sizeof(int));
+
+
+ if (!(p0 = (nodeptr) malloc((tips + 3*inter) * sizeof(node))))
+ {
+ printf("ERROR: Unable to obtain sufficient tree memory\n");
+ return FALSE;
+ }
+
+ tr->nodeBaseAddress = p0;
+
+
+ if (!(tr->nodep = (nodeptr *) malloc((2*tr->mxtips) * sizeof(nodeptr))))
+ {
+ printf("ERROR: Unable to obtain sufficient tree memory, too\n");
+ return FALSE;
+ }
+
+ tr->nodep[0] = (node *) NULL; /* Use as 1-based array */
+
+ for (i = 1; i <= tips; i++)
+ {
+ p = p0++;
+
+ p->hash = KISS32(); /* hash table stuff */
+ p->x = 0;
+ p->xBips = 0;
+ p->number = i;
+ p->next = p;
+ p->back = (node *)NULL;
+ tr->nodep[i] = p;
+ }
+
+ for (i = tips + 1; i <= tips + inter; i++)
+ {
+ q = (node *) NULL;
+ for (j = 1; j <= 3; j++)
+ {
+ p = p0++;
+ if(j == 1)
+ {
+ p->xBips = 1;
+ p->x = 1;
+ }
+ else
+ {
+ p->xBips = 0;
+ p->x = 0;
+ }
+ p->number = i;
+ p->next = q;
+ p->back = (node *) NULL;
+ p->hash = 0;
+ q = p;
+ }
+ p->next->next->next = p;
+ tr->nodep[i] = p;
+ }
+
+ tr->likelihood = unlikely;
+ tr->start = (node *) NULL;
+
+ tr->ntips = 0;
+ tr->nextnode = 0;
+
+ for(i = 0; i < tr->numBranches; i++)
+ tr->partitionSmoothed[i] = FALSE;
+
+ tr->bitVectors = (unsigned int **)NULL;
+
+ tr->vLength = 0;
+
+ tr->h = (hashtable*)NULL;
+
+ tr->nameHash = initStringHashTable(10 * tr->mxtips);
+
+ return TRUE;
+}
+
+
+
+
+static void initAdef(analdef *adef)
+{
+ adef->max_rearrange = 21;
+ adef->stepwidth = 5;
+ adef->initial = 10;
+ adef->bestTrav = 10;
+ adef->initialSet = FALSE;
+ adef->mode = BIG_RAPID_MODE;
+ adef->likelihoodEpsilon = 0.1;
+
+ adef->permuteTreeoptimize = FALSE;
+ adef->perGeneBranchLengths = FALSE;
+
+ adef->useCheckpoint = FALSE;
+
+ adef->useQuartetGrouping = FALSE;
+ adef->numberRandomQuartets = 0;
+
+ adef->quartetCkpInterval = 1000;
+
+#ifdef _BAYESIAN
+ adef->bayesian = FALSE;
+#endif
+
+}
+
+
+
+static int modelExists(char *model, tree *tr)
+{
+ if(strcmp(model, "PSR") == 0)
+ {
+ tr->rateHetModel = CAT;
+ return 1;
+ }
+
+ if(strcmp(model, "GAMMA") == 0)
+ {
+ tr->rateHetModel = GAMMA;
+ return 1;
+ }
+
+
+ return 0;
+}
+
+
+
+
+
+
+/*********************************** *********************************************************/
+
+
+static void printVersionInfo(void)
+{
+ if(processID == 0)
+ printf("\n\nThis is %s version %s released by Alexandros Stamatakis, Andre J. Aberer, and Alexey Kozlov on %s.\n\n", programName, programVersion, programDate);
+}
+
+static void printMinusFUsage(void)
+{
+ printf("\n");
+
+
+ printf(" \"-f d\": new rapid hill-climbing \n");
+ printf(" DEFAULT: ON\n");
+
+ printf("\n");
+
+ printf(" \"-f e\": compute the likelihood of a bunch of trees passed via -t\n");
+ printf(" this option will do a quick and dirty optimization without re-optimizing\n");
+ printf(" the model parameters for each tree\n");
+
+ printf("\n");
+
+ printf(" \"-f E\": compute the likelihood of a bunch of trees passed via -t\n");
+ printf(" this option will do a thorough optimization that re-optimizes\n");
+ printf(" the model parameters for each tree\n");
+
+ printf("\n");
+
+ printf(" \"-f o\": old and slower rapid hill-climbing without heuristic cutoff\n");
+
+ printf("\n");
+
+ printf(" \"-f q\": fast quartet calculator\n");
+
+ printf("\n");
+
+ printf(" DEFAULT for \"-f\": new rapid hill climbing\n");
+
+ printf("\n");
+}
+
+
+static void printREADME(void)
+{
+ if(processID == 0)
+ {
+ printVersionInfo();
+ printf("\n");
+ printf("\nTo report bugs use the RAxML Google group\n");
+ printf("Please send me all input files, the exact invocation, details of the HW and operating system,\n");
+ printf("as well as all error messages printed to screen.\n\n\n");
+
+ printf("examl|examl-AVX\n");
+ printf(" -s binarySequenceFileName\n");
+ printf(" -n outputFileNames\n");
+ printf(" -m rateHeterogeneityModel\n");
+ printf(" -t userStartingTree|-R binaryCheckpointFile|-g constraintTree -p randomNumberSeed\n");
+ printf(" [-a]\n");
+ printf(" [-B numberOfMLtreesToSave]\n");
+ printf(" [-c numberOfCategories]\n");
+ printf(" [-D]\n");
+ printf(" [-e likelihoodEpsilon] \n");
+ printf(" [-f d|e|E|o|q]\n");
+ printf(" [-h] \n");
+ printf(" [-i initialRearrangementSetting] \n");
+ printf(" [-I quartetCheckpointInterval] \n");
+ printf(" [-M]\n");
+ printf(" [-r randomQuartetNumber] \n");
+ printf(" [-S]\n");
+ printf(" [-v]\n");
+ printf(" [-w outputDirectory] \n");
+ printf(" [-Y quartetGroupingFileName]\n");
+ printf(" [--auto-prot=ml|bic|aic|aicc]\n");
+ printf("\n");
+ printf(" -a use the median for the discrete approximation of the GAMMA model of rate heterogeneity\n");
+ printf("\n");
+ printf(" DEFAULT: OFF\n");
+ printf("\n");
+ printf(" -B specify the number of best ML trees to save and print to file\n");
+ printf("\n");
+ printf(" -c Specify number of distinct rate categories for ExaML when modelOfEvolution\n");
+ printf(" is set to GTRPSR\n");
+ printf(" Individual per-site rates are categorized into numberOfCategories rate \n");
+ printf(" categories to accelerate computations. \n");
+ printf("\n");
+ printf(" DEFAULT: 25\n");
+ printf("\n");
+ printf(" -D ML search convergence criterion. This will break off ML searches if the relative \n");
+ printf(" Robinson-Foulds distance between the trees obtained from two consecutive lazy SPR cycles\n");
+ printf(" is smaller or equal to 1%s. Usage recommended for very large datasets in terms of taxa.\n", "%");
+ printf(" On trees with more than 500 taxa this will yield execution time improvements of approximately 50%s,\n", "%");
+ printf(" while yielding only slightly worse trees.\n");
+ printf("\n");
+ printf(" DEFAULT: OFF\n");
+ printf("\n");
+ printf(" -e set model optimization precision in log likelihood units for final\n");
+ printf(" optimization of model parameters\n");
+ printf("\n");
+ printf(" DEFAULT: 0.1 \n");
+ printf("\n");
+ printf(" -f select algorithm:\n");
+
+ printMinusFUsage();
+
+ printf("\n");
+ printf(" -g Pass a multi-furcating constraint tree to ExaML. The tree needs to contain all taxa of the alignment!\n");
+ printf(" When using this option you also need to specify a random number seed via \"-p\"\n");
+ printf("\n");
+ printf(" -h Display this help message.\n");
+ printf("\n");
+ printf(" -i Initial rearrangement setting for the subsequent application of topological \n");
+ printf(" changes phase\n");
+ printf("\n");
+ printf(" -I Set after how many quartet evaluations a new checkpoint will be printed.\n");
+ printf("\n");
+ printf(" DEFAULT: 1000\n");
+ printf("\n");
+ printf(" -m Model of rate heterogeneity\n");
+ printf("\n");
+ printf(" select \"-m PSR\" for the per-site rate category model (this used to be called CAT in RAxML)\n");
+ printf(" select \"-m GAMMA\" for the gamma model of rate heterogeneity with 4 discrete rates\n");
+ printf("\n");
+ printf(" -M Switch on estimation of individual per-partition branch lengths. Only has effect when used in combination with \"-q\"\n");
+ printf(" Branch lengths for individual partitions will be printed to separate files\n");
+ printf(" A weighted average of the branch lengths is computed by using the respective partition lengths\n");
+ printf("\n");
+ printf(" DEFAULT: OFF\n");
+ printf("\n");
+ printf(" -n Specifies the name of the output file.\n");
+ printf("\n");
+ printf(" -p Specify a random number seed, required in conjunction with the \"-g\" option for constraint trees\n");
+ printf("\n");
+ printf(" -R read in a binary checkpoint file called ExaML_binaryCheckpoint.RUN_ID_number\n");
+ printf("\n");
+ printf(" -r Pass the number of quartets to randomly sub-sample from the possible number of quartets for the given taxon set.\n");
+ printf(" Only works in combination with -f q !\n");
+ printf("\n");
+ printf(" -s Specify the name of the BINARY alignment data file generated by the parser component\n");
+ printf("\n");
+ printf(" -S turn on memory saving option for gappy multi-gene alignments. For large and gappy datasets specify -S to save memory\n");
+ printf(" This will produce slightly different likelihood values, may be a bit slower but can reduce memory consumption\n");
+ printf(" from 70GB to 19GB on very large and gappy datasets\n");
+ printf("\n");
+ printf(" -t Specify a user starting tree file name in Newick format\n");
+ printf("\n");
+ printf(" -v Display version information\n");
+ printf("\n");
+ printf(" -w FULL (!) path to the directory into which ExaML shall write its output files\n");
+ printf("\n");
+ printf(" DEFAULT: current directory\n");
+ printf("\n");
+ printf(" -Y Pass a quartet grouping file name defining four groups from which to draw quartets\n");
+ printf(" The file input format must contain 4 groups in the following form:\n");
+ printf(" (Chicken, Human, Loach), (Cow, Carp), (Mouse, Rat, Seal), (Whale, Frog);\n");
+ printf(" Only works in combination with -f q !\n");
+ printf("\n");
+
+ printf("\n");
+ printf(" --auto-prot=ml|bic|aic|aicc When using automatic protein model selection you can choose the criterion for selecting these models.\n");
+ printf(" RAxML will test all available prot subst. models except for LG4M, LG4X and GTR-based models, with and without empirical base frequencies.\n");
+ printf(" You can choose between ML score based selection and the BIC, AIC, and AICc criteria.\n");
+ printf("\n");
+ printf(" DEFAULT: ml\n");
+ printf("\n\n\n\n");
+ }
+}
+
+
+
+
+static void analyzeRunId(char id[128])
+{
+ int i = 0;
+
+ while(id[i] != '\0')
+ {
+ if(i >= 128)
+ {
+ printf("Error: run id after \"-n\" is too long, it has %d characters, please use a shorter one\n", i);
+ assert(0);
+ }
+
+ if(id[i] == '/')
+ {
+ printf("Error: character %c is not allowed in run ID\n", id[i]);
+ assert(0);
+ }
+
+
+ i++;
+ }
+
+ if(i == 0)
+ {
+ printf("Error: please provide a string for the run id after \"-n\" \n");
+ assert(0);
+ }
+
+}
+
+static void get_args(int argc, char *argv[], analdef *adef, tree *tr)
+{
+ boolean
+ resultDirSet = FALSE;
+
+ char
+ resultDir[1024] = "",
+ //*optarg,
+ model[1024] = "",
+ modelChar;
+
+ double
+ likelihoodEpsilon;
+
+ int
+ fOptionCount = 0,
+ c,
+ nameSet = 0,
+ treeSet = 0,
+ modelSet = 0,
+ byteFileSet = 0,
+ seedSet = 0;
+
+
+ /*********** tr inits **************/
+
+ tr->doCutoff = TRUE;
+ tr->secondaryStructureModel = SEC_16; /* default setting */
+ tr->searchConvergenceCriterion = FALSE;
+ tr->rateHetModel = GAMMA;
+
+ tr->multiStateModel = GTR_MULTI_STATE;
+ tr->useGappedImplementation = FALSE;
+ tr->saveMemory = FALSE;
+ tr->constraintTree = FALSE;
+
+ tr->fastTreeEvaluation = FALSE;
+
+ /* tr->manyPartitions = FALSE; */
+
+ tr->categories = 25;
+
+ tr->gapyness = 0.0;
+ tr->saveBestTrees = 0;
+
+ tr->useMedian = FALSE;
+
+ tr->autoProteinSelectionType = AUTO_ML;
+
+ /********* tr inits end*************/
+
+ //while(!bad_opt && ((c = mygetopt(argc,argv,"R:B:e:c:f:i:m:t:g:w:n:s:p:vhMSDa", &optind, &optarg))!=-1))
+
+ static
+ int flag;
+
+ while(1)
+ {
+ static struct
+ option long_options[2] =
+ {
+ {"auto-prot", required_argument, &flag, 1},
+ {0, 0, 0, 0}
+ };
+
+ int
+ option_index;
+
+ flag = 0;
+
+ c = getopt_long(argc, argv, "R:B:Y:I:e:c:f:i:m:t:g:w:n:s:p:r:vhMSDa", long_options, &option_index);
+
+ if(c == -1)
+ break;
+
+ if(flag > 0)
+ {
+ switch(option_index)
+ {
+ case 0:
+ {
+ char
+ *autoModels[4] = {"ml", "bic", "aic", "aicc"};
+
+ int
+ k;
+
+ for(k = 0; k < 4; k++)
+ if(strcmp(optarg, autoModels[k]) == 0)
+ break;
+
+ if(k == 4)
+ {
+ printf("\nError, unknown protein model selection type, you can specify one of the following selection criteria:\n\n");
+ for(k = 0; k < 4; k++)
+ printf("--auto-prot=%s\n", autoModels[k]);
+ printf("\n");
+ errorExit(-1);
+ }
+ else
+ {
+ switch(k)
+ {
+ case 0:
+ tr->autoProteinSelectionType = AUTO_ML;
+ break;
+ case 1:
+ tr->autoProteinSelectionType = AUTO_BIC;
+ break;
+ case 2:
+ tr->autoProteinSelectionType = AUTO_AIC;
+ break;
+ case 3:
+ tr->autoProteinSelectionType = AUTO_AICC;
+ break;
+ default:
+ assert(0);
+ }
+ }
+ }
+ break;
+ default:
+ assert(0);
+ }
+ }
+ else
+ switch(c)
+ {
+ case 'Y':
+ adef->useQuartetGrouping = TRUE;
+ strcpy(quartetGroupingFileName, optarg);
+ break;
+ case 'r':
+ sscanf(optarg, "%lu", &(adef->numberRandomQuartets));
+ assert(adef->numberRandomQuartets > 0);
+ break;
+ case 'p':
+ sscanf(optarg,"%u", &(tr->randomSeed));
+ seedSet = 1;
+ break;
+ case 'a':
+ tr->useMedian = TRUE;
+ break;
+ case 'B':
+ sscanf(optarg,"%d", &(tr->saveBestTrees));
+ if(tr->saveBestTrees < 0)
+ {
+ printf("Number of best trees to save must be greater than or equal to 0!\n");
+ errorExit(-1);
+ }
+ break;
+ case 's':
+ strcpy(byteFileName, optarg);
+ byteFileSet = TRUE;
+ /*printf("%s \n", byteFileName);*/
+ break;
+ case 'S':
+ tr->saveMemory = TRUE;
+ break;
+ case 'D':
+ tr->searchConvergenceCriterion = TRUE;
+ break;
+ case 'R':
+ adef->useCheckpoint = TRUE;
+ strcpy(binaryCheckpointInputName, optarg);
+ break;
+ case 'I':
+ sscanf(optarg, "%lu", &(adef->quartetCkpInterval));
+ break;
+ case 'M':
+ adef->perGeneBranchLengths = TRUE;
+ break;
+ case 'e':
+ sscanf(optarg,"%lf", &likelihoodEpsilon);
+ adef->likelihoodEpsilon = likelihoodEpsilon;
+ break;
+ case 'v':
+ printVersionInfo();
+ errorExit(0);
+ case 'h':
+ printREADME();
+ errorExit(0);
+ case 'c':
+ sscanf(optarg, "%d", &tr->categories);
+ break;
+ case 'f':
+ sscanf(optarg, "%c", &modelChar);
+ fOptionCount++;
+ if(fOptionCount > 1)
+ {
+ printf("\nError: only one of the various \"-f \" options can be used per ExaML run!\n");
+ printf("They are mutually exclusive! exiting ...\n\n");
+ errorExit(-1);
+ }
+ switch(modelChar)
+ {
+ case 'e':
+ adef->mode = TREE_EVALUATION;
+ tr->fastTreeEvaluation = TRUE;
+ break;
+ case 'E':
+ adef->mode = TREE_EVALUATION;
+ tr->fastTreeEvaluation = FALSE;
+ break;
+ case 'd':
+ adef->mode = BIG_RAPID_MODE;
+ tr->doCutoff = TRUE;
+ break;
+ case 'o':
+ adef->mode = BIG_RAPID_MODE;
+ tr->doCutoff = FALSE;
+ break;
+ case 'q':
+ adef->mode = QUARTET_CALCULATION;
+ break;
+ default:
+ {
+ if(processID == 0)
+ {
+ printf("Error, select one of the following algorithms via -f:\n");
+ printMinusFUsage();
+ }
+ errorExit(-1);
+ }
+ }
+ break;
+ case 'i':
+ sscanf(optarg, "%d", &adef->initial);
+ adef->initialSet = TRUE;
+ break;
+ case 'n':
+ strcpy(run_id,optarg);
+ analyzeRunId(run_id);
+ nameSet = 1;
+ break;
+ case 'w':
+ strcpy(resultDir, optarg);
+ resultDirSet = TRUE;
+ break;
+ case 't':
+ strcpy(tree_file, optarg);
+ treeSet = 1;
+ break;
+ case 'g':
+ strcpy(tree_file, optarg);
+ treeSet = 1;
+ tr->constraintTree = TRUE;
+ break;
+ case 'm':
+ strcpy(model,optarg);
+ if(modelExists(model, tr) == 0)
+ {
+ if(processID == 0)
+ {
+ printf("Rate heterogeneity Model %s does not exist\n\n", model);
+ printf("For per site rates (called CAT in previous versions) use: PSR\n");
+ printf("For GAMMA use: GAMMA\n");
+ }
+ errorExit(-1);
+ }
+ else
+ modelSet = 1;
+ break;
+ default:
+ errorExit(-1);
+ }
+ }
+
+ if(adef->useQuartetGrouping && adef->mode != QUARTET_CALCULATION)
+ {
+ if(processID == 0)
+ printf("\nError, you must specify \"-Y quartetGroupingFileName\" in combination with \"-f q\"\n");
+ errorExit(-1);
+ }
+
+ if(adef->numberRandomQuartets > 0 && adef->mode != QUARTET_CALCULATION)
+ {
+ if(processID == 0)
+ printf("\nError, you must specify \"-r randomQuartetNumber\" in combination with \"-f q\"\n");
+ errorExit(-1);
+ }
+
+ if((adef->numberRandomQuartets > 0) && (adef->useQuartetGrouping))
+ {
+ if(processID == 0)
+ printf("\nError, you must specify either \"-r randomQuartetNumber\" or \"-Y quartetGroupingFileName\", not both\n");
+ errorExit(-1);
+ }
+
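+ if(tr->constraintTree)
+ {
+ /* every process must exit here, not only process 0, otherwise the remaining ranks keep running */
+ if(!seedSet)
+ {
+ if(processID == 0)
+ {
+ printf("\nError, you must specify a random number seed via \"-p\" when using a constraint\n");
+ printf("tree via \"-g\" \n");
+ }
+ errorExit(-1);
+ }
+ }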
+
+ if(!byteFileSet)
+ {
+ if(processID == 0)
+ printf("\nError, you must specify a binary format data file with the \"-s\" option\n");
+ errorExit(-1);
+ }
+
+ if(!modelSet)
+ {
+ if(processID == 0)
+ printf("\nError, you must specify a model of rate heterogeneity with the \"-m\" option\n");
+ errorExit(-1);
+ }
+
+ if(!nameSet)
+ {
+ if(processID == 0)
+ printf("\nError: please specify a name for this run with -n\n");
+ errorExit(-1);
+ }
+
+ if(!treeSet && !adef->useCheckpoint)
+ {
+ if(processID == 0)
+ {
+ printf("\nError: please either specify a starting tree for this run with -t\n");
+ printf("or re-start the run from a checkpoint with -R\n");
+ }
+
+ errorExit(-1);
+ }
+
+ {
+
+ const
+ char *separator = "/";
+
+ if(resultDirSet)
+ {
+ char
+ dir[1024] = "";
+
+
+ if(resultDir[0] != separator[0])
+ strcat(dir, separator);
+
+ strcat(dir, resultDir);
+
+ if(dir[strlen(dir) - 1] != separator[0])
+ strcat(dir, separator);
+ strcpy(workdir, dir);
+ }
+ else
+ {
+ char
+ dir[1024] = "",
+ *result = getcwd(dir, sizeof(dir));
+
+ assert(result != (char*)NULL);
+
+ if(dir[strlen(dir) - 1] != separator[0])
+ strcat(dir, separator);
+
+ strcpy(workdir, dir);
+ }
+ }
+
+ return;
+}
+
+
+
+
+void errorExit(int e)
+{
+ MPI_Finalize();
+
+ exit(e);
+}
+
+
+
+static void makeFileNames(void)
+{
+ int
+ infoFileExists = 0;
+
+ strcpy(resultFileName, workdir);
+ strcpy(logFileName, workdir);
+ strcpy(infoFileName, workdir);
+ strcpy(treeFileName, workdir);
+ strcpy(binaryCheckpointName, workdir);
+ strcpy(modelFileName, workdir);
+ strcpy(quartetFileName, workdir);
+
+ strcat(resultFileName, "ExaML_result.");
+ strcat(logFileName, "ExaML_log.");
+ strcat(infoFileName, "ExaML_info.");
+ strcat(binaryCheckpointName, "ExaML_binaryCheckpoint.");
+ strcat(modelFileName, "ExaML_modelFile.");
+ strcat(treeFileName, "ExaML_TreeFile.");
+ strcat(quartetFileName, "ExaML_quartets.");
+
+ strcat(resultFileName, run_id);
+ strcat(logFileName, run_id);
+ strcat(infoFileName, run_id);
+ strcat(binaryCheckpointName, run_id);
+ strcat(modelFileName, run_id);
+ strcat(treeFileName, run_id);
+ strcat(quartetFileName, run_id);
+
+ infoFileExists = filexists(infoFileName);
+
+ if(infoFileExists)
+ {
+ if(processID == 0)
+ {
+ printf("ExaML output files with the run ID <%s> already exist \n", run_id);
+ printf("in directory %s ...... exiting\n", workdir);
+ }
+
+ errorExit(-1);
+ }
+}
+
+
+
+
+
+
+
+
+
+/***********************reading and initializing input ******************/
+
+
+/********************PRINTING various INFO **************************************/
+
+
+static void printModelAndProgramInfo(tree *tr, analdef *adef, int argc, char *argv[])
+{
+ if(processID == 0)
+ {
+ int i, model;
+ FILE *infoFile = myfopen(infoFileName, "ab");
+ char modelType[128];
+
+
+ if(tr->useMedian)
+ strcpy(modelType, "GAMMA with Median");
+ else
+ strcpy(modelType, "GAMMA");
+
+ printBoth(infoFile, "\n\nThis is %s version %s released by Alexandros Stamatakis, Andre Aberer, and Alexey Kozlov in %s.\n\n", programName, programVersion, programDate);
+
+ printBoth(infoFile, "\nAlignment has %zu distinct alignment patterns\n\n", tr->originalCrunchedLength);
+
+ printBoth(infoFile, "Proportion of gaps and completely undetermined characters in this alignment: %3.2f%s\n", 100.0 * tr->gapyness, "%");
+
+ switch(adef->mode)
+ {
+ case BIG_RAPID_MODE:
+ printBoth(infoFile, "\nExaML rapid hill-climbing mode\n\n");
+ break;
+ case TREE_EVALUATION:
+ printBoth(infoFile, "\nExaML %s tree evaluation mode\n\n", (tr->fastTreeEvaluation)?"fast":"slow");
+ break;
+ case QUARTET_CALCULATION:
+ printBoth(infoFile, "\nExaML quartet evaluation mode\n\n");
+ break;
+ default:
+ assert(0);
+ }
+
+ if(adef->perGeneBranchLengths)
+ printBoth(infoFile, "Using %d distinct models/data partitions with individual per partition branch length optimization\n\n\n", tr->NumberOfModels);
+ else
+ printBoth(infoFile, "Using %d distinct models/data partitions with joint branch length optimization\n\n\n", tr->NumberOfModels);
+
+ printBoth(infoFile, "All free model parameters will be estimated by ExaML\n");
+
+ if(tr->rateHetModel == GAMMA || tr->rateHetModel == GAMMA_I)
+ printBoth(infoFile, "%s model of rate heterogeneity, ML estimate of alpha-parameter\n\n", modelType);
+ else
+ {
+ printBoth(infoFile, "ML estimate of %d per site rate categories\n\n", tr->categories);
+
+ }
+
+ for(model = 0; model < tr->NumberOfModels; model++)
+ {
+ printBoth(infoFile, "Partition: %d\n", model);
+ printBoth(infoFile, "Alignment Patterns: %d\n", tr->partitionData[model].upper - tr->partitionData[model].lower);
+ printBoth(infoFile, "Name: %s\n", tr->partitionData[model].partitionName);
+
+ switch(tr->partitionData[model].dataType)
+ {
+ case DNA_DATA:
+ printBoth(infoFile, "DataType: DNA\n");
+ printBoth(infoFile, "Substitution Matrix: GTR\n");
+ if(tr->partitionData[model].optimizeBaseFrequencies)
+ printBoth(infoFile, "ML optimization of base frequencies\n");
+ break;
+ case AA_DATA:
+ assert(tr->partitionData[model].protModels >= 0 && tr->partitionData[model].protModels < NUM_PROT_MODELS);
+ printBoth(infoFile, "DataType: AA\n");
+ printBoth(infoFile, "Substitution Matrix: %s\n", protModels[tr->partitionData[model].protModels]);
+ if(!tr->partitionData[model].optimizeBaseFrequencies)
+ printBoth(infoFile, "Using %s Base Frequencies\n", (tr->partitionData[model].protFreqs == 1)?"empirical":"fixed");
+ else
+ printBoth(infoFile, "ML optimization of base frequencies\n");
+ break;
+ case BINARY_DATA:
+ printBoth(infoFile, "DataType: BINARY/MORPHOLOGICAL\n");
+ printBoth(infoFile, "Substitution Matrix: Uncorrected\n");
+ break;
+
+ /*
+ case SECONDARY_DATA:
+ printBoth(infoFile, "DataType: SECONDARY STRUCTURE\n");
+ printBoth(infoFile, "Substitution Matrix: %s\n", secondaryModelList[tr->secondaryStructureModel]);
+ break;
+ case SECONDARY_DATA_6:
+ printBoth(infoFile, "DataType: SECONDARY STRUCTURE 6 STATE\n");
+ printBoth(infoFile, "Substitution Matrix: %s\n", secondaryModelList[tr->secondaryStructureModel]);
+ break;
+ case SECONDARY_DATA_7:
+ printBoth(infoFile, "DataType: SECONDARY STRUCTURE 7 STATE\n");
+ printBoth(infoFile, "Substitution Matrix: %s\n", secondaryModelList[tr->secondaryStructureModel]);
+ break;
+ case GENERIC_32:
+ printBoth(infoFile, "DataType: Multi-State with %d distinct states in use (maximum 32)\n",tr->partitionData[model].states);
+ switch(tr->multiStateModel)
+ {
+ case ORDERED_MULTI_STATE:
+ printBoth(infoFile, "Substitution Matrix: Ordered Likelihood\n");
+ break;
+ case MK_MULTI_STATE:
+ printBoth(infoFile, "Substitution Matrix: MK model\n");
+ break;
+ case GTR_MULTI_STATE:
+ printBoth(infoFile, "Substitution Matrix: GTR\n");
+ break;
+ default:
+ assert(0);
+ }
+ break;
+ case GENERIC_64:
+ printBoth(infoFile, "DataType: Codon\n");
+ break;
+ */
+ default:
+ assert(0);
+ }
+ printBoth(infoFile, "\n\n\n");
+ }
+
+ printBoth(infoFile, "\n");
+
+ printBoth(infoFile, "ExaML was called as follows:\n\n");
+ for(i = 0; i < argc; i++)
+ printBoth(infoFile,"%s ", argv[i]);
+ printBoth(infoFile,"\n\n\n");
+
+ fclose(infoFile);
+ }
+}
+
+void printResult(tree *tr, analdef *adef, boolean finalPrint)
+{
+ if(processID == 0)
+ {
+ FILE *logFile;
+ char temporaryFileName[1024] = "";
+
+ strcpy(temporaryFileName, resultFileName);
+
+ switch(adef->mode)
+ {
+ case TREE_EVALUATION:
+ Tree2String(tr->tree_string, tr, tr->start->back, TRUE, TRUE, FALSE, FALSE, finalPrint, SUMMARIZE_LH, FALSE, FALSE);
+
+ logFile = myfopen(temporaryFileName, "wb");
+ fprintf(logFile, "%s", tr->tree_string);
+ fclose(logFile);
+
+ if(adef->perGeneBranchLengths)
+ printTreePerGene(tr, adef, temporaryFileName, "wb");
+ break;
+ case BIG_RAPID_MODE:
+ if(finalPrint)
+ {
+ switch(tr->rateHetModel)
+ {
+ case GAMMA:
+ case GAMMA_I:
+ Tree2String(tr->tree_string, tr, tr->start->back, TRUE, TRUE, FALSE, FALSE, finalPrint,
+ SUMMARIZE_LH, FALSE, FALSE);
+
+ logFile = myfopen(temporaryFileName, "wb");
+ fprintf(logFile, "%s", tr->tree_string);
+ fclose(logFile);
+
+ if(adef->perGeneBranchLengths)
+ printTreePerGene(tr, adef, temporaryFileName, "wb");
+ break;
+ case CAT:
+ /*Tree2String(tr->tree_string, tr, tr->start->back, FALSE, TRUE, FALSE, FALSE, finalPrint, adef,
+ NO_BRANCHES, FALSE, FALSE);*/
+
+
+
+ Tree2String(tr->tree_string, tr, tr->start->back, TRUE, TRUE, FALSE, FALSE,
+ TRUE, SUMMARIZE_LH, FALSE, FALSE);
+
+
+
+
+ logFile = myfopen(temporaryFileName, "wb");
+ fprintf(logFile, "%s", tr->tree_string);
+ fclose(logFile);
+
+ break;
+ default:
+ assert(0);
+ }
+ }
+ else
+ {
+ Tree2String(tr->tree_string, tr, tr->start->back, FALSE, TRUE, FALSE, FALSE, finalPrint,
+ NO_BRANCHES, FALSE, FALSE);
+ logFile = myfopen(temporaryFileName, "wb");
+ fprintf(logFile, "%s", tr->tree_string);
+ fclose(logFile);
+ }
+ break;
+ default:
+ printf("FATAL ERROR call to printResult from undefined STATE %d\n", adef->mode);
+ exit(-1);
+ break;
+ }
+ }
+}
+
+
+
+
+
+
+
+
+void printLog(tree *tr)
+{
+ if(processID == 0)
+ {
+ FILE *logFile;
+ double t;
+
+ t = gettime() - masterTime;
+
+ logFile = myfopen(logFileName, "ab");
+
+ /* printf("%f %1.40f\n", t, tr->likelihood); */
+
+ fprintf(logFile, "%f %f\n", t, tr->likelihood);
+
+ fclose(logFile);
+ }
+
+}
+
+
+
+
+
+
+
+
+
+void getDataTypeString(tree *tr, int model, char typeOfData[1024])
+{
+ switch(tr->partitionData[model].dataType)
+ {
+ case AA_DATA:
+ strcpy(typeOfData,"AA");
+ break;
+ case DNA_DATA:
+ strcpy(typeOfData,"DNA");
+ break;
+ case BINARY_DATA:
+ strcpy(typeOfData,"BINARY/MORPHOLOGICAL");
+ break;
+ case SECONDARY_DATA:
+ strcpy(typeOfData,"SECONDARY 16 STATE MODEL USING ");
+ strcat(typeOfData, secondaryModelList[tr->secondaryStructureModel]);
+ break;
+ case SECONDARY_DATA_6:
+ strcpy(typeOfData,"SECONDARY 6 STATE MODEL USING ");
+ strcat(typeOfData, secondaryModelList[tr->secondaryStructureModel]);
+ break;
+ case SECONDARY_DATA_7:
+ strcpy(typeOfData,"SECONDARY 7 STATE MODEL USING ");
+ strcat(typeOfData, secondaryModelList[tr->secondaryStructureModel]);
+ break;
+ case GENERIC_32:
+ strcpy(typeOfData,"Multi-State");
+ break;
+ case GENERIC_64:
+ strcpy(typeOfData,"Codon");
+ break;
+ default:
+ assert(0);
+ }
+}
+static void printRatesDNA_BIN(int n, double *r, char **names, char *fileName)
+{
+ int i, j, c;
+
+ for(i = 0, c = 0; i < n; i++)
+ {
+ for(j = i + 1; j < n; j++)
+ {
+ if(i == n - 2 && j == n - 1)
+ printBothOpenDifferentFile(fileName, "rate %s <-> %s: %f\n", names[i], names[j], 1.0);
+ else
+ printBothOpenDifferentFile(fileName, "rate %s <-> %s: %f\n", names[i], names[j], r[c]);
+ c++;
+ }
+ }
+}
+
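The nested loops in printRatesDNA_BIN visit every unordered pair of states once, in row-major upper-triangle order, while a running counter indexes the flat rate array. As a minimal sketch of that indexing (the helper names pair_index and pair_count are hypothetical and do not appear in ExaML), the counter value for a given pair can also be computed directly:

```c
#include <assert.h>

/* Hypothetical helper: number of unordered pairs among n states. */
static int pair_count(int n)
{
  return n * (n - 1) / 2;
}

/* Hypothetical helper: flat index of the pair (i, j), i < j < n, in the
   order produced by "for i ... for j = i + 1 ..." with a running counter. */
static int pair_index(int n, int i, int j)
{
  assert(0 <= i && i < j && j < n);

  /* pairs in rows 0..i-1, plus the offset of j within row i */
  return i * n - (i * (i + 1)) / 2 + (j - i - 1);
}
```

For n = 4 the last pair, (2, 3), gets index 5. That is the entry the function above prints as 1.0 instead of reading it from the rate array.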
+static void printRatesRest(int n, double *r, char **names, char *fileName)
+{
+ int i, j, c;
+
+ for(i = 0, c = 0; i < n; i++)
+ {
+ for(j = i + 1; j < n; j++)
+ {
+ printBothOpenDifferentFile(fileName, "rate %s <-> %s: %f\n", names[i], names[j], r[c]);
+ c++;
+ }
+ }
+}
+static double branchLength(int model, double *z, tree *tr)
+{
+ double x;
+
+ x = z[model];
+ assert(x > 0);
+ if (x < zmin)
+ x = zmin;
+
+
+ assert(x <= zmax);
+
+ x = -log(x);
+
+ return x;
+}
+
+
+static double treeLengthRec(nodeptr p, tree *tr, int model)
+{
+ double
+ x = branchLength(model, p->z, tr);
+
+ if(isTip(p->number, tr->mxtips))
+ return x;
+ else
+ {
+ double acc = 0;
+ nodeptr q;
+
+ q = p->next;
+
+ while(q != p)
+ {
+ acc += treeLengthRec(q->back, tr, model);
+ q = q->next;
+ }
+
+ return acc + x;
+ }
+}
+
+static double treeLength(tree *tr, int model)
+{
+ return treeLengthRec(tr->start->back, tr, model);
+}
+
+static void printFreqs(int n, double *f, char **names, char *fileName)
+{
+ int k;
+
+ for(k = 0; k < n; k++)
+ printBothOpenDifferentFile(fileName, "freq pi(%s): %f\n", names[k], f[k]);
+}
+
+static void printModelParams(tree *tr, analdef *adef, int treeIteration)
+{
+ int
+ model;
+
+ double
+ *f = (double*)NULL,
+ *r = (double*)NULL;
+
+ char
+ fileName[2048],
+ buf[64];
+
+
+ strcpy(fileName, modelFileName);
+
+ if(treeIteration >= 0)
+ {
+ strcat(fileName, ".");
+ sprintf(buf, "%d", treeIteration);
+ strcat(fileName, buf);
+ }
+
+ for(model = 0; model < tr->NumberOfModels; model++)
+ {
+ double tl;
+ char typeOfData[1024];
+
+ getDataTypeString(tr, model, typeOfData);
+
+ printBothOpenDifferentFile(fileName, "\n\n");
+
+ printBothOpenDifferentFile(fileName, "Model Parameters of Partition %d, Name: %s, Type of Data: %s\n",
+ model, tr->partitionData[model].partitionName, typeOfData);
+
+ if(tr->rateHetModel == GAMMA)
+ printBothOpenDifferentFile(fileName, "alpha: %f\n", tr->partitionData[model].alpha);
+
+
+ if(adef->perGeneBranchLengths)
+ tl = treeLength(tr, model);
+ else
+ tl = treeLength(tr, 0);
+
+ printBothOpenDifferentFile(fileName, "Tree-Length: %f\n", tl);
+
+ f = tr->partitionData[model].frequencies;
+ r = tr->partitionData[model].substRates;
+
+ switch(tr->partitionData[model].dataType)
+ {
+ case AA_DATA:
+ {
+ char *freqNames[20] = {"A", "R", "N", "D", "C", "Q", "E", "G",
+ "H", "I", "L", "K", "M", "F", "P", "S",
+ "T", "W", "Y", "V"};
+
+ if(tr->partitionData[model].protModels == LG4M || tr->partitionData[model].protModels == LG4X)
+ {
+ int
+ k;
+
+ for(k = 0; k < 4; k++)
+ {
+ printBothOpenDifferentFile(fileName, "LGM %d\n", k);
+ printRatesRest(20, tr->partitionData[model].substRates_LG4[k], freqNames, fileName);
+ printBothOpenDifferentFile(fileName, "\n");
+ printFreqs(20, tr->partitionData[model].frequencies_LG4[k], freqNames, fileName);
+ }
+ }
+
+ printRatesRest(20, r, freqNames, fileName);
+ printBothOpenDifferentFile(fileName, "\n");
+ printFreqs(20, f, freqNames, fileName);
+ }
+ break;
+ case DNA_DATA:
+ {
+ char *freqNames[4] = {"A", "C", "G", "T"};
+
+ printRatesDNA_BIN(4, r, freqNames, fileName);
+ printBothOpenDifferentFile(fileName, "\n");
+ printFreqs(4, f, freqNames, fileName);
+ }
+ break;
+ case BINARY_DATA:
+ {
+ char *freqNames[2] = {"0", "1"};
+
+ printRatesDNA_BIN(2, r, freqNames, fileName);
+ printBothOpenDifferentFile(fileName, "\n");
+ printFreqs(2, f, freqNames, fileName);
+ }
+ break;
+ default:
+ assert(0);
+ }
+
+ printBothOpenDifferentFile(fileName, "\n");
+ }
+
+ printBothOpenDifferentFile(fileName, "\n");
+}
+
+
+static void finalizeInfoFile(tree *tr, analdef *adef)
+{
+ if(processID == 0)
+ {
+ double t;
+
+ t = gettime() - masterTime;
+ accumulatedTime = accumulatedTime + t;
+
+ switch(adef->mode)
+ {
+ case BIG_RAPID_MODE:
+ printBothOpen("\n\nOverall Time for 1 Inference %f\n", t);
+ printBothOpen("\nOverall accumulated Time (in case of restarts): %f\n\n", accumulatedTime);
+ printBothOpen("Likelihood : %f\n", tr->likelihood);
+ printBothOpen("\n\n");
+ printBothOpen("Model parameters written to: %s\n", modelFileName);
+ printBothOpen("Final tree written to: %s\n", resultFileName);
+ printBothOpen("Execution Log File written to: %s\n", logFileName);
+ printBothOpen("Execution information file written to: %s\n",infoFileName);
+ break;
+ case TREE_EVALUATION:
+ printBothOpen("\n\nOverall Time for evaluating the likelihood of %d trees: %f secs\n\n", tr->numberOfTrees, t);
+ printBothOpen("\n\nThe model parameters of the trees have been written to files called %s.i\n", modelFileName);
+ printBothOpen("where i is the number of the tree\n\n");
+ printBothOpen("Note that, in case of a restart from a checkpoint, some tree model files will have been produced by previous runs!\n\n");
+ printBothOpen("The trees with branch lengths have been written to file: %s\n", treeFileName);
+ printBothOpen("They are in the same order as in the input file!\n\n");
+ break;
+ case QUARTET_CALCULATION:
+ printBothOpen("\n\nOverall quartet computation time: %f secs\n", t);
+ printBothOpen("\nAll quartets and corresponding likelihoods written to file %s\n", quartetFileName);
+ break;
+ default:
+ assert(0);
+ }
+
+
+ }
+
+}
+
+
+/************************************************************************************/
+
+
+static int iterated_bitcount(unsigned int n)
+{
+ int
+ count=0;
+
+ while(n)
+ {
+ count += n & 0x1u ;
+ n >>= 1 ;
+ }
+
+ return count;
+}
+
+/*static char bits_in_16bits [0x1u << 16];*/
+
+static void compute_bits_in_16bits(char *bits_in_16bits)
+{
+ unsigned int i;
+
+ /* size is 65536 */
+
+ for (i = 0; i < (0x1u<<16); i++)
+ bits_in_16bits[i] = iterated_bitcount(i);
+
+ return ;
+}
+
+unsigned int precomputed16_bitcount (unsigned int n, char *bits_in_16bits)
+{
+ /* works only for 32-bit int*/
+
+ return bits_in_16bits [n & 0xffffu]
+ + bits_in_16bits [(n >> 16) & 0xffffu] ;
+}
+
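The three functions above implement a table-driven population count: a 65536-entry table is filled once, and a 32-bit word is then counted with two lookups. The following is a standalone sketch of the same idea, not the ExaML code itself. The function names and the use of unsigned char and uint32_t are choices made here for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Count set bits one at a time; only used to fill the table. */
static unsigned int slow_bitcount(uint32_t n)
{
  unsigned int count = 0;

  while (n)
    {
      count += n & 1u;
      n >>= 1;
    }

  return count;
}

/* Allocate and fill a table with the bit count of every 16-bit value.
   Returns NULL if the allocation fails; the caller frees the table. */
static unsigned char *make_bitcount_table(void)
{
  unsigned char *table = (unsigned char *)malloc((size_t)1 << 16);
  uint32_t i;

  if (table == NULL)
    return NULL;

  for (i = 0; i < ((uint32_t)1 << 16); i++)
    table[i] = (unsigned char)slow_bitcount(i);

  return table;
}

/* Bit count of a 32-bit word as the sum of two 16-bit lookups. */
static unsigned int table_bitcount(uint32_t n, const unsigned char *table)
{
  return (unsigned int)table[n & 0xffffu]
       + (unsigned int)table[(n >> 16) & 0xffffu];
}
```

Using a fixed-width uint32_t avoids the caveat noted in the comment above that the lookup only works for a 32-bit int.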
+
+static void clean_MPI_Exit(void)
+{
+ MPI_Barrier(MPI_COMM_WORLD);
+ MPI_Finalize();
+}
+
+static void error_MPI_Exit(void)
+{
+ MPI_Barrier(MPI_COMM_WORLD);
+ MPI_Finalize();
+
+ exit(1);
+}
+
+
+static void initializePartitions(tree *tr)
+{
+ size_t
+ i,
+ len,
+ j,
+ width;
+
+ int
+ model,
+ maxCategories;
+
+ compute_bits_in_16bits(tr->bits_in_16bits);
+
+ maxCategories = tr->maxCategories;
+
+ for(model = 0; model < tr->NumberOfModels; model++)
+ {
+ const partitionLengths
+ *pl = getPartitionLengths(&(tr->partitionData[model]));
+
+ //must already be set as a consequence of alloc in function readPartitions
+ //and the subsequent copy of bf->partitions into tr->partitions!
+ assert(tr->partitionData[model].partitionName != (char*)NULL);
+
+ //printf("Partition name %s\n", tr->partitionData[model].partitionName);
+
+ width = tr->partitionData[model].width;
+
+ /*
+ globalScaler needs to be of length 2 * tr->mxtips so that scalers of inner AND tip nodes can be added without a case switch.
+ To this end, it must also be initialized with zeros -> calloc
+ */
+
+ len = 2 * tr->mxtips;
+ tr->partitionData[model].globalScaler = (unsigned int *)calloc(len, sizeof(unsigned int));
+
+#ifdef _USE_OMP
+ tr->partitionData[model].threadGlobalScaler = (unsigned int**) calloc(tr->nThreads, sizeof(unsigned int*));
+
+ tr->partitionData[model].reductionBuffer = (double*) calloc(tr->nThreads, sizeof(double));
+ tr->partitionData[model].reductionBuffer2 = (double*) calloc(tr->nThreads, sizeof(double));
+
+ int
+ t;
+
+ for (t = 0; t < tr->maxThreadsPerModel; ++t)
+ {
+ Assign*
+ pAss = tr->partThreadAssigns[model * tr->maxThreadsPerModel + t];
+
+ if (pAss)
+ {
+ int
+ tid = pAss->procId;
+
+ tr->partitionData[model].threadGlobalScaler[tid] = (unsigned int *)calloc(len, sizeof(unsigned int));
+ }
+ }
+#endif
+
+ tr->partitionData[model].left = (double *)malloc_aligned(pl->leftLength * (maxCategories + 1) * sizeof(double));
+ tr->partitionData[model].right = (double *)malloc_aligned(pl->rightLength * (maxCategories + 1) * sizeof(double));
+ tr->partitionData[model].EIGN = (double*)malloc(pl->eignLength * sizeof(double));
+ tr->partitionData[model].EV = (double*)malloc_aligned(pl->evLength * sizeof(double));
+ tr->partitionData[model].EI = (double*)malloc(pl->eiLength * sizeof(double));
+
+ tr->partitionData[model].substRates = (double *)malloc(pl->substRatesLength * sizeof(double));
+
+
+ //must already be set as a consequence of alloc in function readPartitions
+ //and the subsequent copy of bf->partitions into tr->partitions!
+ assert(tr->partitionData[model].frequencies != (double*)NULL);
+ //tr->partitionData[model].frequencies = (double*)malloc(pl->frequenciesLength * sizeof(double));
+
+
+
+ tr->partitionData[model].freqExponents = (double*)malloc(pl->frequenciesLength * sizeof(double));
+ tr->partitionData[model].empiricalFrequencies = (double*)malloc(pl->frequenciesLength * sizeof(double));
+ tr->partitionData[model].tipVector = (double *)malloc_aligned(pl->tipVectorLength * sizeof(double));
+
+
+ if(tr->partitionData[model].protModels == LG4M || tr->partitionData[model].protModels == LG4X)
+ {
+ int
+ k;
+
+ for(k = 0; k < 4; k++)
+ {
+ tr->partitionData[model].rawEIGN_LG4[k] = (double*)malloc(pl->eignLength * sizeof(double));
+ tr->partitionData[model].EIGN_LG4[k] = (double*)malloc(pl->eignLength * sizeof(double));
+ tr->partitionData[model].EV_LG4[k] = (double*)malloc_aligned(pl->evLength * sizeof(double));
+ tr->partitionData[model].EI_LG4[k] = (double*)malloc(pl->eiLength * sizeof(double));
+ tr->partitionData[model].substRates_LG4[k] = (double *)malloc(pl->substRatesLength * sizeof(double));
+ tr->partitionData[model].frequencies_LG4[k] = (double*)malloc(pl->frequenciesLength * sizeof(double));
+ tr->partitionData[model].tipVector_LG4[k] = (double *)malloc_aligned(pl->tipVectorLength * sizeof(double));
+ }
+ }
+
+
+ tr->partitionData[model].symmetryVector = (int *)malloc(pl->symmetryVectorLength * sizeof(int));
+ tr->partitionData[model].frequencyGrouping = (int *)malloc(pl->frequencyGroupingLength * sizeof(int));
+
+ tr->partitionData[model].perSiteRates = (double *)malloc(sizeof(double) * tr->maxCategories);
+
+ // tr->partitionData[model].nonGTR = FALSE;
+ // tr->partitionData[model].optimizeBaseFrequencies = FALSE;
+
+
+ //tr->partitionData[model].gammaRates = (double*)malloc(sizeof(double) * 4);
+
+ tr->partitionData[model].xVector = (double **)malloc(sizeof(double*) * tr->mxtips);
+
+ for(j = 0; j < (size_t)tr->mxtips; j++)
+ tr->partitionData[model].xVector[j] = (double*)NULL;
+
+ tr->partitionData[model].xSpaceVector = (size_t *)calloc(tr->mxtips, sizeof(size_t));
+
+#ifdef __MIC_NATIVE
+ tr->partitionData[model].mic_EV = (double*)malloc_aligned(4 * pl->evLength * sizeof(double));
+ tr->partitionData[model].mic_tipVector = (double*)malloc_aligned(4 * pl->tipVectorLength * sizeof(double));
+ tr->partitionData[model].mic_umpLeft = (double*)malloc_aligned(4 * pl->tipVectorLength * sizeof(double));
+ tr->partitionData[model].mic_umpRight = (double*)malloc_aligned(4 * pl->tipVectorLength * sizeof(double));
+
+ /* for Xeon Phi, sumBuffer must be padded to a multiple of 8 (because of site blocking in kernels) */
+ const int padded_width = GET_PADDED_WIDTH(width);
+ const int span = (size_t)(tr->partitionData[model].states) *
+ discreteRateCategories(tr->rateHetModel);
+
+ tr->partitionData[model].sumBuffer = (double *)malloc_aligned(padded_width *
+ span * sizeof(double));
+
+ /* fill padding entries with 1. (will be corrected for with zero site weights in wgt) */
+ {
+ int k;
+ for (k = width*span; k < padded_width*span; ++k)
+ tr->partitionData[model].sumBuffer[k] = 1.;
+ }
+#else
+ tr->partitionData[model].sumBuffer = (double *)malloc_aligned(width *
+ (size_t)(tr->partitionData[model].states) *
+ discreteRateCategories(tr->rateHetModel) *
+ sizeof(double));
+#endif
+
+ /* tr->partitionData[model].wgt = (int *)malloc_aligned(width * sizeof(int)); */
+
+ /* rateCategory must be allocated using calloc(): at start-up there is only one rate category, 0, for all sites */
+
+ if(width > 0 && tr->saveMemory)
+ {
+ tr->partitionData[model].gapVectorLength = ((int)width / 32) + 1;
+
+ len = tr->partitionData[model].gapVectorLength * 2 * tr->mxtips;
+ tr->partitionData[model].gapVector = (unsigned int*)calloc(len, sizeof(unsigned int));
+
+ tr->partitionData[model].gapColumn = (double *)malloc_aligned(((size_t)tr->mxtips) *
+ ((size_t)(tr->partitionData[model].states)) *
+ discreteRateCategories(tr->rateHetModel) * sizeof(double));
+ }
+ else
+ {
+ tr->partitionData[model].gapVectorLength = 0;
+
+ tr->partitionData[model].gapVector = (unsigned int*)NULL;
+
+ tr->partitionData[model].gapColumn = (double*)NULL;
+ }
+ }
+
+
+ /* set up the averaged frac changes per partition such that no further reading accesses to aliaswgt are necessary
+ and we can free the array for the GAMMA model */
+
+ {
+ /* definitions:
+ sizeof(short) <= sizeof(int) <= sizeof(long)
+ size_t defined by address space (here: 64 bit).
+
+ size_t + MPI is a bad idea: in the mpi2.2 standard, they do
+ not mention it once.
+ */
+
+ unsigned long
+ *modelWeights = (unsigned long*) calloc(tr->NumberOfModels, sizeof(unsigned long));
+
+ size_t
+ wgtsum = 0;
+
+ /* determine my weights per partition */
+ for(model = 0; model < tr->NumberOfModels; model++)
+ {
+ const pInfo
+ partition = tr->partitionData[model] ;
+
+ size_t
+ i = 0;
+
+ for(i = 0; i < partition.width; ++i)
+ modelWeights[model] += (unsigned long) partition.wgt[i];
+ }
+ MPI_Allreduce(MPI_IN_PLACE, modelWeights, tr->NumberOfModels, MPI_UNSIGNED_LONG, MPI_SUM, MPI_COMM_WORLD);
+
+ /* determine sum */
+ for(model = 0; model < tr->NumberOfModels; ++model)
+ wgtsum += modelWeights[model];
+
+ for(model = 0; model < tr->NumberOfModels; model++)
+ {
+ tr->partitionWeights[model] = (double)modelWeights[model];
+ tr->partitionContributions[model] = ((double)modelWeights[model]) / ((double)wgtsum);
+ }
+
+ free(modelWeights);
+ }
+
+ /* initialize gap bit vectors at tips when memory saving option is enabled */
+
+ if(tr->saveMemory)
+ {
+ for(model = 0; model <tr->NumberOfModels; model++)
+ {
+ int
+ undetermined = getUndetermined(tr->partitionData[model].dataType);
+
+ width = tr->partitionData[model].width;
+
+ if(width > 0)
+ {
+ for(j = 1; j <= (size_t)(tr->mxtips); j++)
+ for(i = 0; i < width; i++)
+ if(tr->partitionData[model].yVector[j][i] == undetermined)
+ tr->partitionData[model].gapVector[tr->partitionData[model].gapVectorLength * j + i / 32] |= mask32[i % 32];
+ }
+ }
+ }
+}
+
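The last block of initializePartitions marks undetermined sites in a per-tip bit vector, with one 32-bit word covering 32 sites. Below is a minimal sketch of that bookkeeping. It assumes that bit k of a word corresponds to site k within that word; the actual layout is set by ExaML's mask32 table, which is defined elsewhere. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Words reserved per tip, using the same "width / 32 + 1" sizing
   as gapVectorLength above. */
static size_t gap_words(size_t width)
{
  return width / 32 + 1;
}

/* Mark a site as undetermined (assumed layout: bit = site % 32). */
static void set_gap(uint32_t *bits, size_t site)
{
  bits[site / 32] |= (uint32_t)1 << (site % 32);
}

/* Return 1 if the site was marked, 0 otherwise. */
static int is_gap(const uint32_t *bits, size_t site)
{
  return (int)((bits[site / 32] >> (site % 32)) & 1u);
}
```

In the code above, the vector for tip j starts at offset gapVectorLength * j, so a per-tip pointer into the shared array would play the role of bits here.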
+
+
+static void initializeTree(tree *tr, analdef *adef)
+{
+ size_t
+ i ;
+
+ if(adef->perGeneBranchLengths)
+ tr->numBranches = tr->NumberOfModels;
+ else
+ tr->numBranches = 1;
+
+
+ if(NUM_BRANCHES < tr->numBranches)
+ {
+ if(processID == 0 )
+ printf("You have specified per-partition branch lengths (-M option) with %d models. \n\
+Please set #define NUM_BRANCHES in axml.h to %d (or higher) and recompile %s\n",
+ tr->NumberOfModels,tr->NumberOfModels, programName );
+ error_MPI_Exit();
+ }
+
+
+ /* If we use the RF-based convergence criterion we will need to allocate some hash tables.
+ let's not worry about this right now, because it is indeed ExaML-specific */
+
+ tr->executeModel = (boolean *)calloc( tr->NumberOfModels, sizeof(boolean));
+
+ for(i = 0; i < (size_t)tr->NumberOfModels; i++)
+ tr->executeModel[i] = TRUE;
+
+ setupTree(tr);
+
+ if(tr->searchConvergenceCriterion && processID == 0)
+ {
+ tr->bitVectors = initBitVector(tr->mxtips, &(tr->vLength));
+ tr->h = initHashTable(tr->mxtips * 4);
+ }
+
+ for(i = 1; i <= (size_t)tr->mxtips; i++)
+ addword(tr->nameList[i], tr->nameHash, i);
+
+ initializePartitions(tr);
+
+ initModel(tr);
+}
+
+
+static int getNumberOfTrees(char *fileName, boolean getOffsets, exa_off_t *treeOffsets)
+{
+ FILE
+ *f = myfopen(fileName, "r");
+
+ int
+ trees = 0,
+ ch;
+
+ if(getOffsets)
+ treeOffsets[trees] = 0;
+
+ while((ch = fgetc(f)) != EOF)
+ {
+ if(ch == ';')
+ {
+ trees++;
+ if(getOffsets)
+ treeOffsets[trees] = exa_ftell(f) + 1;
+ }
+ }
+
+ assert(trees > 0);
+
+ fclose(f);
+
+ return trees;
+}
+
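getNumberOfTrees treats every ';' as the end of one tree and can record where the next tree starts, which is why optimizeTrees calls it twice: once to size the offset array and once to fill it. The sketch below shows the same two-pass pattern with standard C I/O. The name count_trees is hypothetical, and the sketch stores the position directly after each ';' rather than that position plus one as the code above does.

```c
#include <assert.h>
#include <stdio.h>

/* Count ';' terminators in an open, seekable stream. If offsets is not
   NULL, offsets[0] is set to 0 and offsets[k] to the position just past
   the k-th ';', so the array needs room for (number of trees + 1) entries. */
static int count_trees(FILE *f, long *offsets)
{
  int trees = 0, ch;

  rewind(f);

  if (offsets != NULL)
    offsets[0] = 0;

  while ((ch = fgetc(f)) != EOF)
    {
      if (ch == ';')
        {
          trees++;
          if (offsets != NULL)
            offsets[trees] = ftell(f);
        }
    }

  return trees;
}
```

A caller would first pass NULL to learn the count, allocate count + 1 offsets, and then call again to fill them in.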
+static void optimizeTrees(tree *tr, analdef *adef)
+{
+ exa_off_t
+ *treeOffsets;
+
+ int
+ i;
+
+ tr->numberOfTrees = getNumberOfTrees(tree_file, FALSE, (exa_off_t *)NULL);
+
+ if(processID == 0)
+ accumulatedTime = 0.0;
+
+ treeOffsets = (exa_off_t *)malloc(sizeof(exa_off_t) * (tr->numberOfTrees + 1));
+
+ tr->likelihoods = (double *)malloc(sizeof(double) * tr->numberOfTrees);
+ tr->treeStrings = (char *)malloc(sizeof(char) * (size_t)tr->treeStringLength * (size_t)tr->numberOfTrees);
+
+ getNumberOfTrees(tree_file, TRUE, treeOffsets);
+
+ if(processID == 0)
+ printBothOpen("\n\nFound %d trees to evaluate\n\n", tr->numberOfTrees);
+
+ i = 0;
+
+ if(adef->useCheckpoint)
+ {
+ restart(tr, adef);
+
+ i = ckp.treeIteration;
+
+ if(tr->fastTreeEvaluation && i > 0)
+ treeEvaluate(tr, 2);
+ else
+ modOpt(tr, 0.1, adef, i);
+
+ tr->likelihoods[i] = tr->likelihood;
+ Tree2String(tr->tree_string, tr, tr->start->back, TRUE, TRUE, FALSE, FALSE, FALSE, SUMMARIZE_LH, FALSE, FALSE);
+ memcpy(&(tr->treeStrings[(size_t)tr->treeStringLength * (size_t)i]), tr->tree_string, sizeof(char) * tr->treeStringLength);
+
+
+ if(processID == 0)
+ printModelParams(tr, adef, i);
+
+ i++;
+ }
+
+ for(; i < tr->numberOfTrees; i++)
+ {
+ FILE
+ *treeFile = myfopen(tree_file, "rb");
+
+ if(exa_fseek(treeFile, treeOffsets[i], SEEK_SET) != 0)
+ assert(0);
+
+ tr->likelihood = unlikely;
+
+ treeReadLen(treeFile, tr, FALSE, FALSE, FALSE);
+
+ fclose(treeFile);
+
+ tr->start = tr->nodep[1];
+
+ if(i > 0)
+ resetBranches(tr);
+
+ evaluateGeneric(tr, tr->start, TRUE);
+
+ if(tr->fastTreeEvaluation && i > 0)
+ {
+ ckp.state = MOD_OPT;
+
+ ckp.treeIteration = i;
+
+ writeCheckpoint(tr, adef);
+
+ treeEvaluate(tr, 2);
+ }
+ else
+ {
+ treeEvaluate(tr, 1);
+ modOpt(tr, 0.1, adef, i);
+ }
+
+ tr->likelihoods[i] = tr->likelihood;
+ Tree2String(tr->tree_string, tr, tr->start->back, TRUE, TRUE, FALSE, FALSE, FALSE, SUMMARIZE_LH, FALSE, FALSE);
+ memcpy(&(tr->treeStrings[(size_t)tr->treeStringLength * (size_t)i]), tr->tree_string, sizeof(char) * tr->treeStringLength);
+
+ if(processID == 0)
+ printModelParams(tr, adef, i);
+ }
+
+ if(processID == 0)
+ {
+ FILE
+ *f = myfopen(treeFileName, "w");
+
+ for(i = 0; i < tr->numberOfTrees; i++)
+ {
+ printBothOpen("Likelihood tree %d: %f \n", i, tr->likelihoods[i]);
+ fprintf(f, "%s", &(tr->treeStrings[(size_t)tr->treeStringLength * (size_t)i]));
+ }
+
+ fclose(f);
+ }
+}
+
+
+
+
+
+
+
+
+
+
+static void readByteFile (tree *tr, int commRank, int commSize )
+{
+ /* read stuff that is cheap; do not change the order! */
+ ByteFile
+ *bFile = NULL;
+
+ initializeByteFile(&bFile, byteFileName);
+ readHeader(bFile);
+ readTaxa(bFile);
+ readPartitions(bFile);
+
+ /* calculate optimal distribution of data */
+ PartitionAssignment
+ *pAss = NULL;
+
+ initializePartitionAssignment(&pAss, bFile->partitions, bFile->numPartitions, commSize);
+ assign(pAss);
+
+ if(commRank == 0 )
+ {
+ printf("\n");
+ printAssignments(pAss);
+ printf("\n");
+ printLoad(pAss);
+ printf("\n");
+ }
+
+ /* now the data of this process is in this struct */
+ readMyData(bFile,pAss, commRank );
+
+ /* carry over the information to the tree */
+ initializeTreeFromByteFile(bFile, tr);
+
+ /* just fills up tr->partAssigns that contains the representation of
+ the assignment that we will need */
+ copyAssignmentInfoToTree(pAss, tr);
+
+ deletePartitionAssignment(pAss);
+ deleteByteFile(bFile);
+}
+
+#ifdef _USE_OMP
+void allocateXVectors(tree* tr)
+{
+ nodeptr
+ p = tr->start,
+ q = p->back;
+
+ tr->td[0].ti[0].pNumber = p->number;
+ tr->td[0].ti[0].qNumber = q->number;
+
+ tr->td[0].count = 1;
+
+ computeTraversalInfo(q, &(tr->td[0].ti[0]), &(tr->td[0].count), tr->mxtips, tr->numBranches, FALSE);
+
+ traversalInfo
+ *ti = tr->td[0].ti;
+
+ int
+ i,
+ model;
+
+ for(i = 1; i < tr->td[0].count; i++)
+ {
+ traversalInfo *tInfo = &ti[i];
+
+ /* now loop over all partitions for nodes p, q, and r of the current traversal vector entry */
+
+ for(model = 0; model < tr->NumberOfModels; model++)
+ {
+ /* printf("new view on model %d with width %d\n", model, width); */
+
+ size_t
+ width = (size_t)tr->partitionData[model].width;
+
+ double
+ *x3_start = tr->partitionData[model].xVector[tInfo->pNumber - tr->mxtips - 1];
+
+ size_t
+ rateHet = discreteRateCategories(tr->rateHetModel),
+
+ /* get the number of states in the data stored in partition model */
+
+ states = (size_t)tr->partitionData[model].states,
+
+ /* get the length of the current likelihood array stored at node p. This is
+ important mainly for the SEV-based memory saving option described in here:
+
+ F. Izquierdo-Carrasco, S.A. Smith, A. Stamatakis: "Algorithms, Data Structures, and Numerics for Likelihood-based Phylogenetic Inference of Huge Trees".
+
+ So tr->partitionData[model].xSpaceVector[i] provides the length of the allocated conditional array of partition model
+ and node i
+ */
+
+ availableLength = tr->partitionData[model].xSpaceVector[(tInfo->pNumber - tr->mxtips - 1)],
+ requiredLength = 0;
+
+ /* memory saving stuff, not important right now, but if you are interested ask Fernando */
+
+ if(tr->saveMemory)
+ {
+ size_t
+ j,
+ setBits = 0;
+
+ unsigned int
+ *x1_gap = &(tr->partitionData[model].gapVector[tInfo->qNumber * tr->partitionData[model].gapVectorLength]),
+ *x2_gap = &(tr->partitionData[model].gapVector[tInfo->rNumber * tr->partitionData[model].gapVectorLength]),
+ *x3_gap = &(tr->partitionData[model].gapVector[tInfo->pNumber * tr->partitionData[model].gapVectorLength]);
+
+ for(j = 0; j < (size_t)tr->partitionData[model].gapVectorLength; j++)
+ {
+ x3_gap[j] = x1_gap[j] & x2_gap[j];
+ setBits += (size_t)(precomputed16_bitcount(x3_gap[j], tr->bits_in_16bits));
+ }
+
+ requiredLength = (width - setBits) * rateHet * states * sizeof(double);
+ }
+ else
+ /* if we are not trying to save memory the space required to store an inner likelihood array
+ is the number of sites in the partition times the number of states of the data type in the partition
+ times the number of discrete GAMMA rates (1 for CAT essentially) times 8 bytes */
+ requiredLength = width * rateHet * states * sizeof(double);
+
+ /* Initially, even when not using memory saving, no space is allocated for inner likelihood arrays, hence
+ availableLength will be zero the very first time we traverse the tree.
+ Hence we need to allocate something here */
+
+ if(requiredLength != availableLength)
+ {
+ /* if there is a vector of incorrect length assigned here i.e., x3 != NULL we must free
+ it first */
+ if(x3_start)
+ free(x3_start);
+
+ /* allocate memory: note that here we use a byte-boundary aligned malloc, because we need the vectors
+ to be aligned at 16 BYTE (SSE3) or 32 BYTE (AVX) boundaries! */
+
+ x3_start = (double*)malloc_aligned(requiredLength);
+
+ /* update the data structures for consistent bookkeeping */
+ tr->partitionData[model].xVector[tInfo->pNumber - tr->mxtips - 1] = x3_start;
+ tr->partitionData[model].xSpaceVector[(tInfo->pNumber - tr->mxtips - 1)] = requiredLength;
+ }
+ } // for model
+ } // for traversal
+}
+
+void assignPartitionsToThreads(tree *tr, int commRank)
+{
+ pInfo** rankPartitions = (pInfo **)calloc(tr->NumberOfModels, sizeof(pInfo*) );
+ int i;
+ for (i = 0; i < tr->NumberOfModels; ++i)
+ {
+ rankPartitions[i] = (pInfo *)calloc(1, sizeof(pInfo));
+ rankPartitions[i]->lower = 0;
+ rankPartitions[i]->upper = tr->partitionData[i].width;
+ rankPartitions[i]->width = rankPartitions[i]->upper;
+ rankPartitions[i]->states = tr->partitionData[i].states;
+ }
+
+ PartitionAssignment *pAss = NULL;
+ initializePartitionAssignment(&pAss, rankPartitions, tr->NumberOfModels, tr->nThreads);
+
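+ /* express each partition width in units of VECTOR_PADDING sites, so that chunks are assigned in whole blocks */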
+ for(i = 0; i < pAss->numPartitions; ++i)
+ {
+ Partition
+ *p = pAss->partitions + i;
+ p->width = (int) ceil((float) p->width / (float) VECTOR_PADDING);
+ }
+ assign(pAss);
+
+ /* Align partition sizes to the boundary (needed for site-blocking on the MIC) */
+ int j;
+ for(i = 0; i < pAss->numProc; ++i)
+ {
+ for(j = 0; j < pAss->numAssignPerProc[i] ; ++j)
+ {
+ Assignment *a = &pAss->assignPerProc[i][j];
+ a->offset *= VECTOR_PADDING;
+ a->width *= VECTOR_PADDING;
+
+ /* adjust width of last chunk -> must NOT include padding */
+ size_t realWidth = rankPartitions[a->partId]->width;
+ if (a->offset + a->width > realWidth)
+ a->width = realWidth - a->offset;
+ }
+ }
+
+ printf("Partition assignments to threads: \n");
+ printAssignments(pAss);
+ printf("\n");
+ printLoad(pAss);
+ printf("\n");
+
+ copyThreadAssignmentInfoToTree(pAss, tr);
+
+ deletePartitionAssignment(pAss);
+ for (i = 0; i < tr->NumberOfModels; ++i)
+ free(rankPartitions[i]);
+ free(rankPartitions);
+}
+#endif
+
+
+int main (int argc, char *argv[])
+{
+ MPI_Init(&argc, &argv);
+ MPI_Comm_rank(MPI_COMM_WORLD, &processID);
+ MPI_Comm_size(MPI_COMM_WORLD, &processes);
+ printf("\nThis is ExaML FINE-GRAIN MPI Process Number: %d\n", processID);
+ MPI_Barrier(MPI_COMM_WORLD);
+
+ {
+ tree
+ *tr = (tree*)malloc(sizeof(tree));
+
+ analdef
+ *adef = (analdef*)malloc(sizeof(analdef));
+
+ /*
+ tell the CPU to ignore exceptions generated by denormalized floating point values.
+ If this is not done, depending on the input data, the likelihood functions can exhibit
+ substantial run-time differences for vectors of equal length.
+ */
+
+#if ! (defined(__ppc) || defined(__powerpc__) || defined(PPC))
+ _mm_setcsr( _mm_getcsr() | _MM_FLUSH_ZERO_ON);
+#endif
+
+ /* get the start time */
+
+ masterTime = gettime();
+
+ /* initialize the analysis parameters in struct adef to default values */
+
+ initAdef(adef);
+
+ /* parse command line arguments: this has a side effect on tr struct and adef struct variables */
+
+ get_args(argc, argv, adef, tr);
+
+ /* generate the ExaML output file names and store them in strings */
+
+ makeFileNames();
+
+#ifdef _USE_OMP
+ if(tr->saveMemory)
+ {
+ printBothOpen("\nError: Memory saving option \"-S\" is not supported by the OpenMP version of ExaML!\n\n");
+ error_MPI_Exit();
+ }
+#endif
+
+ readByteFile(tr, processID, processes );
+
+#ifdef _USE_OMP
+ tr->nThreads = omp_get_max_threads();
+ assignPartitionsToThreads(tr, processID);
+#endif
+
+ initializeTree(tr, adef);
+
+ if(processID == 0)
+ {
+ printModelAndProgramInfo(tr, adef, argc, argv);
+ printBothOpen("Memory Saving Option: %s\n", (tr->saveMemory == TRUE)?"ENABLED":"DISABLED");
+ }
+
+ /* do some error checks for the LG4 model and the binary models and the MIC and exit gracefully */
+
+ {
+ int
+ countBinary = 0,
+ countLG4 = 0,
+ model;
+
+#ifdef __MIC_NATIVE
+ if(tr->saveMemory)
+ {
+ printBothOpen("Error: There is no MIC support yet for the memory saving option \"-S\"!\n\n");
+ error_MPI_Exit();
+ }
+
+ if(tr->rateHetModel == CAT)
+ {
+ printBothOpen("Error: There is no MIC support yet for the PSR model!\n\n");
+ error_MPI_Exit();
+ }
+#endif
+
+
+ for(model = 0; model < tr->NumberOfModels; model++)
+ {
+ if(tr->partitionData[model].protModels == LG4M || tr->partitionData[model].protModels == LG4X)
+ countLG4++;
+ if(tr->partitionData[model].states == 2)
+ countBinary++;
+ }
+
+ if(countLG4 > 0)
+ {
+ if(tr->saveMemory == TRUE)
+ {
+ printBothOpen("Error: the LG4 substitution model does not work in combination with the \"-S\" memory saving flag!\n\n");
+ error_MPI_Exit();
+ }
+
+ if(tr->rateHetModel == CAT)
+ {
+ printBothOpen("Error: the LG4 substitution model does not work for proportion of invariable sites estimates!\n\n");
+ error_MPI_Exit();
+ }
+ }
+
+ if(countBinary > 0)
+ {
+ if(tr->saveMemory == TRUE)
+ {
+ printBothOpen("Error: Binary data partitions can not be used in combination with the \"-S\" memory saving flag!\n\n");
+ error_MPI_Exit();
+ }
+
+#ifdef __MIC_NATIVE
+ printBothOpen("Error: There is no MIC support yet for binary data partitions!\n\n");
+ error_MPI_Exit();
+#endif
+ }
+ }
+
+ /*
+ this will restart ExaML exactly where it left off, using a checkpoint file.
+ Checkpointing is important and has to be implemented for the library, but we should not worry about this right now
+ */
+
+
+
+ switch(adef->mode)
+ {
+ case TREE_EVALUATION:
+ optimizeTrees(tr, adef);
+ break;
+ case BIG_RAPID_MODE:
+ if(adef->useCheckpoint)
+ {
+ /* read checkpoint file */
+ restart(tr, adef);
+
+ /* continue tree search where we left it off */
+ computeBIGRAPID(tr, adef, TRUE);
+
+ /* now print the model parameters to file */
+ if(processID == 0)
+ printModelParams(tr, adef, -1);
+ }
+ else
+ {
+ /* not important, only used to keep track of total accumulated exec time
+ when checkpointing and restarts were used */
+
+ if(processID == 0)
+ accumulatedTime = 0.0;
+
+ /* get the starting tree: here we just parse the tree passed via the command line
+ and do an initial likelihood computation traversal
+ which we maybe should skip, TODO */
+
+ getStartingTree(tr);
+
+#ifdef _USE_OMP
+ allocateXVectors(tr);
+#endif
+
+ /*
+ here we do an initial full tree traversal on the starting tree using the Felsenstein pruning algorithm
+ This should basically be the first call to the library that actually computes something :-)
+ */
+
+ evaluateGeneric(tr, tr->start, TRUE);
+
+ /* the treeEvaluate() function repeatedly iterates over the entire tree to optimize branch lengths until convergence */
+
+ treeEvaluate(tr, 1);
+
+ /* now start the ML search algorithm */
+
+ computeBIGRAPID(tr, adef, TRUE);
+
+ /* now print the model parameters to file */
+ if(processID == 0)
+ printModelParams(tr, adef, -1);
+
+ }
+ break;
+ case QUARTET_CALCULATION:
+ computeQuartets(tr, adef);
+ break;
+ default:
+ assert(0);
+ }
+
+ /* print some more nonsense into the ExaML_info file */
+
+ if(processID == 0)
+ finalizeInfoFile(tr, adef);
+ }
+
+ /* return 0 to signal that the program terminated correctly */
+
+ clean_MPI_Exit();
+
+ return 0;
+}
+
+
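The block-wise assignment in assignPartitionsToThreads() above can be sketched in isolation. This is a minimal sketch under stated assumptions, not ExaML code: PAD stands in for VECTOR_PADDING (8 in a MIC build), and the helpers widthInBlocks() and blocksToSites() are hypothetical names introduced only for illustration. It shows the round trip the function performs: widths are converted to whole blocks before assignment, then scaled back to sites, with the last chunk clamped so it does not cover padding.

```c
#include <assert.h>
#include <stddef.h>

#define PAD 8 /* assumption: stands in for VECTOR_PADDING of a MIC build */

/* hypothetical helper: number of PAD-site blocks needed to cover
   'width' sites (integer ceiling) */
static size_t widthInBlocks(size_t width)
{
  return (width + PAD - 1) / PAD;
}

/* hypothetical helper: convert a chunk given in blocks back to sites and
   clamp the last chunk so that it does not include padding sites */
static void blocksToSites(size_t blockOffset, size_t blockWidth, size_t realWidth,
                          size_t *offset, size_t *width)
{
  *offset = blockOffset * PAD;
  *width  = blockWidth * PAD;

  /* a chunk must start inside the partition */
  assert(*offset <= realWidth);

  if(*offset + *width > realWidth)
    *width = realWidth - *offset;
}
```

For a partition of 21 sites, widthInBlocks(21) yields 3 blocks. A thread that receives the last block (block offset 2, block width 1) ends up with offset 16 and width 5 instead of 8, which matches the "must NOT include padding" adjustment in the loop above.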
diff --git a/examl/axml.h b/examl/axml.h
new file mode 100644
index 0000000..95e90c1
--- /dev/null
+++ b/examl/axml.h
@@ -0,0 +1,1418 @@
+/* RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees
+ * Copyright August 2006 by Alexandros Stamatakis
+ *
+ * Partially derived from
+ * fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *
+ * and
+ *
+ * Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ * This program is free software; you may redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ *
+ * For any other enquiries send an Email to Alexandros Stamatakis
+ * Alexandros.Stamatakis at epfl.ch
+ *
+ * When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ * Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses
+ * with thousands of taxa and mixed models".
+ * Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#ifndef _AXML_H
+#define _AXML_H
+
+
+#include <assert.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <sys/types.h>
+#include "../versionHeader/version.h"
+
+#ifdef __MIC_NATIVE
+#define BYTE_ALIGNMENT 64
+#define VECTOR_PADDING 8
+#elif defined __AVX
+#define BYTE_ALIGNMENT 32
+#define VECTOR_PADDING 1
+#else
+#define BYTE_ALIGNMENT 16
+#define VECTOR_PADDING 1
+#endif
+
+#define GET_PADDED_WIDTH(w) ((w) % VECTOR_PADDING == 0 ? (w) : (w) + (VECTOR_PADDING - ((w) % VECTOR_PADDING)))
+
+#include <mpi.h>
+
+#ifdef _USE_OMP
+#include "omp.h"
+#endif
+
+/* BEGIN: file streams */
+#ifdef _GNU_SOURCE
+
+/* note that the _GNU_SOURCE macro implies POSIX compliance */
+
+
+/* these are POSIX-compliant functions. They potentially work on files
+ larger than 2 GB (for gcc this can be ensured using the following
+ macro) */
+#define exa_fseek fseeko
+#define exa_ftell ftello
+#define exa_off_t off_t
+
+/* only useful for ftello/fseeko: ensure that we are using 64-bit
+ types for representing an offset */
+#define _FILE_OFFSET_BITS 64
+
+#else
+
+#define exa_fseek fseek
+#define exa_ftell ftell
+#define exa_off_t long
+
+#endif
+/* END: file streams */
+
+
+#define MAX_TIP_EV 0.999999999 /* max tip vector value, sum of EVs needs to be smaller than 1.0, otherwise the numerics break down */
+#define smoothings 32 /* maximum smoothing passes through tree */
+#define iterations 10 /* maximum iterations per insert */
+#define newzpercycle 1 /* iterations of makenewz per tree traversal */
+#define nmlngth 256 /* number of characters in species name */
+#define deltaz 0.00001 /* test of net branch length change in update */
+#define defaultz 0.9 /* value of z assigned as starting point */
+#define unlikely -1.0E300 /* low likelihood for initialization */
+
+
+#define AUTO_ML 0
+#define AUTO_BIC 1
+#define AUTO_AIC 2
+#define AUTO_AICC 3
+
+#define SUMMARIZE_LENGTH -3
+#define SUMMARIZE_LH -2
+#define NO_BRANCHES -1
+
+#define MASK_LENGTH 32
+#define GET_BITVECTOR_LENGTH(x) (((x) % MASK_LENGTH) ? ((x) / MASK_LENGTH + 1) : ((x) / MASK_LENGTH))
+
+#define zmin 1.0E-15 /* max branch prop. to -log(zmin) (= 34) */
+#define zmax (1.0 - 1.0E-6) /* min branch prop. to 1.0-zmax (= 1.0E-6) */
+
+#define twotothe256 \
+ 115792089237316195423570985008687907853269984665640564039457584007913129639936.0
+ /* 2**256 (exactly) */
+
+#define minlikelihood (1.0/twotothe256)
+#define minusminlikelihood -minlikelihood
+
+
+
+
+/* 18446744073709551616.0 */
+
+/*4294967296.0*/
+
+/* 18446744073709551616.0 */
+
+/* 2**64 (exactly) */
+/* 4294967296 2**32 */
+
+#define badRear -1
+
+#define NUM_BRANCHES 256
+
+#define TRUE 1
+#define FALSE 0
+
+
+
+#define LIKELIHOOD_EPSILON 0.0000001
+
+#define AA_SCALE 10.0
+#define AA_SCALE_PLUS_EPSILON 10.001
+
+/* ALPHA_MIN is critical -> numerical instability, eg for 4 discrete rate cats */
+/* and alpha = 0.01 the lowest rate r_0 is */
+/* 0.00000000000000000000000000000000000000000000000000000000000034878079110511010487 */
+/* which leads to numerical problems Table for alpha settings below: */
+/* */
+/* 0.010000 0.00000000000000000000000000000000000000000000000000000000000034878079110511010487 */
+/* 0.010000 yielded nasty numerical bugs in at least one case ! */
+/* 0.020000 0.00000000000000000000000000000044136090435925743185910935350715027016962154188875 */
+/* 0.030000 0.00000000000000000000476844846859006690412039180149775802624789852441798419292220 */
+/* 0.040000 0.00000000000000049522423236954066431210260930029681736928018820007024736185030633 */
+/* 0.050000 0.00000000000050625351310359203371872643495343928538368616365517027588794007897377 */
+/* 0.060000 0.00000000005134625283884191118711474021861409372524676086868566926568746566772461 */
+/* 0.070000 0.00000000139080650074206434685544624965062437960128249869740102440118789672851562 */
+/* 0.080000 0.00000001650681201563587066858709818343436959153791576682124286890029907226562500 */
+/* 0.090000 0.00000011301977332931251259273962858978301859735893231118097901344299316406250000 */
+/* 0.100000 0.00000052651925834844387815526344648331402709118265192955732345581054687500000000 */
+
+
+#define ALPHA_MIN 0.02
+#define ALPHA_MAX 1000.0
+
+#define RATE_MIN 0.0000001
+#define RATE_MAX 1000000.0
+
+#define INVAR_MIN 0.0001
+#define INVAR_MAX 0.9999
+
+#define TT_MIN 0.0000001
+#define TT_MAX 1000000.0
+
+#define FREQ_MIN 0.001
+
+#define LG4X_RATE_MIN 0.0000001
+#define LG4X_RATE_MAX 1000.0
+
+/*
+ previous values between 0.001 and 0.000001
+
+ TO AVOID NUMERICAL PROBLEMS WHEN FREQ == 0 IN PARTITIONED MODELS, ESPECIALLY WITH AA
+ previous value of FREQ_MIN was: 0.000001, but this seemed to cause problems with some
+ of the 7-state secondary structure models with some rather exotic small toy test datasets,
+ on the other hand 0.001 caused problems with some of the 16-state secondary structure models
+
+ For some reason the frequency settings seem to be repeatedly causing numerical problems
+
+*/
+
+#define ITMAX 100
+
+
+
+#define SHFT(a,b,c,d) (a)=(b);(b)=(c);(c)=(d);
+#define SIGN(a,b) ((b) > 0.0 ? fabs(a) : -fabs(a))
+
+#define ABS(x) (((x)<0) ? (-(x)) : (x))
+#define MIN(x,y) (((x)<(y)) ? (x) : (y))
+#define MAX(x,y) (((x)>(y)) ? (x) : (y))
+#define NINT(x) ((int) ((x)>0 ? ((x)+0.5) : ((x)-0.5)))
+
+
+#define LOG(x) log(x)
+
+#define FABS(x) fabs(x)
+
+
+#define EXP(x) exp(x)
+
+
+
+
+
+
+#define PointGamma(prob,alpha,beta) PointChi2(prob,2.0*(alpha))/(2.0*(beta))
+
+//#define programName "ExaML"
+//#define programVersion "2.0.3"
+//#define programDate "June 25 2014"
+
+
+#define TREE_EVALUATION 0
+#define BIG_RAPID_MODE 1
+#define QUARTET_CALCULATION 2
+
+
+#define M_GTRCAT 1
+#define M_GTRGAMMA 2
+#define M_BINCAT 3
+#define M_BINGAMMA 4
+#define M_PROTCAT 5
+#define M_PROTGAMMA 6
+#define M_32CAT 7
+#define M_32GAMMA 8
+#define M_64CAT 9
+#define M_64GAMMA 10
+
+
+#define DAYHOFF 0
+#define DCMUT 1
+#define JTT 2
+#define MTREV 3
+#define WAG 4
+#define RTREV 5
+#define CPREV 6
+#define VT 7
+#define BLOSUM62 8
+#define MTMAM 9
+#define LG 10
+#define MTART 11
+#define MTZOA 12
+#define PMB 13
+#define HIVB 14
+#define HIVW 15
+#define JTTDCMUT 16
+#define FLU 17
+#define STMTREV 18
+#define AUTO 19
+#define LG4M 20
+#define LG4X 21
+#define GTR 22 /* GTR always needs to be the last one */
+
+#define NUM_PROT_MODELS 23
+
+/* bipartition stuff */
+
+#define BIPARTITIONS_ALL 0
+#define GET_BIPARTITIONS_BEST 1
+#define DRAW_BIPARTITIONS_BEST 2
+#define BIPARTITIONS_BOOTSTOP 3
+#define BIPARTITIONS_RF 4
+
+
+
+/* bootstopping stuff */
+
+#define BOOTSTOP_PERMUTATIONS 100
+#define START_BSTOP_TEST 10
+
+#define FC_THRESHOLD 99
+#define FC_SPACING 50
+#define FC_LOWER 0.99
+#define FC_INIT 20
+
+#define FREQUENCY_STOP 0
+#define MR_STOP 1
+#define MRE_STOP 2
+#define MRE_IGN_STOP 3
+
+#define MR_CONSENSUS 0
+#define MRE_CONSENSUS 1
+#define STRICT_CONSENSUS 2
+
+
+
+/* bootstopping stuff end */
+
+
+#define TIP_TIP 0
+#define TIP_INNER 1
+#define INNER_INNER 2
+
+#define MIN_MODEL -1
+#define BINARY_DATA 0
+#define DNA_DATA 1
+#define AA_DATA 2
+#define SECONDARY_DATA 3
+#define SECONDARY_DATA_6 4
+#define SECONDARY_DATA_7 5
+#define GENERIC_32 6
+#define GENERIC_64 7
+#define MAX_MODEL 8
+
+#define SEC_6_A 0
+#define SEC_6_B 1
+#define SEC_6_C 2
+#define SEC_6_D 3
+#define SEC_6_E 4
+
+#define SEC_7_A 5
+#define SEC_7_B 6
+#define SEC_7_C 7
+#define SEC_7_D 8
+#define SEC_7_E 9
+#define SEC_7_F 10
+
+#define SEC_16 11
+#define SEC_16_A 12
+#define SEC_16_B 13
+#define SEC_16_C 14
+#define SEC_16_D 15
+#define SEC_16_E 16
+#define SEC_16_F 17
+#define SEC_16_I 18
+#define SEC_16_J 19
+#define SEC_16_K 20
+
+#define ORDERED_MULTI_STATE 0
+#define MK_MULTI_STATE 1
+#define GTR_MULTI_STATE 2
+
+
+
+
+
+#define CAT 0
+#define GAMMA 1
+#define GAMMA_I 2
+
+
+
+typedef int boolean;
+
+
+typedef struct {
+ double lh;
+ int tree;
+ double weight;
+} elw;
+
+struct ent
+{
+ unsigned int *bitVector;
+ unsigned int *treeVector;
+ unsigned int amountTips;
+ int *supportVector;
+ unsigned int bipNumber;
+ unsigned int bipNumber2;
+ unsigned int supportFromTreeset[2];
+ struct ent *next;
+};
+
+typedef struct ent entry;
+
+typedef unsigned int hashNumberType;
+
+
+
+/*typedef uint_fast32_t parsimonyNumber;*/
+
+#define PCF 32
+
+/*
+ typedef uint64_t parsimonyNumber;
+
+ #define PCF 16
+
+
+typedef unsigned char parsimonyNumber;
+
+#define PCF 2
+*/
+
+typedef struct
+{
+ hashNumberType tableSize;
+ entry **table;
+ hashNumberType entryCount;
+}
+ hashtable;
+
+
+struct stringEnt
+{
+ int nodeNumber;
+ char *word;
+ struct stringEnt *next;
+};
+
+typedef struct stringEnt stringEntry;
+
+typedef struct
+{
+ hashNumberType tableSize;
+ stringEntry **table;
+}
+ stringHashtable;
+
+
+
+
+
+typedef struct ratec
+{
+ double accumulatedSiteLikelihood;
+ double rate;
+}
+ rateCategorize;
+
+
+typedef struct
+{
+ int tipCase;
+ int pNumber;
+ int qNumber;
+ int rNumber;
+ double qz[NUM_BRANCHES];
+ double rz[NUM_BRANCHES];
+} traversalInfo;
+
+typedef struct
+{
+ traversalInfo *ti;
+ int count;
+ int functionType;
+ boolean traversalHasChanged;
+ boolean *executeModel;
+ double *parameterValues;
+} traversalData;
+
+
+struct noderec;
+
+
+
+typedef struct
+{
+
+
+ unsigned int *vector;
+ int support;
+ struct noderec *oP;
+ struct noderec *oQ;
+} branchInfo;
+
+
+
+
+
+
+
+
+typedef struct
+{
+ boolean valid;
+ int partitions;
+ int *partitionList;
+}
+ linkageData;
+
+typedef struct
+{
+ int entries;
+ linkageData* ld;
+}
+ linkageList;
+
+
+typedef struct noderec
+{
+ double z[NUM_BRANCHES];
+#ifdef _BAYESIAN
+ double z_tmp[NUM_BRANCHES];
+#endif
+ struct noderec *next;
+ struct noderec *back;
+ hashNumberType hash;
+ int number;
+ char x;
+ char xPars;
+ char xBips;
+}
+ node, *nodeptr;
+
+typedef struct
+ {
+ double lh;
+ int number;
+ }
+ info;
+
+typedef struct bInf {
+ double likelihood;
+ nodeptr node;
+} bestInfo;
+
+typedef struct iL {
+ bestInfo *list;
+ int n;
+ int valid;
+} infoList;
+
+
+
+typedef unsigned int parsimonyNumber;
+
+
+
+
+typedef struct {
+ int states;
+ int maxTipStates;
+ size_t lower;
+ size_t upper;
+ size_t width;
+
+ size_t offset; /* NEW: makes the data assigned to
+ this process identifiable (since we
+ know that all data from one
+ partition must be in one contiguous
+ chunk). */
+
+ int dataType;
+ int protModels;
+ int autoProtModels;
+ int protFreqs;
+ boolean nonGTR;
+ boolean optimizeBaseFrequencies;
+ int numberOfCategories;
+
+ char *partitionName;
+ int *symmetryVector;
+ int *frequencyGrouping;
+
+ double *sumBuffer;
+ double gammaRates[4];
+ double *EIGN;
+ double *EV;
+ double *EI;
+ double *left;
+ double *right;
+
+ /* LG4 */
+
+ double *rawEIGN_LG4[4];
+ double *EIGN_LG4[4];
+ double *EV_LG4[4];
+ double *EI_LG4[4];
+
+ double *frequencies_LG4[4];
+ double *tipVector_LG4[4];
+ double *substRates_LG4[4];
+
+ /* LG4X */
+
+ double weights[4];
+ double weightExponents[4];
+
+ double weightsBuffer[4];
+ double weightExponentsBuffer[4];
+
+ /* LG4 */
+
+ double *frequencies;
+ double *freqExponents;
+ double *empiricalFrequencies;
+ double *tipVector;
+ double *substRates;
+ double *perSiteRates;
+ int *wgt;
+ int *rateCategory;
+ double alpha;
+
+ double **xVector;
+ size_t *xSpaceVector;
+ unsigned char **yVector;
+ unsigned char *yResource; /* contains the entire array, that is referenced in yVector */
+ unsigned int *globalScaler;
+
+ int gapVectorLength;
+ unsigned int *gapVector;
+ double *gapColumn;
+
+ size_t parsimonyLength;
+ parsimonyNumber *parsVect;
+
+ double *lhs;
+ double *patrat;
+
+#ifdef _USE_OMP
+ /* thread-private data for OMP version */
+ unsigned int **threadGlobalScaler;
+ double *reductionBuffer;
+ double *reductionBuffer2;
+#endif
+
+#ifdef __MIC_NATIVE
+ double *mic_EV;
+ double *mic_tipVector;
+
+ /* these arrays will store the precomputed product of tipVector and left/right P-matrix */
+ double *mic_umpLeft;
+ double *mic_umpRight;
+#endif
+
+} pInfo;
+
+
+
+typedef struct
+{
+ int left;
+ int right;
+ double likelihood;
+} lhEntry;
+
+
+typedef struct
+{
+ int count;
+ int size;
+ lhEntry *entries;
+} lhList;
+
+
+typedef struct List_{
+ void *value;
+ struct List_ *next;
+} List;
+
+
+#define REARR_SETTING 1
+#define FAST_SPRS 2
+#define SLOW_SPRS 3
+#define MOD_OPT 4
+#define QUARTETS 5
+
+typedef struct {
+ boolean useMedian;
+ int saveBestTrees;
+ boolean saveMemory;
+ boolean searchConvergenceCriterion;
+ boolean perGeneBranchLengths; //adef
+ double likelihoodEpsilon; //adef
+ int categories;
+ int mode; //adef
+ int fastTreeEvaluation;
+ boolean initialSet;//adef
+ int initial;//adef
+ int rateHetModel;
+ int autoProteinSelectionType;
+
+ //quartets
+ boolean useQuartetGrouping;//adef
+ unsigned long int numberRandomQuartets;//adef
+
+} commandLine;
+
+typedef struct {
+
+ int state;
+
+ /* search algorithm */
+
+ unsigned int vLength;
+
+ boolean constraintTree;
+
+ int rearrangementsMax;
+ int rearrangementsMin;
+ int thoroughIterations;
+ int fastIterations;
+ int treeVectorLength;
+ int mintrav;
+ int maxtrav;
+ int bestTrav;
+ int Thorough;
+ int optimizeRateCategoryInvocations;
+
+ double accumulatedTime;
+
+ double startLH;
+ double lh;
+ double previousLh;
+ double difference;
+ double epsilon;
+
+ boolean impr;
+ boolean cutoff;
+
+ double tr_startLH;
+ double tr_endLH;
+ double tr_likelihood;
+ double tr_bestOfNode;
+
+ double tr_lhCutoff;
+ double tr_lhAVG;
+ double tr_lhDEC;
+ int tr_NumberOfCategories;
+ int tr_itCount;
+ int tr_doCutoff;
+
+ /* modOpt */
+
+ int catOpt;
+ int treeIteration;
+ /* quartets */
+
+ long seed;
+ int flavor;
+ uint64_t quartetCounter;
+ long filePosition;
+ char quartetFileName[1024];
+ //FILE NAME???
+
+ /* command line settings */
+
+ commandLine cmd;
+
+} checkPointState;
+
+
+typedef struct {
+ double EIGN[19] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+ double EV[400] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+ double EI[380] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+ double substRates[190];
+ double frequencies[20] ;
+ double tipVector[460] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+ double left[1600] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+ double right[1600] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+} siteAAModels;
+
+
+typedef struct assign
+{
+ int partitionId;
+ int procId; /* to which process is the partition assigned */
+ size_t offset; /* what is the offset of this assignment */
+ size_t width;
+} Assign ;
+
+
+typedef struct {
+
+ int *ti;
+
+ unsigned int randomSeed;
+ boolean constraintTree;
+ boolean useGappedImplementation;
+ boolean saveMemory;
+ int saveBestTrees;
+
+ stringHashtable *nameHash;
+
+ pInfo *partitionData;
+
+
+ char *secondaryStructureInput;
+
+ boolean *executeModel;
+
+ double *perPartitionLH;
+
+ traversalData td[1];
+
+ int maxCategories;
+ int categories;
+
+ double coreLZ[NUM_BRANCHES];
+ int numBranches;
+
+
+
+ branchInfo *bInf;
+
+ int multiStateModel;
+
+
+ boolean curvatOK[NUM_BRANCHES];
+ /* the stuff below is shared among DNA and AA, span does
+ not change depending on datatype */
+
+ /* model stuff end */
+
+ unsigned char **yVector;
+ int secondaryStructureModel;
+ size_t originalCrunchedLength;
+
+
+ int *secondaryStructurePairs;
+
+
+ double *partitionContributions;
+ double *partitionWeights;
+
+ double lhCutoff;
+ double lhAVG;
+ unsigned long lhDEC;
+ unsigned long itCount;
+ int numberOfInvariableColumns;
+ int weightOfInvariableColumns;
+ int rateHetModel;
+
+ double startLH;
+ double endLH;
+ double likelihood;
+
+
+ node **nodep;
+ nodeptr nodeBaseAddress;
+ node *start;
+ int mxtips;
+
+ int *constraintVector;
+ int numberOfSecondaryColumns;
+ boolean searchConvergenceCriterion;
+ int ntips;
+ int nextnode;
+ int NumberOfModels;
+
+ boolean bigCutoff;
+ boolean partitionSmoothed[NUM_BRANCHES];
+ boolean partitionConverged[NUM_BRANCHES];
+ boolean rooted;
+ boolean doCutoff;
+
+
+
+ double gapyness;
+
+ char **nameList;
+ char *tree_string;
+ char *treeStrings;
+ char *tree0;
+ char *tree1;
+ int treeStringLength;
+
+ unsigned int bestParsimony;
+ unsigned int *parsimonyScore;
+
+ double bestOfNode;
+ nodeptr removeNode;
+ nodeptr insertNode;
+
+ double zqr[NUM_BRANCHES];
+ double currentZQR[NUM_BRANCHES];
+
+ double currentLZR[NUM_BRANCHES];
+ double currentLZQ[NUM_BRANCHES];
+ double currentLZS[NUM_BRANCHES];
+ double currentLZI[NUM_BRANCHES];
+ double lzs[NUM_BRANCHES];
+ double lzq[NUM_BRANCHES];
+ double lzr[NUM_BRANCHES];
+ double lzi[NUM_BRANCHES];
+
+
+
+
+
+ unsigned int **bitVectors;
+
+ unsigned int vLength;
+
+ hashtable *h;
+
+ char bits_in_16bits [0x1u << 16];
+
+ boolean useMedian;
+
+ int autoProteinSelectionType;
+
+ int numberOfTrees;
+
+ double *likelihoods;
+
+ boolean fastTreeEvaluation;
+
+ int numAssignments;
+ Assign *partAssigns;
+
+ /**
+ IMPORTANT:
+
+ introducing a few resource pointers. All memory needed, for
+ example, for per-partition patrats is owned by these
+ base pointers; the per-partition pointer just points into the
+ contiguous block of memory.
+
+ The big advantage, and the reason I think this is worth it, is that
+ these base pointers can be used to conveniently gather/scatter
+ all data at a master. The master still has to reorder the
+ gathered data, but less copying is necessary at the workers.
+
+ REQUIREMENTS:
+
+ * all memory necessary for a partition must be in a contiguous
+ block,
+
+ * memory for partitions is ordered by partition id (first
+ partition 1, then partition 2, ...)
+ */
+
+ double *patrat_basePtr;
+ int *rateCategory_basePtr;
+ double *lhs_basePtr;
+
+#ifdef _USE_OMP
+ /* number of OMP threads*/
+ int nThreads;
+
+ /* maximum number of partitions assigned to a single thread */
+ int maxModelsPerThread;
+
+ /* maximum number of threads assigned to a single partition */
+ int maxThreadsPerModel;
+
+ /* partition-to-threads assignments: indexed by thread */
+ Assign **threadPartAssigns;
+
+ /* partition-to-threads assignments: indexed by partition id */
+ Assign **partThreadAssigns;
+
+#endif
+
+} tree;
+
+
+/***************************************************************/
+
+typedef struct {
+ int partitionNumber;
+ int partitionLength;
+} partitionType;
+
+typedef struct
+{
+ double z[NUM_BRANCHES];
+ nodeptr p, q;
+ int cp, cq;
+}
+ connectRELL, *connptrRELL;
+
+typedef struct
+{
+ connectRELL *connect;
+ int start;
+ double likelihood;
+}
+ topolRELL;
+
+
+typedef struct
+{
+ int max;
+ topolRELL **t;
+}
+ topolRELL_LIST;
+
+
+/**************************************************************/
+
+
+
+typedef struct conntyp {
+ double z[NUM_BRANCHES]; /* branch length */
+ node *p, *q; /* parent and child sectors */
+ void *valptr; /* pointer to value of subtree */
+ int descend; /* pointer to first connect of child */
+ int sibling; /* next connect from same parent */
+ } connect, *connptr;
+
+typedef struct {
+ double likelihood;
+ int initialTreeNumber;
+ connect *links; /* pointer to first connect (start) */
+ node *start;
+ int nextlink; /* index of next available connect */
+ /* tr->start = tpl->links->p */
+ int ntips;
+ int nextnode;
+ int scrNum; /* position in sorted list of scores */
+ int tplNum; /* position in sorted list of trees */
+
+ } topol;
+
+typedef struct {
+ double best; /* highest score saved */
+ double worst; /* lowest score saved */
+ topol *start; /* starting tree for optimization */
+ topol **byScore;
+ topol **byTopol;
+ int nkeep; /* maximum topologies to save */
+ int nvalid; /* number of topologies saved */
+ int ninit; /* number of topologies initialized */
+ int numtrees; /* number of alternatives tested */
+ boolean improved;
+ } bestlist;
+
+#define randomTree 0
+#define givenTree 1
+#define parsimonyTree 2
+
+typedef struct {
+ int bestTrav;
+ int max_rearrange;
+ int stepwidth;
+ int initial;
+ boolean initialSet;
+ int mode;
+ boolean perGeneBranchLengths;
+ boolean permuteTreeoptimize;
+ boolean compressPatterns;
+ double likelihoodEpsilon;
+ boolean useCheckpoint;
+ boolean useQuartetGrouping;
+ unsigned long int numberRandomQuartets;
+ unsigned long int quartetCkpInterval;
+
+#ifdef _BAYESIAN
+ boolean bayesian;
+#endif
+} analdef;
+
+
+
+
+typedef struct
+{
+ int leftLength;
+ int rightLength;
+ int eignLength;
+ int evLength;
+ int eiLength;
+ int substRatesLength;
+ int frequenciesLength;
+ int tipVectorLength;
+ int symmetryVectorLength;
+ int frequencyGroupingLength;
+
+ boolean nonGTR;
+
+ int undetermined;
+
+ const char *inverseMeaning;
+
+ int states;
+
+ boolean smoothFrequencies;
+
+ const unsigned int *bitVector;
+
+} partitionLengths;
+
+/****************************** FUNCTIONS ****************************************************/
+
+#ifdef _BAYESIAN
+extern void mcmc(tree *tr, analdef *adef);
+#endif
+
+
+boolean isThisMyPartition(tree *localTree, int tid, int model);
+
+extern boolean allSmoothed(tree *tr);
+
+extern int treeFindTipName(FILE *fp, tree *tr, boolean check);
+
+extern void computePlacementBias(tree *tr, analdef *adef);
+
+extern int lookupWord(char *s, stringHashtable *h);
+
+extern void getDataTypeString(tree *tr, int model, char typeOfData[1024]);
+
+extern unsigned int genericBitCount(unsigned int* bitVector, unsigned int bitVectorLength);
+extern int countTips(nodeptr p, int numsp);
+extern entry *initEntry(void);
+extern void computeRogueTaxa(tree *tr, char* treeSetFileName, analdef *adef);
+extern unsigned int precomputed16_bitcount(unsigned int n, char *bits_in_16bits);
+
+
+
+
+
+extern size_t discreteRateCategories(int rateHetModel);
+
+extern partitionLengths * getPartitionLengths(pInfo *p);
+extern boolean getSmoothFreqs(int dataType);
+extern const unsigned int *getBitVector(int dataType);
+extern int getUndetermined(int dataType);
+extern int getStates(int dataType);
+extern char getInverseMeaning(int dataType, unsigned char state);
+extern double gettime ( void );
+extern int gettimeSrand ( void );
+extern double randum ( long *seed );
+
+extern void getxnode ( nodeptr p );
+extern void hookup ( nodeptr p, nodeptr q, double *z, int numBranches);
+extern void hookupDefault ( nodeptr p, nodeptr q, int numBranches);
+extern boolean whitechar ( int ch );
+extern void errorExit ( int e );
+extern void printResult ( tree *tr, analdef *adef, boolean finalPrint );
+extern void printBootstrapResult ( tree *tr, analdef *adef, boolean finalPrint );
+extern void printBipartitionResult ( tree *tr, analdef *adef, boolean finalPrint );
+extern void printLog ( tree *tr);
+extern void printStartingTree ( tree *tr, analdef *adef, boolean finalPrint );
+extern void writeInfoFile ( analdef *adef, tree *tr, double t );
+extern int main ( int argc, char *argv[] );
+extern void calcBipartitions ( tree *tr, analdef *adef, char *bestTreeFileName, char *bootStrapFileName );
+extern void initReversibleGTR (tree *tr, int model);
+extern double LnGamma ( double alpha );
+extern double IncompleteGamma ( double x, double alpha, double ln_gamma_alpha );
+extern double PointNormal ( double prob );
+extern double PointChi2 ( double prob, double v );
+extern void makeGammaCats (double alpha, double *gammaRates, int K, boolean useMedian);
+extern void initModel ( tree *tr);
+extern void doAllInOne ( tree *tr, analdef *adef );
+
+extern void classifyML(tree *tr, analdef *adef);
+
+extern void resetBranches ( tree *tr );
+extern void modOpt ( tree *tr, double likelihoodEpsilon, analdef *adef, int treeIteration);
+
+
+
+extern void computeBOOTRAPID (tree *tr, analdef *adef, long *radiusSeed);
+extern void optimizeRAPID ( tree *tr, analdef *adef );
+extern void thoroughOptimization ( tree *tr, analdef *adef, topolRELL_LIST *rl, int index );
+extern int treeOptimizeThorough ( tree *tr, int mintrav, int maxtrav);
+extern void computeQuartets(tree *tr, analdef *adef);
+
+extern void makeRandomTree ( tree *tr);
+extern void nodeRectifier ( tree *tr );
+extern void makeParsimonyTreeFast(tree *tr);
+extern void allocateParsimonyDataStructures(tree *tr);
+extern void freeParsimonyDataStructures(tree *tr);
+extern void parsimonySPR(nodeptr p, tree *tr);
+
+extern FILE *myfopen(const char *path, const char *mode);
+
+
+extern boolean initrav ( tree *tr, nodeptr p );
+extern void initravPartition ( tree *tr, nodeptr p, int model );
+extern boolean update ( tree *tr, nodeptr p );
+extern boolean smooth ( tree *tr, nodeptr p );
+extern boolean smoothTree ( tree *tr, int maxtimes );
+extern boolean localSmooth ( tree *tr, nodeptr p, int maxtimes );
+extern boolean localSmoothMulti(tree *tr, nodeptr p, int maxtimes, int model);
+extern void initInfoList ( int n );
+extern void freeInfoList ( void );
+extern void insertInfoList ( nodeptr node, double likelihood );
+extern boolean smoothRegion ( tree *tr, nodeptr p, int region );
+extern boolean regionalSmooth ( tree *tr, nodeptr p, int maxtimes, int region );
+extern nodeptr removeNodeBIG ( tree *tr, nodeptr p, int numBranches);
+extern nodeptr removeNodeRestoreBIG ( tree *tr, nodeptr p );
+extern boolean insertBIG ( tree *tr, nodeptr p, nodeptr q, int numBranches);
+extern boolean insertRestoreBIG ( tree *tr, nodeptr p, nodeptr q );
+extern boolean testInsertBIG ( tree *tr, nodeptr p, nodeptr q );
+extern void addTraverseBIG ( tree *tr, nodeptr p, nodeptr q, int mintrav, int maxtrav );
+extern int rearrangeBIG ( tree *tr, nodeptr p, int mintrav, int maxtrav );
+extern void traversalOrder ( nodeptr p, int *count, nodeptr *nodeArray );
+extern double treeOptimizeRapid ( tree *tr, int mintrav, int maxtrav, analdef *adef, bestlist *bt, bestlist *bestML);
+extern boolean testInsertRestoreBIG ( tree *tr, nodeptr p, nodeptr q );
+extern void restoreTreeFast ( tree *tr );
+extern int determineRearrangementSetting ( tree *tr, analdef *adef, bestlist *bestT, bestlist *bt, bestlist *bestML);
+extern void computeBIGRAPID ( tree *tr, analdef *adef, boolean estimateModel);
+extern boolean treeEvaluate ( tree *tr, double smoothFactor );
+extern boolean treeEvaluatePartition ( tree *tr, double smoothFactor, int model );
+
+extern void meshTreeSearch(tree *tr, analdef *adef, int thorough);
+
+extern void initTL ( topolRELL_LIST *rl, tree *tr, int n );
+extern void freeTL ( topolRELL_LIST *rl);
+extern void restoreTL ( topolRELL_LIST *rl, tree *tr, int n );
+extern void resetTL ( topolRELL_LIST *rl );
+extern void saveTL ( topolRELL_LIST *rl, tree *tr, int index );
+
+extern int saveBestTree (bestlist *bt, tree *tr, boolean keepIdenticalTrees);
+extern int recallBestTree (bestlist *bt, int rank, tree *tr);
+extern int initBestTree ( bestlist *bt, int newkeep, int numsp );
+extern void resetBestTree ( bestlist *bt );
+extern boolean freeBestTree ( bestlist *bt );
+
+
+extern char *Tree2String ( char *treestr, tree *tr, nodeptr p, boolean printBranchLengths, boolean printNames, boolean printLikelihood,
+ boolean rellTree, boolean finalPrint, int perGene, boolean branchLabelSupport, boolean printSHSupport);
+extern void printTreePerGene(tree *tr, analdef *adef, char *fileName, char *permission);
+
+
+
+extern int treeReadLen (FILE *fp, tree *tr, boolean readBranches, boolean readNodeLabels, boolean topologyOnly);
+extern void treeReadTopologyString(char *treeString, tree *tr);
+extern boolean treeReadLenMULT ( FILE *fp, tree *tr, int *partCount);
+
+extern void getStartingTree ( tree *tr);
+
+extern void computeBootStopOnly(tree *tr, char *bootStrapFileName, analdef *adef);
+extern boolean bootStop(tree *tr, hashtable *h, int numberOfTrees, double *pearsonAverage, unsigned int **bitVectors, int treeVectorLength, unsigned int vectorLength);
+extern void computeConsensusOnly(tree *tr, char* treeSetFileName, analdef *adef);
+extern double evaluatePartialGeneric (tree *, int i, double ki, int _model);
+extern void evaluateGeneric (tree *tr, nodeptr p, boolean fullTraversal);
+extern void newviewGeneric (tree *tr, nodeptr p, boolean masked);
+extern void newviewGenericMulti (tree *tr, nodeptr p, int model);
+extern void makenewzGeneric(tree *tr, nodeptr p, nodeptr q, double *z0, int maxiter, double *result, boolean mask);
+extern void makenewzGenericDistance(tree *tr, int maxiter, double *z0, double *result, int taxon1, int taxon2);
+extern double evaluatePartitionGeneric (tree *tr, nodeptr p, int model);
+extern void newviewPartitionGeneric (tree *tr, nodeptr p, int model);
+extern double evaluateGenericVector (tree *tr, nodeptr p);
+extern void categorizeGeneric (tree *tr, nodeptr p);
+extern double makenewzPartitionGeneric(tree *tr, nodeptr p, nodeptr q, double z0, int maxiter, int model);
+extern boolean isTip(int number, int maxTips);
+extern void computeTraversalInfo(nodeptr p, traversalInfo *ti, int *counter, int maxTips, int numBranches, boolean partialTraversal);
+
+
+
+extern void newviewIterative(tree *tr, int startIndex);
+
+extern void evaluateIterative(tree *);
+
+extern void *malloc_aligned( size_t size);
+
+extern void storeExecuteMaskInTraversalDescriptor(tree *tr);
+extern void storeValuesInTraversalDescriptor(tree *tr, double *value);
+
+
+
+
+extern void makenewzIterative(tree *);
+extern void execCore(tree *, volatile double *dlnLdlz, volatile double *d2lnLdlz2);
+
+
+
+extern void determineFullTraversal(nodeptr p, tree *tr);
+/*extern void optRateCat(tree *, int i, double lower_spacing, double upper_spacing, double *lhs);*/
+
+
+
+
+
+extern double evaluateGenericInitravPartition(tree *tr, nodeptr p, int model);
+extern void evaluateGenericVectorIterative(tree *, int startIndex, int endIndex);
+extern void categorizeIterative(tree *, int startIndex, int endIndex);
+
+extern void fixModelIndices(tree *tr, int endsite, boolean fixRates);
+extern void calculateModelOffsets(tree *tr);
+extern void gammaToCat(tree *tr);
+extern void catToGamma(tree *tr, analdef *adef);
+
+
+extern nodeptr findAnyTip(nodeptr p, int numsp);
+
+extern void parseProteinModel(analdef *adef);
+
+
+
+extern void computeNextReplicate(tree *tr, long *seed, int *originalRateCategories, int *originalInvariant, boolean isRapid, boolean fixRates);
+/*extern void computeNextReplicate(tree *tr, analdef *adef, int *originalRateCategories, int *originalInvariant);*/
+
+extern void putWAG(double *ext_initialRates);
+
+extern void reductionCleanup(tree *tr, int *originalRateCategories, int *originalInvariant);
+extern void parseSecondaryStructure(tree *tr, analdef *adef, int sites);
+extern void printPartitions(tree *tr);
+extern void compareBips(tree *tr, char *bootStrapFileName, analdef *adef);
+extern void computeRF(tree *tr, char *bootStrapFileName, analdef *adef);
+
+
+extern unsigned int **initBitVector(int mxtips, unsigned int *vectorLength);
+extern hashtable *copyHashTable(hashtable *src, unsigned int vectorLength);
+extern hashtable *initHashTable(unsigned int n);
+extern void cleanupHashTable(hashtable *h, int state);
+extern double convergenceCriterion(hashtable *h, int mxtips);
+extern void freeBitVectors(unsigned int **v, int n);
+extern void freeHashTable(hashtable *h);
+extern stringHashtable *initStringHashTable(hashNumberType n);
+extern void addword(char *s, stringHashtable *h, int nodeNumber);
+
+
+extern void printBothOpen(const char* format, ... );
+extern void initRateMatrix(tree *tr);
+
+extern void bitVectorInitravSpecial(unsigned int **bitVectors, nodeptr p, int numsp, unsigned int vectorLength, hashtable *h, int treeNumber, int function, branchInfo *bInf,
+ int *countBranches, int treeVectorLength, boolean traverseOnly, boolean computeWRF);
+
+extern int getIncrement(tree *tr, int model);
+
+
+
+extern void writeBinaryModel(tree *tr);
+extern void readBinaryModel(tree *tr);
+extern void treeEvaluateRandom (tree *tr, double smoothFactor);
+extern void treeEvaluateProgressive(tree *tr);
+
+extern void testGapped(tree *tr);
+
+extern boolean issubset(unsigned int* bipA, unsigned int* bipB, unsigned int vectorLen);
+extern boolean compatible(entry* e1, entry* e2, unsigned int bvlen);
+
+
+
+extern int *permutationSH(tree *tr, int nBootstrap, long _randomSeed);
+
+extern void checkPerSiteRates(const tree * const tr );
+
+extern void restart(tree *tr, analdef *adef);
+
+extern void writeCheckpoint(tree *tr, analdef *adef);
+
+extern boolean isGap(unsigned int *x, int pos);
+extern boolean noGap(unsigned int *x, int pos);
+
+extern void scaleLG4X_EIGN(tree *tr, int model);
+
+extern void myBinFwrite(void *ptr, size_t size, size_t nmemb, FILE *byteFile);
+extern void myBinFread(void *ptr, size_t size, size_t nmemb, FILE *byteFile);
+
+#ifdef __AVX
+
+extern void newviewGTRGAMMAPROT_AVX_LG4(int tipCase,
+ double *x1, double *x2, double *x3, double *extEV[4], double *tipVector[4],
+ int *ex3, unsigned char *tipX1, unsigned char *tipX2, int n,
+ double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling);
+
+extern void newviewGTRCAT_AVX(int tipCase, double *EV, int *cptr,
+ double *x1_start, double *x2_start, double *x3_start, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement);
+
+
+extern void newviewGTRCATPROT_AVX(int tipCase, double *extEV,
+ int *cptr,
+ double *x1, double *x2, double *x3, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement);
+
+
+extern void newviewGTRGAMMA_AVX(int tipCase,
+ double *x1_start, double *x2_start, double *x3_start,
+ double *EV, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ const int n, double *left, double *right, int *wgt, int *scalerIncrement
+ );
+
+extern void newviewGTRGAMMAPROT_AVX(int tipCase,
+ double *x1, double *x2, double *x3, double *extEV, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n,
+ double *left, double *right, int *wgt, int *scalerIncrement);
+
+/* memory saving functions */
+
+void newviewGTRCAT_AVX_GAPPED_SAVE(int tipCase, double *EV, int *cptr,
+ double *x1_start, double *x2_start, double *x3_start, double *tipVector,
+ int *ex3, unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling,
+ unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap,
+ double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn, const int maxCats);
+
+void newviewGTRCATPROT_AVX_GAPPED_SAVE(int tipCase, double *extEV,
+ int *cptr,
+ double *x1, double *x2, double *x3, double *tipVector,
+ int *ex3, unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling,
+ unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap,
+ double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn, const int maxCats);
+
+void newviewGTRGAMMA_AVX_GAPPED_SAVE(int tipCase,
+ double *x1_start, double *x2_start, double *x3_start,
+ double *extEV, double *tipVector,
+ int *ex3, unsigned char *tipX1, unsigned char *tipX2,
+ const int n, double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling,
+ unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap,
+ double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn
+ );
+
+void newviewGTRGAMMAPROT_AVX_GAPPED_SAVE(int tipCase,
+ double *x1_start, double *x2_start, double *x3_start, double *extEV, double *tipVector,
+ int *ex3, unsigned char *tipX1, unsigned char *tipX2, int n,
+ double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling,
+ unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap,
+ double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn);
+#endif
+
+
+
+/* from communication.c */
+void calculateLengthAndDisplPerProcess(tree *tr, int **length_result, int **disp_result);
+void scatterDistrbutedArray(tree *tr, void *src, void *destination, MPI_Datatype type, int *countPerProc, int *displPerProc);
+void gatherDistributedArray(tree *tr, void **destination, void *src, MPI_Datatype type, int* countPerProc, int *displPerProc);
+
+
+#endif
+
+
+
+#
diff --git a/examl/bipartitionList.c b/examl/bipartitionList.c
new file mode 100644
index 0000000..7e3e80d
--- /dev/null
+++ b/examl/bipartitionList.c
@@ -0,0 +1,592 @@
+/* RAxML-HPC, a program for sequential and parallel estimation of phylogenetic trees
+ * Copyright March 2006 by Alexandros Stamatakis
+ *
+ * Partially derived from
+ * fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *
+ * and
+ *
+ * Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ * This program is free software; you may redistribute it and/or modify its
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ *
+ * For any other enquiries send an Email to Alexandros Stamatakis
+ * stamatak at ics.forth.gr
+ *
+ * When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ * Alexandros Stamatakis: "An Efficient Program for phylogenetic Inference Using Simulated Annealing".
+ * Proceedings of IPDPS2005, Denver, Colorado, April 2005.
+ *
+ * AND
+ *
+ * Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models".
+ * Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+
+#ifndef WIN32
+#include <sys/times.h>
+#include <sys/types.h>
+#include <sys/time.h>
+#include <unistd.h>
+#endif
+
+#include <limits.h>
+#include <math.h>
+#include <time.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+#include <stdint.h>
+#include "axml.h"
+
+
+
+extern const unsigned int mask32[32];
+
+extern int processID;
+
+static void getxnodeBips (nodeptr p)
+{
+ nodeptr s;
+
+ if ((s = p->next)->xBips || (s = s->next)->xBips)
+ {
+ p->xBips = s->xBips;
+ s->xBips = 0;
+ }
+
+ assert(p->xBips);
+}
+
+
+entry *initEntry(void)
+{
+ entry *e = (entry*)malloc(sizeof(entry));
+
+ e->bitVector = (unsigned int*)NULL;
+ e->treeVector = (unsigned int*)NULL;
+ e->supportVector = (int*)NULL;
+ e->bipNumber = 0;
+ e->bipNumber2 = 0;
+ e->supportFromTreeset[0] = 0;
+ e->supportFromTreeset[1] = 0;
+ e->next = (entry*)NULL;
+
+ return e;
+}
+
+hashtable *initHashTable(hashNumberType n)
+{
+ /*
+ init with primes
+
+ static const hashNumberType initTable[] = {53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593, 49157, 98317,
+ 196613, 393241, 786433, 1572869, 3145739, 6291469, 12582917, 25165843,
+ 50331653, 100663319, 201326611, 402653189, 805306457, 1610612741};
+ */
+
+ /* init with powers of two */
+
+ static const hashNumberType initTable[] = {64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384,
+ 32768, 65536, 131072, 262144, 524288, 1048576, 2097152,
+ 4194304, 8388608, 16777216, 33554432, 67108864, 134217728,
+ 268435456, 536870912, 1073741824, 2147483648U};
+
+ hashtable *h = (hashtable*)malloc(sizeof(hashtable));
+
+ hashNumberType
+ tableSize,
+ i,
+ primeTableLength = sizeof(initTable)/sizeof(initTable[0]),
+ maxSize = (hashNumberType)-1;
+
+ assert(n <= maxSize);
+
+ i = 0;
+
+ while(i < primeTableLength && initTable[i] < n)
+ i++;
+
+ assert(i < primeTableLength);
+
+ tableSize = initTable[i];
+
+
+
+ h->table = (entry**)calloc(tableSize, sizeof(entry*));
+ h->tableSize = tableSize;
+ h->entryCount = 0;
+
+ return h;
+}
+
+
+
+
+void freeHashTable(hashtable *h)
+{
+ hashNumberType
+ i,
+ entryCount = 0;
+
+
+ for(i = 0; i < h->tableSize; i++)
+ {
+ if(h->table[i] != NULL)
+ {
+ entry *e = h->table[i];
+ entry *previous;
+
+ do
+ {
+ previous = e;
+ e = e->next;
+
+ if(previous->bitVector)
+ free(previous->bitVector);
+
+ if(previous->treeVector)
+ free(previous->treeVector);
+
+ if(previous->supportVector)
+ free(previous->supportVector);
+
+ free(previous);
+ entryCount++;
+ }
+ while(e != NULL);
+ }
+
+ }
+
+ assert(entryCount == h->entryCount);
+
+ free(h->table);
+}
+
+
+
+void cleanupHashTable(hashtable *h, int state)
+{
+ hashNumberType
+ k,
+ entryCount = 0,
+ removeCount = 0;
+
+ assert(state == 1 || state == 0);
+
+ for(k = 0, entryCount = 0; k < h->tableSize; k++)
+ {
+ if(h->table[k] != NULL)
+ {
+ entry *e = h->table[k];
+ entry *start = (entry*)NULL;
+ entry *lastValid = (entry*)NULL;
+
+ do
+ {
+ if(state == 0)
+ {
+ e->treeVector[0] = e->treeVector[0] & 2;
+ assert(!(e->treeVector[0] & 1));
+ }
+ else
+ {
+ e->treeVector[0] = e->treeVector[0] & 1;
+ assert(!(e->treeVector[0] & 2));
+ }
+
+ if(e->treeVector[0] != 0)
+ {
+ if(!start)
+ start = e;
+ lastValid = e;
+ e = e->next;
+ }
+ else
+ {
+ entry *remove = e;
+ e = e->next;
+
+ removeCount++;
+
+ if(lastValid)
+ lastValid->next = remove->next;
+
+ if(remove->bitVector)
+ free(remove->bitVector);
+ if(remove->treeVector)
+ free(remove->treeVector);
+ if(remove->supportVector)
+ free(remove->supportVector);
+ free(remove);
+ }
+
+ entryCount++;
+ }
+ while(e != NULL);
+
+ if(!start)
+ {
+ assert(!lastValid);
+ h->table[k] = NULL;
+ }
+ else
+ {
+ h->table[k] = start;
+ }
+ }
+ }
+
+ assert(entryCount == h->entryCount);
+
+ h->entryCount -= removeCount;
+}
+
+
+
+
+
+
+
+
+
+
+
+unsigned int **initBitVector(int mxtips, unsigned int *vectorLength)
+{
+ unsigned int
+ **bitVectors = (unsigned int **)malloc(sizeof(unsigned int*) * 2 * mxtips);
+
+ int
+ i;
+
+ if(mxtips % MASK_LENGTH == 0)
+ *vectorLength = mxtips / MASK_LENGTH;
+ else
+ *vectorLength = 1 + (mxtips / MASK_LENGTH);
+
+ for(i = 1; i <= mxtips; i++)
+ {
+ bitVectors[i] = (unsigned int *)calloc(*vectorLength, sizeof(unsigned int));
+ assert(bitVectors[i]);
+ bitVectors[i][(i - 1) / MASK_LENGTH] |= mask32[(i - 1) % MASK_LENGTH];
+ }
+
+ for(i = mxtips + 1; i < 2 * mxtips; i++)
+ {
+ bitVectors[i] = (unsigned int *)malloc(sizeof(unsigned int) * *vectorLength);
+ assert(bitVectors[i]);
+ }
+
+ return bitVectors;
+}
+
+void freeBitVectors(unsigned int **v, int n)
+{
+ int i;
+
+ for(i = 1; i < n; i++)
+ free(v[i]);
+}
+
+
+
+
+
+static void newviewBipartitions(unsigned int **bitVectors, nodeptr p, int numsp, unsigned int vectorLength)
+{
+
+ if(isTip(p->number, numsp))
+ return;
+ {
+ nodeptr
+ q = p->next->back,
+ r = p->next->next->back;
+
+
+
+ unsigned int
+ *vector = bitVectors[p->number],
+ *left = bitVectors[q->number],
+ *right = bitVectors[r->number];
+ unsigned
+ int i;
+
+ assert(processID == 0);
+
+
+ while(!p->xBips)
+ {
+ if(!p->xBips)
+ getxnodeBips(p);
+ }
+
+ p->hash = q->hash ^ r->hash;
+
+ if(isTip(q->number, numsp) && isTip(r->number, numsp))
+ {
+ for(i = 0; i < vectorLength; i++)
+ vector[i] = left[i] | right[i];
+ }
+ else
+ {
+ if(isTip(q->number, numsp) || isTip(r->number, numsp))
+ {
+ if(isTip(r->number, numsp))
+ {
+ nodeptr tmp = r;
+ r = q;
+ q = tmp;
+ }
+
+ while(!r->xBips)
+ {
+ if(!r->xBips)
+ newviewBipartitions(bitVectors, r, numsp, vectorLength);
+ }
+
+ for(i = 0; i < vectorLength; i++)
+ vector[i] = left[i] | right[i];
+ }
+ else
+ {
+ while((!r->xBips) || (!q->xBips))
+ {
+ if(!q->xBips)
+ newviewBipartitions(bitVectors, q, numsp, vectorLength);
+ if(!r->xBips)
+ newviewBipartitions(bitVectors, r, numsp, vectorLength);
+ }
+
+ for(i = 0; i < vectorLength; i++)
+ vector[i] = left[i] | right[i];
+ }
+
+ }
+ }
+}
+
+
+
+
+static void insertHashRF(unsigned int *bitVector, hashtable *h, unsigned int vectorLength, int treeNumber, int treeVectorLength, hashNumberType position, int support,
+ boolean computeWRF)
+{
+ if(h->table[position] != NULL)
+ {
+ entry *e = h->table[position];
+
+ do
+ {
+ unsigned int i;
+
+ for(i = 0; i < vectorLength; i++)
+ if(bitVector[i] != e->bitVector[i])
+ break;
+
+ if(i == vectorLength)
+ {
+ e->treeVector[treeNumber / MASK_LENGTH] |= mask32[treeNumber % MASK_LENGTH];
+ if(computeWRF)
+ {
+ e->supportVector[treeNumber] = support;
+
+ assert(0 <= treeNumber && treeNumber < treeVectorLength * MASK_LENGTH);
+ }
+ return;
+ }
+
+ e = e->next;
+ }
+ while(e != (entry*)NULL);
+
+ e = initEntry();
+
+ /*e->bitVector = (unsigned int*)calloc(vectorLength, sizeof(unsigned int));*/
+ e->bitVector = (unsigned int*)malloc_aligned(vectorLength * sizeof(unsigned int));
+ memset(e->bitVector, 0, vectorLength * sizeof(unsigned int));
+
+
+ e->treeVector = (unsigned int*)calloc(treeVectorLength, sizeof(unsigned int));
+ if(computeWRF)
+ e->supportVector = (int*)calloc(treeVectorLength * MASK_LENGTH, sizeof(int));
+
+ e->treeVector[treeNumber / MASK_LENGTH] |= mask32[treeNumber % MASK_LENGTH];
+ if(computeWRF)
+ {
+ e->supportVector[treeNumber] = support;
+
+ assert(0 <= treeNumber && treeNumber < treeVectorLength * MASK_LENGTH);
+ }
+
+ memcpy(e->bitVector, bitVector, sizeof(unsigned int) * vectorLength);
+
+ e->next = h->table[position];
+ h->table[position] = e;
+ }
+ else
+ {
+ entry *e = initEntry();
+
+ /*e->bitVector = (unsigned int*)calloc(vectorLength, sizeof(unsigned int)); */
+
+ e->bitVector = (unsigned int*)malloc_aligned(vectorLength * sizeof(unsigned int));
+ memset(e->bitVector, 0, vectorLength * sizeof(unsigned int));
+
+ e->treeVector = (unsigned int*)calloc(treeVectorLength, sizeof(unsigned int));
+ if(computeWRF)
+ e->supportVector = (int*)calloc(treeVectorLength * MASK_LENGTH, sizeof(int));
+
+
+ e->treeVector[treeNumber / MASK_LENGTH] |= mask32[treeNumber % MASK_LENGTH];
+ if(computeWRF)
+ {
+ e->supportVector[treeNumber] = support;
+
+ assert(0 <= treeNumber && treeNumber < treeVectorLength * MASK_LENGTH);
+ }
+
+ memcpy(e->bitVector, bitVector, sizeof(unsigned int) * vectorLength);
+
+ h->table[position] = e;
+ }
+
+ h->entryCount = h->entryCount + 1;
+}
+
+
+
+void bitVectorInitravSpecial(unsigned int **bitVectors, nodeptr p, int numsp, unsigned int vectorLength, hashtable *h, int treeNumber, int function, branchInfo *bInf,
+ int *countBranches, int treeVectorLength, boolean traverseOnly, boolean computeWRF)
+{
+ if(isTip(p->number, numsp))
+ return;
+ else
+ {
+ nodeptr
+ q = p->next;
+
+ do
+ {
+ bitVectorInitravSpecial(bitVectors, q->back, numsp, vectorLength, h, treeNumber, function, bInf, countBranches, treeVectorLength, traverseOnly, computeWRF);
+ q = q->next;
+ }
+ while(q != p);
+
+ newviewBipartitions(bitVectors, p, numsp, vectorLength);
+
+ assert(p->xBips);
+
+ assert(!traverseOnly);
+
+ if(!(isTip(p->back->number, numsp)))
+ {
+ unsigned int
+ *toInsert = bitVectors[p->number];
+
+ hashNumberType
+ position = p->hash % h->tableSize;
+
+ assert(!(toInsert[0] & 1));
+ assert(!computeWRF);
+
+ switch(function)
+ {
+ case BIPARTITIONS_RF:
+ insertHashRF(toInsert, h, vectorLength, treeNumber, treeVectorLength, position, 0, computeWRF);
+ *countBranches = *countBranches + 1;
+ break;
+ default:
+ assert(0);
+ }
+ }
+
+ }
+}
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+double convergenceCriterion(hashtable *h, int mxtips)
+{
+ int
+ rf = 0;
+
+ unsigned int
+ collisions = 0,
+ k = 0,
+ entryCount = 0;
+
+ double
+ rrf;
+
+ for(k = 0, entryCount = 0; k < h->tableSize; k++)
+ {
+ if(h->table[k] != NULL)
+ {
+ entry *e = h->table[k];
+
+ unsigned int
+ slotCollisions = 0;
+
+ do
+ {
+ unsigned int *vector = e->treeVector;
+ if(((vector[0] & 1) > 0) + ((vector[0] & 2) > 0) == 1)
+ rf++;
+
+ entryCount++;
+ slotCollisions++;
+ e = e->next;
+ }
+ while(e != NULL);
+
+ collisions += (slotCollisions - 1);
+ }
+ }
+
+ assert(entryCount == h->entryCount);
+
+ rrf = (double)rf/((double)(2 * (mxtips - 3)));
+
+#ifdef _DEBUG_CHECKPOINTING
+ printf("Collisions: %u\n", collisions);
+#endif
+
+ return rrf;
+}
+
+
+
+
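Note on the bipartition code above (not part of the commit): initBitVector() gives every taxon one bit in an array of 32-bit words, newviewBipartitions() builds an inner node's vector as the bitwise OR of its two children, and convergenceCriterion() divides the number of bipartitions found in only one of two trees by 2 * (mxtips - 3). A minimal standalone sketch of those three steps follows. The names bip_vector_length, bip_tip_vector, bip_union and relative_rf are hypothetical and do not exist in ExaML, and the sketch assumes that MASK_LENGTH is 32 and that mask32[k] equals 1u << k, neither of which is shown in this part of the diff.

```c
#include <assert.h>
#include <stdlib.h>

#define WORD_BITS 32u /* assumption: same value as MASK_LENGTH in the patch */

/* Number of 32-bit words needed to store one bit per taxon (rounds up). */
static unsigned int bip_vector_length(unsigned int ntips)
{
  return (ntips + WORD_BITS - 1u) / WORD_BITS;
}

/* Allocate a zeroed vector in which only the bit of the 1-based tip is set.
   Returns NULL if the allocation fails; the caller frees the result. */
static unsigned int *bip_tip_vector(unsigned int tip, unsigned int ntips)
{
  unsigned int len = bip_vector_length(ntips);
  unsigned int *v;

  assert(tip >= 1u && tip <= ntips);

  v = calloc(len, sizeof *v);
  if(v != NULL)
    v[(tip - 1u) / WORD_BITS] |= 1u << ((tip - 1u) % WORD_BITS);

  return v;
}

/* The taxon set below an inner node is the union of its children's sets. */
static void bip_union(unsigned int *out, const unsigned int *left,
                      const unsigned int *right, unsigned int len)
{
  unsigned int i;

  for(i = 0; i < len; i++)
    out[i] = left[i] | right[i];
}

/* Relative Robinson-Foulds distance: bipartitions present in exactly one
   of the two trees, divided by the maximum 2 * (ntips - 3). */
static double relative_rf(unsigned int unique_bips, unsigned int ntips)
{
  assert(ntips > 3u);
  return (double)unique_bips / (2.0 * (double)(ntips - 3u));
}
```

With 40 taxa each vector has two words; tip 34 lands in the second word, so the union of the vectors for tips 1 and 34 touches both words. The sketch uses calloc for every vector, whereas the patch uses malloc for inner-node vectors and relies on newviewBipartitions() to overwrite them.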
diff --git a/examl/byteFile.c b/examl/byteFile.c
new file mode 100644
index 0000000..49453f0
--- /dev/null
+++ b/examl/byteFile.c
@@ -0,0 +1,435 @@
+#include <string.h>
+
+#if defined(__APPLE__)
+#include <malloc/malloc.h>
+#else
+#include <malloc.h>
+#endif
+
+#include "byteFile.h"
+#include <stdlib.h>
+
+#ifdef __MIC_NATIVE
+#include "mic_native.h"
+#endif
+
+#define READ_VAR(file,var) assert( fread(&var, sizeof(var),1, file ) == 1 )
+#define READ_ARRAY(file, arrPtr, numElem, size) assert( fread(arrPtr, size, numElem, file) == (unsigned int) numElem)
+
+extern int processID;
+
+/**
+ seekPos finds the position in the byte file where a certain type
+ of information is stored. See byteFile.h for possible values of
+ "pos".
+
+ Note that this is a "fall-through" switch statement: if, for
+ instance, we want to get to the position of the taxa, we have to
+ skip everything that comes before the taxa in the file (but
+ naturally not the taxa themselves).
+ */
+static void seekPos(ByteFile *bf, int pos)
+{
+ exa_off_t
+ toSkip = 0;
+
+ int
+ i;
+
+ switch(pos)
+ {
+ case ALN_ALIGNMENT: /* skips partitions */
+ {
+ assert(bf->hasRead & ALN_PARTITIONS);
+ pInfo p ;
+
+ toSkip += bf->numPartitions * ( sizeof(p.states) + sizeof(p.maxTipStates) + sizeof(p.lower)
+ + sizeof(p.upper) + sizeof(p.width) + sizeof(p.dataType) + sizeof(p.protModels)
+ + sizeof(p.protFreqs) + sizeof(p.nonGTR) + sizeof(p.optimizeBaseFrequencies));
+
+ /* skip the names and their lengths */
+ for( i = 0 ; i < bf->numPartitions; ++i)
+ {
+ pInfo *p = bf->partitions[i];
+ toSkip += (strlen(p->partitionName)+1 ) * sizeof(char) + sizeof(int);
+ toSkip += sizeof(double) * p->states; /* also skip frequencies */
+ }
+ }
+ case ALN_PARTITIONS: /* skips taxa */
+ {
+ assert(bf->hasRead & ALN_TAXA);
+ for(i = 0; i < bf->numTax; ++i)
+ toSkip += (strlen(bf->taxaNames[i]) + 1) * sizeof(char) + sizeof(int);
+ }
+ case ALN_TAXA: /* skips weights */
+ {
+ assert(bf->hasRead & ALN_HEAD);
+ toSkip += bf->numPattern * sizeof(int);
+ }
+ case ALN_WEIGHTS: /* skips header */
+ {
+ toSkip +=
+ sizeof(bf->numTax) + sizeof(bf->numPattern)
+ + sizeof(bf->numPartitions) + sizeof(bf->gappyness);
+ }
+ case ALN_HEAD :
+ toSkip += (3 * sizeof(int)); /* skips the initial int that tells us how many bytes a size_t has as well as the integer for the version number and the magic integer number */
+ break;
+ default :
+ assert(0);
+ }
+
+ exa_fseek(bf->fh, toSkip, SEEK_SET);
+}
+
+/**
+ initializes ByteFile **bf
+ */
+void initializeByteFile(ByteFile **bf, char *name)
+{
+ *bf = (ByteFile *)calloc(1,sizeof(ByteFile));
+ ByteFile *result = *bf;
+ result->fh = myfopen(name, "rb");
+
+ int
+ sizeOfSizeT = 0,
+ version = 0,
+ magicNumber = 0;
+
+ READ_VAR(result->fh, sizeOfSizeT);
+
+ if(sizeOfSizeT != sizeof(size_t))
+ {
+ if(processID == 0)
+ {
+ printf("\nError: the address data type has a size of %d bits on the current system while on the system on which you created the binary alignment file using the parser the address size is %d bits!\n",
+ 8 * (int)sizeof(size_t), 8 * sizeOfSizeT);
+ printf("Usually this indicates that the parser was executed on a 32-bit system while you are trying to run ExaML on a 64-bit system.\n");
+ printf("Please parse the binary alignment file on the same hardware on which you intend to run ExaML.\n\n\n");
+ }
+
+ MPI_Barrier(MPI_COMM_WORLD);
+ MPI_Finalize();
+ exit(-1);
+ }
+
+ //check that version numbers of parser and ExaML match
+ READ_VAR(result->fh, version);
+
+ if(version != (int)programVersionInt)
+ {
+ if(processID == 0)
+ {
+ printf("\nError: Version number %d of ExaML parser and version number %d of ExaML don't match.\n", version, (int)programVersionInt);
+ printf("You are either using an outdated version of the parser or of ExaML.\n");
+ printf("Hasta siempre comandante.\n\n\n");
+ }
+
+ MPI_Barrier(MPI_COMM_WORLD);
+ MPI_Finalize();
+ exit(-1);
+ }
+
+ READ_VAR(result->fh, magicNumber);
+
+ if(magicNumber != 6517718)
+ {
+ if(processID == 0)
+ {
+ printf("\nError: The magic number %d of ExaML parser and magic number %d of ExaML don't match.\n", magicNumber, 6517718);
+ printf("Something went terribly wrong here.\n");
+ printf("Hasta la victoria siempre.\n\n\n");
+ }
+
+ MPI_Barrier(MPI_COMM_WORLD);
+ MPI_Finalize();
+ exit(-1);
+ }
+
+}
+
+
+/**
+ a shallow cleanup of ByteFile *bf. Note that various data may
+ have been copied (by pointer value) to our tree instance and
+ therefore must not be cleaned up.
+ */
+void deleteByteFile(ByteFile *bf)
+{
+ /* only a shallow free! pointers inside the pInfo must persist */
+ int i;
+
+ if(bf->partitions)
+ {
+ for( i = 0; i < bf->numPartitions; ++i)
+ free(bf->partitions[i]);
+ free(bf->partitions);
+ }
+
+ if(bf->fh)
+ fclose(bf->fh);
+
+ if(bf->taxaNames )
+ {
+ for(i = 0; i < bf->numTax; ++i)
+ free(bf->taxaNames[i] );
+ }
+ free(bf->taxaNames);
+ free(bf);
+}
+
+
+
+
+/**
+ only reads initial header information
+ */
+void readHeader(ByteFile* bf)
+{
+ seekPos(bf, ALN_HEAD);
+ READ_VAR(bf->fh, bf->numTax);
+ READ_VAR(bf->fh, bf->numPattern);
+ READ_VAR(bf->fh, bf->numPartitions);
+ READ_VAR(bf->fh, bf->gappyness) ;
+ bf->hasRead |= ALN_HEAD;
+
+}
+
+
+/**
+ reads partition information from the byte file.
+ */
+void readPartitions(ByteFile *bf)
+{
+ int i ;
+
+ seekPos(bf, ALN_PARTITIONS);
+
+ assert(bf->partitions == (pInfo **)NULL);
+ bf->partitions = (pInfo **)calloc(bf->numPartitions, sizeof(pInfo*) );
+ for(i = 0; i < bf->numPartitions; ++i)
+ {
+ bf->partitions[i] = (pInfo*)calloc(1,sizeof(pInfo));
+ pInfo* p = bf->partitions[i];
+
+ p->frequencies = (double*)NULL;
+ p->partitionName = (char *)NULL;
+
+ READ_VAR(bf->fh, p->states);
+ READ_VAR(bf->fh, p->maxTipStates);
+ READ_VAR(bf->fh, p->lower);
+ READ_VAR(bf->fh, p->upper);
+
+ /* DON'T use this value! The per-process width is set later in readMyData(). */
+ READ_VAR(bf->fh, p->width);
+ p->width = 0;
+
+ READ_VAR(bf->fh, p->dataType);
+ READ_VAR(bf->fh, p->protModels);
+ //READ_VAR(bf->fh, p->autoProtModels);
+ READ_VAR(bf->fh, p->protFreqs);
+ READ_VAR(bf->fh, p->nonGTR);
+ READ_VAR(bf->fh, p->optimizeBaseFrequencies);
+ // READ_VAR(bf->fh, p->numberOfCategories);
+
+ /* read string */
+ unsigned int len = 0;
+ READ_VAR(bf->fh, len);
+ p->partitionName = (char*)calloc(len,sizeof(char));
+ READ_ARRAY(bf->fh, p->partitionName, len, sizeof(char));
+
+ p->frequencies = (double*)calloc(p->states, sizeof(double));
+ READ_ARRAY(bf->fh, p->frequencies, p->states , sizeof(double));
+ }
+
+ bf->hasRead |= ALN_PARTITIONS;
+}
+
+
+/**
+ reads the taxon names from the byte file
+ */
+void readTaxa(ByteFile *bf)
+{
+ int i;
+
+ assert(bf->taxaNames == (char **)NULL);
+ seekPos(bf, ALN_TAXA);
+
+ bf->taxaNames = (char **)calloc(bf->numTax, sizeof(char*));
+ for(i = 0; i < bf->numTax; ++i)
+ {
+ int len = 0;
+ READ_VAR(bf->fh, len );
+ bf->taxaNames[i] = (char*)calloc(len, sizeof(char));
+ READ_ARRAY(bf->fh, bf->taxaNames[i], len, sizeof(char));
+ }
+
+ bf->hasRead |= ALN_TAXA;
+}
+
+
+ // #define OLD_LAYOUT
+
+/**
+ uses the information in the PartitionAssignment to only extract
+ data relevant to this process (weights and alignment characters).
+ */
+void readMyData(ByteFile *bf, PartitionAssignment *pa, int procId)
+{
+ seekPos(bf, ALN_ALIGNMENT);
+
+ exa_off_t
+ alnPos = exa_ftell(bf->fh);
+
+ size_t
+ len;
+
+ int numAssign = pa->numAssignPerProc[procId];
+ Assignment *myAssigns = pa->assignPerProc[procId];
+
+ /* first read aln characters */
+ int i,j ;
+ for(i = 0; i < numAssign; ++i )
+ {
+ Assignment a = myAssigns[i];
+ /* printf("reading for: ") ; */
+ /* printAssignment(a, procId); */
+
+ pInfo *partition = bf->partitions[a.partId];
+ partition->width = a.width;
+ partition->offset = a.offset;
+ len = bf->numTax * a.width;
+ partition->yResource = (unsigned char*)malloc_aligned( len * sizeof(unsigned char));
+ memset(partition->yResource,0,len * sizeof(unsigned char));
+ partition->yVector = (unsigned char**) calloc(bf->numTax + 1 , sizeof(unsigned char*));
+ for(j = 1; j <= bf->numTax; ++j)
+ partition->yVector[j] = partition->yResource + (j-1) * a.width;
+
+#ifdef OLD_LAYOUT
+ for(j = 1; j <= bf->numTax; ++j )
+ {
+ exa_off_t pos = alnPos + ( bf->numPattern * (j-1) + partition->lower + a.offset ) * sizeof(unsigned char);
+ assert(alnPos <= pos);
+ exa_fseek(bf->fh, pos, SEEK_SET);
+ READ_ARRAY(bf->fh, partition->yVector[j], a.width, sizeof(unsigned char));
+ }
+#else
+ /* if the entire partition is assigned to this process, read it
+ in one go. Otherwise, several seeks are necessary. */
+ if( a.width == (partition->upper - partition->lower ) )
+ {
+ exa_off_t
+ pos = alnPos + (partition->lower * bf->numTax) * sizeof(unsigned char);
+
+ assert(alnPos <= pos);
+ exa_fseek(bf->fh, pos, SEEK_SET);
+ READ_ARRAY(bf->fh, partition->yResource, a.width * bf->numTax, sizeof(unsigned char));
+ }
+ else
+ {
+ for(j = 1; j <= bf->numTax; ++j )
+ {
+ exa_off_t
+ pos = alnPos + sizeof(unsigned char)
+ * (
+ (partition->lower * bf->numTax ) /* until start of partition */
+ + ((j-1) * (partition->upper - partition->lower) ) /* until start of sequence of taxon within partition */
+ + a.offset ) ;
+
+ assert(alnPos <= pos);
+ exa_fseek(bf->fh, pos, SEEK_SET);
+ READ_ARRAY(bf->fh, partition->yVector[j], a.width, sizeof(unsigned char));
+ }
+ }
+#endif
+ }
+
+
+ /* now read weights */
+ seekPos(bf, ALN_WEIGHTS);
+
+ exa_off_t
+ wgtPos = exa_ftell(bf->fh);
+ assert( ! (wgtPos < 0) );
+
+ for(i = 0; i < numAssign; ++i)
+ {
+ Assignment a = myAssigns[i];
+ pInfo *partition = bf->partitions[a.partId];
+
+#ifdef __MIC_NATIVE
+ /* for Xeon Phi, wgt must be padded to a multiple of 8 (because of site blocking in kernels) */
+ const int padded_width = GET_PADDED_WIDTH(a.width);
+ len = padded_width * sizeof(int);
+#else
+ len = a.width * sizeof(int);
+#endif
+
+ partition->wgt = (int*)malloc_aligned( len);
+ memset(partition->wgt, 0, len);
+
+ exa_off_t pos = wgtPos + (partition->lower + a.offset) * sizeof(int);
+ assert(wgtPos <= pos );
+
+ exa_fseek(bf->fh, pos, SEEK_SET);
+ READ_ARRAY(bf->fh, partition->wgt, a.width, sizeof(int));
+
+ }
+
+ bf->hasRead |= ALN_ALIGNMENT;
+ bf->hasRead |= ALN_WEIGHTS;
+}
+
+
+/**
+ copies all relevant information from our byte file to the tree
+ instance.
+ */
+void initializeTreeFromByteFile(ByteFile *bf, tree *tr)
+{
+ assert( ( bf->hasRead & ALN_HEAD )
+ && (bf->hasRead & ALN_WEIGHTS)
+ && (bf->hasRead & ALN_TAXA)
+ && (bf->hasRead & ALN_PARTITIONS)
+ && (bf->hasRead & ALN_ALIGNMENT ) );
+
+ /* some additional stuff we read */
+ tr->mxtips = bf->numTax;
+ tr->originalCrunchedLength = bf->numPattern;
+ tr->NumberOfModels = bf->numPartitions;
+ tr->gapyness = bf->gappyness;
+
+ /* deep copy of taxa */
+ int i ;
+ tr->nameList = (char **)calloc((size_t)(tr->mxtips + 1), sizeof(char *) );
+
+ tr->nameList[0] = (char *)NULL;
+
+ for(i = 1; i <= bf->numTax; ++i)
+ {
+ tr->nameList[i] = (char*)calloc(strlen(bf->taxaNames[i-1]) + 1, sizeof(char));
+ strcpy(tr->nameList[i], bf->taxaNames[i-1]);
+ }
+
+ /*
+ * shallow copy of partitions
+ *
+ * partition contains only shallow copies of a few data arrays that
+ * needed to be initialized at this point
+ */
+ int
+ myLength = 0;
+
+ tr->partitionData = (pInfo*)calloc(tr->NumberOfModels, sizeof(pInfo));
+
+ for(i = 0; i < tr->NumberOfModels; ++i)
+ {
+ tr->partitionData[i] = *(bf->partitions[i]);
+ myLength += tr->partitionData[i].width;
+ assert( bf->partitions[i]->wgt != (int*)NULL || bf->partitions[i]->width == 0);
+ assert( ( tr->partitionData[i].wgt != (int*)NULL) || ( tr->partitionData[i].width == 0 ) );
+ }
+}
+
+
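Note on the seek arithmetic in readMyData() above: in the non-OLD_LAYOUT branch the alignment is stored partition by partition, and within a partition taxon by taxon. The following is a minimal sketch of that offset calculation, not code from ExaML. The helper name alignmentOffset and its parameter names are hypothetical, and it assumes one byte per alignment character, as the sizeof(unsigned char) factor in the patch suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of ExaML): byte offset, relative to the
   start of the alignment block, of the first character a process reads
   for taxon j (1-based) when its assignment starts assignOffset sites
   into the partition [lower, upper). */
static size_t alignmentOffset(size_t lower, size_t upper, size_t numTax,
                              size_t j, size_t assignOffset)
{
  assert(j >= 1 && j <= numTax);
  assert(lower <= upper && assignOffset <= upper - lower);

  return lower * numTax                 /* all partitions before this one */
    + (j - 1) * (upper - lower)         /* taxa 1..j-1 within this partition */
    + assignOffset;                     /* start of this process' slice */
}
```

When a process owns the whole partition (assignOffset is 0 and the width equals upper - lower), the slices of consecutive taxa are contiguous, which is why the patch can read them with a single READ_ARRAY call.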
diff --git a/examl/byteFile.h b/examl/byteFile.h
new file mode 100644
index 0000000..c4306ae
--- /dev/null
+++ b/examl/byteFile.h
@@ -0,0 +1,60 @@
+#ifndef _BYTE_FILE
+#define _BYTE_FILE
+
+#include "axml.h"
+
+#include "partitionAssignment.h"
+
+#define ALN_HEAD 1
+#define ALN_WEIGHTS 2
+#define ALN_TAXA 4
+#define ALN_PARTITIONS 8
+#define ALN_ALIGNMENT 16
+
+
+
+typedef struct
+{
+ int numTax;
+ size_t numPattern;
+ int numPartitions;
+ double gappyness;
+ pInfo **partitions;
+ char **taxaNames;
+ FILE *fh;
+ char hasRead ;
+} ByteFile;
+
+/*
+ constructor
+*/
+void initializeByteFile(ByteFile **bf, char *name);
+/*
+ destructor
+*/
+void deleteByteFile(ByteFile *bf) ;
+/*
+ reads the header of a byte file
+*/
+void readHeader(ByteFile* bf);
+/*
+ reads partition information in a byte file
+*/
+void readPartitions(ByteFile *bf);
+/*
+ reads the taxon names in a byte file
+*/
+void readTaxa(ByteFile *bf);
+/*
+ reads weights and alignment characters in a byte file
+*/
+void readMyData(ByteFile *bf, PartitionAssignment *pa, int procId);
+/*
+ initializes a tree from a byte file.
+
+ @notice Since shallow copies are involved, you cannot copy the
+ information from a byte file into multiple tree instances.
+ */
+void initializeTreeFromByteFile(ByteFile *bf, tree *tr);
+
+#endif
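Note on the ALN_* flags above: each flag is a distinct power of two, so the hasRead field can record any combination of sections that have been read, and initializeTreeFromByteFile() can assert that all of them are present. A minimal sketch of that bookkeeping follows. The flag values are copied from byteFile.h, while the helper allSectionsRead is hypothetical and not part of ExaML.

```c
#include <assert.h>

/* flag values as defined in byteFile.h */
#define ALN_HEAD       1
#define ALN_WEIGHTS    2
#define ALN_TAXA       4
#define ALN_PARTITIONS 8
#define ALN_ALIGNMENT  16

/* Hypothetical helper: returns 1 if every section flag is set in
   hasRead, 0 otherwise. */
static int allSectionsRead(char hasRead)
{
  const int required =
    ALN_HEAD | ALN_WEIGHTS | ALN_TAXA | ALN_PARTITIONS | ALN_ALIGNMENT;

  return (hasRead & required) == required;
}
```

Comparing against the combined mask checks all five flags in one expression, which is equivalent to the chain of && tests in the assert of initializeTreeFromByteFile().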
diff --git a/examl/communication.c b/examl/communication.c
new file mode 100644
index 0000000..d700edc
--- /dev/null
+++ b/examl/communication.c
@@ -0,0 +1,182 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+
+#include <mpi.h>
+
+#include "axml.h"
+
+
+extern int processes;
+extern int processID;
+
+
+
+/**
+ computes the count and displacement for gatherv/scatterv, assuming
+ that the new partition assignment algorithm was used
+*/
+void calculateLengthAndDisplPerProcess(tree *tr, int **length_result, int **disp_result)
+{
+ int i;
+
+ *length_result = (int*) calloc((size_t) processes , sizeof(int));
+ *disp_result = (int*) calloc((size_t) processes, sizeof(int));
+
+ int* numPerProc = *length_result;
+ int* displPerProc= *disp_result;
+
+ for(i = 0; i < tr->numAssignments; ++i)
+ {
+ Assign* ass = &(tr->partAssigns[i]);
+ numPerProc[ass->procId] += ass->width;
+ }
+
+ displPerProc[0] = 0;
+ for(i = 1; i < processes ; ++i)
+ displPerProc[i] = displPerProc[i-1] + numPerProc[i-1];
+}
+
+
+static size_t mapMpiTypeToSize(MPI_Datatype type)
+{
+ if(type == MPI_INT)
+ return sizeof(int);
+ else if(type == MPI_DOUBLE)
+ return sizeof(double);
+ else
+ {
+ assert(0);
+ return 0;
+ }
+}
+
+
+/**
+ scatters a distributed array (e.g., what used to be
+ tr->rateCategory) to partition-specific arrays (e.g.,
+ tr->partitionData[i].rateCategory).
+
+ This works, because tr->partitionData[i].rateCategory is a
+ non-owning pointer to a position in the global resource array
+ (e.g., tr->rateCategory_basePtr).
+*/
+void scatterDistrbutedArray(tree *tr, void *src, void *destination, MPI_Datatype type, int *countPerProc, int *displPerProc)
+{
+ int
+ i;
+
+ size_t
+ typeLen = mapMpiTypeToSize(type);
+
+ char
+ *srcReordered = (char *)NULL;
+
+ /* master must reorder the data */
+ if(processID == 0)
+ {
+ srcReordered = (char *)malloc(tr->originalCrunchedLength * typeLen);
+ int *seenPerProcesses = (int *)calloc((size_t) processes, sizeof(int));
+
+ Assign *aIter = tr->partAssigns;
+ Assign *aEnd = &(tr->partAssigns[ tr->numAssignments ] );
+
+ while(aIter != aEnd)
+ {
+ pInfo *partition = &(tr->partitionData[ aIter->partitionId ]) ;
+ memcpy( srcReordered + ( (size_t) displPerProc[aIter->procId] + (size_t) seenPerProcesses[aIter->procId] ) * typeLen ,
+ ((char*) src) + (partition->lower + aIter->offset) * typeLen,
+ aIter->width * typeLen);
+ seenPerProcesses[aIter->procId] += aIter->width;
+ ++aIter;
+ }
+
+ for(i = 0; i < processes; ++i)
+ assert(seenPerProcesses[i] == countPerProc[i]) ;
+
+ free(seenPerProcesses);
+ }
+
+ MPI_Scatterv(srcReordered, countPerProc, displPerProc, type, destination, countPerProc[processID], type, 0, MPI_COMM_WORLD);
+
+ /* after this scatter, every process already has the data correctly
+ ordered at its respective base pointer */
+
+ if(processID == 0)
+ free(srcReordered);
+}
+
+
+/**
+ gathers partition-specific arrays (e.g.,
+ tr->partitionData[i].rateCategory) into one global array on the
+ master (e.g., what used to be tr->rateCategory).
+
+ This works, because tr->partitionData[i].rateCategory is a
+ non-owning pointer to a position in the global resource array
+ (e.g., tr->rateCategory_basePtr).
+*/
+void gatherDistributedArray(tree *tr, void **destinationPtr, void *src, MPI_Datatype type, int* countPerProc, int *displPerProc)
+{
+ /* this is the raw array that the master will obtain from its
+ peers. Data in this array are ordered per process */
+ char
+ *destinationUnordered = (char*)NULL;
+
+ char
+ *destination = (char*)NULL;
+
+ size_t
+ typeLen = mapMpiTypeToSize(type);
+
+ if(processID == 0)
+ {
+ //TODO one pointer is of type void the other of type char, not really nice
+ *destinationPtr = (void *)malloc( tr->originalCrunchedLength * typeLen);
+ destinationUnordered = (char *)malloc( tr->originalCrunchedLength * typeLen);
+ destination = *destinationPtr;
+ }
+
+ MPI_Gatherv(src, countPerProc[processID], type, destinationUnordered, countPerProc, displPerProc, type,0 , MPI_COMM_WORLD );
+
+ /*
+ here the master reorders the array it has obtained. Afterwards,
+ destinationPtr is a pointer to the array that contains the global
+ array that can be indexed by alignment position (i.e., if we have
+ gathered tr->partitionData[i].lhs, then *destinationPtr
+ corresponds to what previously was tr->lhs). This strongly couples
+ the respective distributed array to tr->partAssigns.
+ */
+ if(processID == 0)
+ {
+ int
+ i,
+ *seenPerProcesses = (int*) calloc(processes, sizeof(int));
+
+ Assign
+ *aIter = tr->partAssigns;
+
+ Assign
+ *aEnd = tr->partAssigns + tr->numAssignments;
+
+ while(aIter != aEnd)
+ {
+ pInfo
+ *partition = &(tr->partitionData[aIter->partitionId]);
+
+ memcpy(destination + (size_t) (partition->lower + aIter->offset) * typeLen,
+ destinationUnordered + (size_t) (displPerProc[aIter->procId] + seenPerProcesses[aIter->procId]) * typeLen ,
+ typeLen * aIter->width);
+ seenPerProcesses[aIter->procId] += aIter->width;
+ ++aIter ;
+ }
+
+ /* check, if everything has been reordered */
+ for(i = 0; i < processes; ++i)
+ assert(seenPerProcesses[i] == countPerProc[i]);
+
+ free(seenPerProcesses);
+ free(destinationUnordered);
+ }
+}
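Note on calculateLengthAndDisplPerProcess() above: the counts passed to MPI_Scatterv/MPI_Gatherv are per-process sums of assignment widths, and the displacements are the exclusive prefix sum of those counts. The sketch below reproduces only that arithmetic without MPI. The struct AssignSketch and the function countsAndDispls are hypothetical stand-ins, not ExaML types, and the caller is assumed to supply arrays of numProcs ints.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two Assign fields used here */
typedef struct
{
  int procId;   /* rank that owns this slice */
  int width;    /* number of site patterns in the slice */
} AssignSketch;

/* Fills counts[p] with the total width assigned to process p and
   displs[p] with the start of process p's block in a buffer that is
   ordered by process (exclusive prefix sum of counts). */
static void countsAndDispls(const AssignSketch *assigns, int numAssign,
                            int numProcs, int *counts, int *displs)
{
  int i;

  if(numProcs <= 0)
    return;

  for(i = 0; i < numProcs; ++i)
    counts[i] = 0;

  for(i = 0; i < numAssign; ++i)
    counts[assigns[i].procId] += assigns[i].width;

  displs[0] = 0;
  for(i = 1; i < numProcs; ++i)
    displs[i] = displs[i - 1] + counts[i - 1];
}
```

Because several assignments can map to the same process, the master additionally keeps a running "seen" counter per process (seenPerProcesses in the patch) to place each slice inside its process block.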
diff --git a/examl/evaluateGenericSpecial.c b/examl/evaluateGenericSpecial.c
new file mode 100644
index 0000000..27388f5
--- /dev/null
+++ b/examl/evaluateGenericSpecial.c
@@ -0,0 +1,2083 @@
+/* RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees
+ * Copyright August 2006 by Alexandros Stamatakis
+ *
+ * Partially derived from
+ * fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *
+ * and
+ *
+ * Programs of the PHYLIP package by Joe Felsenstein.
+ * This program is free software; you may redistribute it and/or modify its
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ *
+ * For any other enquiries send an Email to Alexandros Stamatakis
+ * Alexandros.Stamatakis at epfl.ch
+ *
+ * When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ * Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models".
+ * Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#ifndef WIN32
+#include <unistd.h>
+#endif
+
+#include <math.h>
+#include <time.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+#include "axml.h"
+
+/* the set of functions in here computes the log likelihood at a given branch (the virtual root of a tree) */
+
+/* includes for using SSE3 intrinsics */
+
+#ifdef __SIM_SSE3
+#include <xmmintrin.h>
+#include <pmmintrin.h>
+/*#include <tmmintrin.h>*/
+#endif
+
+#ifdef __MIC_NATIVE
+#include "mic_native.h"
+#endif
+
+
+/*
+ global variables of pthreads version, reductionBuffer is the global array
+ that is used for implementing deterministic reduction operations, that is,
+ the total log likelihood over the partial log likelihoods for the sites that each thread has computed
+
+ NumberOfThreads is just the number of threads.
+
+ Note the volatile modifier here, which guarantees that the compiler will not apply optimizations or
+ rearrangements to the code accessing those variables, because it does not know that several concurrent threads
+ will access those variables simultaneously
+*/
+
+
+extern const char inverseMeaningDNA[16];
+extern int processID;
+
+/* a pre-computed 32-bit integer mask */
+
+extern const unsigned int mask32[32];
+
+/* the function below computes the P matrix from the decomposition of the Q matrix and the respective rate categories for a single partition */
+
+
+static void calcDiagptable(const double z, const int states, const int numberOfCategories, const double *rptr, const double *EIGN, double *diagptable)
+{
+ int
+ i,
+ l;
+
+ double
+ lz,
+ *lza = (double *)malloc(sizeof(double) * states);
+
+ /* transform the root branch length to the log and check if it is not too small */
+
+ if (z < zmin)
+ lz = log(zmin);
+ else
+ lz = log(z);
+
+ /* do some pre-computations to avoid redundant computations further below */
+
+ for(i = 0; i < states; i++)
+ lza[i] = EIGN[i] * lz;
+
+ /* loop over the number of per-site or discrete gamma rate categories */
+
+ for(i = 0; i < numberOfCategories; i++)
+ {
+ /*
+ diagptable is a pre-allocated array of doubles that stores the P-Matrix
+ the first entry is always 1.0
+ */
+ diagptable[i * states] = 1.0;
+
+ /* compute the P matrix for all remaining states of the model */
+
+ for(l = 1; l < states; l++)
+ diagptable[i * states + l] = EXP(rptr[i] * lza[l]);
+ }
+
+ free(lza);
+}
+
+
+static void calcDiagptableFlex_LG4(double z, int numberOfCategories, double *rptr, double *EIGN[4], double *diagptable, const int numStates)
+{
+ int
+ i,
+ l;
+
+ double
+ lz;
+
+ assert(numStates <= 64);
+
+ if (z < zmin)
+ lz = log(zmin);
+ else
+ lz = log(z);
+
+ for(i = 0; i < numberOfCategories; i++)
+ {
+ diagptable[i * numStates + 0] = 1.0;
+
+ for(l = 1; l < numStates; l++)
+ diagptable[i * numStates + l] = EXP(rptr[i] * EIGN[i][l] * lz);
+ }
+}
+
+
+
+
+#ifndef _OPTIMIZED_FUNCTIONS
+
+/* below is a slow generic implementation of the likelihood computation at the root under the GAMMA model */
+
+static double evaluateGAMMA_FLEX(int *wptr,
+ double *x1_start, double *x2_start,
+ double *tipVector,
+ unsigned char *tipX1, const int n, double *diagptable, const int states)
+{
+ double
+ sum = 0.0,
+ term,
+ *x1,
+ *x2;
+
+ int
+ i,
+ j,
+ k;
+
+ /* span is the offset within the likelihood array at an inner node that gets us from the values
+ of site i to the values of site i + 1 */
+
+ const int
+ span = states * 4;
+
+ /* we distinguish between two cases here: either one of the two nodes defining the branch at which we put the virtual root is
+ a tip, or both are inner nodes. Both nodes cannot be tips because we do not allow for two-taxon trees ;-)
+ Note that, if a node is a tip, this will always be tipX1. This is done for code simplicity and the flipping of the nodes
+ is done beforehand, when we compute the traversal descriptor.
+ */
+
+ /* the left node is a tip */
+ if(tipX1)
+ {
+ /* loop over the sites of this partition */
+ for (i = 0; i < n; i++)
+ {
+ /* access pre-computed tip vector values via a lookup table */
+ x1 = &(tipVector[states * tipX1[i]]);
+ /* access the other(inner) node at the other end of the branch */
+ x2 = &(x2_start[span * i]);
+
+ /* loop over GAMMA rate categories, hard-coded as 4 in RAxML */
+ for(j = 0, term = 0.0; j < 4; j++)
+ /* loop over states and multiply them with the P matrix */
+ for(k = 0; k < states; k++)
+ term += x1[k] * x2[j * states + k] * diagptable[j * states + k];
+
+ /* take the log of the likelihood and multiply the per-gamma rate likelihood by 1/4.
+ Under the GAMMA model the 4 discrete GAMMA rates all have the same probability
+ of 0.25 */
+
+ term = LOG(0.25 * FABS(term));
+
+ sum += wptr[i] * term;
+ }
+ }
+ else
+ {
+ for (i = 0; i < n; i++)
+ {
+ /* same as before, only that now we access two inner likelihood vectors x1 and x2 */
+
+ x1 = &(x1_start[span * i]);
+ x2 = &(x2_start[span * i]);
+
+ for(j = 0, term = 0.0; j < 4; j++)
+ for(k = 0; k < states; k++)
+ term += x1[j * states + k] * x2[j * states + k] * diagptable[j * states + k];
+
+ term = LOG(0.25 * FABS(term));
+
+ sum += wptr[i] * term;
+ }
+ }
+
+ return sum;
+}
+
+
+/* a generic and slow implementation of the CAT model of rate heterogeneity */
+
+static double evaluateCAT_FLEX (int *cptr, int *wptr,
+ double *x1, double *x2, double *tipVector,
+ unsigned char *tipX1, int n, double *diagptable_start, const int states)
+{
+ double
+ sum = 0.0,
+ term,
+ *diagptable,
+ *left,
+ *right;
+
+ int
+ i,
+ l;
+
+ /* choosing between tip vectors and non-tip vectors is identical in all flavors of this function, regardless
+ of whether we are using CAT, GAMMA, DNA or protein data etc. */
+
+ if(tipX1)
+ {
+ for (i = 0; i < n; i++)
+ {
+ /* same as in the GAMMA implementation */
+ left = &(tipVector[states * tipX1[i]]);
+ right = &(x2[states * i]);
+
+ /* important difference here, we do not have, as for GAMMA
+ 4 P matrices assigned to each site, but just one. However those
+ P-Matrices can be different for the sites.
+ Hence we index into the precalculated P-matrices for individual sites
+ via the category pointer cptr[i]
+ */
+ diagptable = &diagptable_start[states * cptr[i]];
+
+ /* similar to gamma, with the only difference that we do not integrate (sum)
+ over the discrete gamma rates, but simply compute the likelihood of the
+ site and the given P-matrix */
+
+ for(l = 0, term = 0.0; l < states; l++)
+ term += left[l] * right[l] * diagptable[l];
+
+ /* take the log */
+
+ term = LOG(FABS(term));
+
+ /*
+ multiply the log with the pattern weight of this site.
+ The site pattern for which we just computed the likelihood may
+ represent several alignment columns (sites) that have been compressed
+ into one site pattern if they are exactly identical AND evolve under the same model,
+ i.e., form part of the same partition.
+ */
+
+ sum += wptr[i] * term;
+ }
+ }
+ else
+ {
+ for (i = 0; i < n; i++)
+ {
+ /* as before, we now access the likelihood arrays of two inner nodes */
+ left = &x1[states * i];
+ right = &x2[states * i];
+
+ diagptable = &diagptable_start[states * cptr[i]];
+
+ for(l = 0, term = 0.0; l < states; l++)
+ term += left[l] * right[l] * diagptable[l];
+
+ term = LOG(FABS(term));
+
+ sum += wptr[i] * term;
+ }
+ }
+
+ return sum;
+}
+
+#endif
+
+/* below are the function headers for unreadable, highly optimized versions of the above functions
+ for DNA and protein data that also use SSE3 intrinsics and implement some memory saving tricks.
+ The actual functions can be found at the end of this source file.
+ All other likelihood function implementation files:
+
+ newviewGenericSpecial.c
+ makenewzGenericSpecial.c
+ evaluatePartialGenericSpecial.c
+
+ are also structured like this
+
+ To decide which set of function implementations to use you will have to undefine or define _OPTIMIZED_FUNCTIONS
+ in the Makefile
+*/
+
+
+#ifdef _OPTIMIZED_FUNCTIONS
+static double evaluateGTRGAMMA_BINARY(int *ex1, int *ex2, int *wptr,
+ double *x1_start, double *x2_start,
+ double *tipVector,
+ unsigned char *tipX1, const int n, double *diagptable, const boolean fastScaling);
+
+static double evaluateGTRCAT_BINARY (int *ex1, int *ex2, int *cptr, int *wptr,
+ double *x1_start, double *x2_start, double *tipVector,
+ unsigned char *tipX1, int n, double *diagptable_start, const boolean fastScaling);
+
+static double evaluateGTRGAMMAPROT_LG4(int *ex1, int *ex2, int *wptr,
+ double *x1, double *x2,
+ double *tipVector[4],
+ unsigned char *tipX1, int n, double *diagptable, const boolean fastScaling, double *weights);
+
+/* GAMMA for proteins with memory saving */
+
+static double evaluateGTRGAMMAPROT_GAPPED_SAVE (int *wptr,
+ double *x1, double *x2,
+ double *tipVector,
+ unsigned char *tipX1, int n, double *diagptable,
+ double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap);
+
+
+/* GAMMA for proteins */
+
+static double evaluateGTRGAMMAPROT (int *wptr,
+ double *x1, double *x2,
+ double *tipVector,
+ unsigned char *tipX1, int n, double *diagptable);
+
+/* CAT for proteins */
+
+static double evaluateGTRCATPROT (int *cptr, int *wptr,
+ double *x1, double *x2, double *tipVector,
+ unsigned char *tipX1, int n, double *diagptable_start);
+
+
+/* CAT for proteins with memory saving */
+
+static double evaluateGTRCATPROT_SAVE (int *cptr, int *wptr,
+ double *x1, double *x2, double *tipVector,
+ unsigned char *tipX1, int n, double *diagptable_start,
+ double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap);
+
+/* analogous DNA functions */
+
+static double evaluateGTRCAT_SAVE (int *cptr, int *wptr,
+ double *x1_start, double *x2_start, double *tipVector,
+ unsigned char *tipX1, int n, double *diagptable_start,
+ double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap);
+
+static double evaluateGTRGAMMA_GAPPED_SAVE(int *wptr,
+ double *x1_start, double *x2_start,
+ double *tipVector,
+ unsigned char *tipX1, const int n, double *diagptable,
+ double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap);
+
+static double evaluateGTRGAMMA(int *wptr,
+ double *x1_start, double *x2_start,
+ double *tipVector,
+ unsigned char *tipX1, const int n, double *diagptable);
+
+
+static double evaluateGTRCAT (int *cptr, int *wptr,
+ double *x1_start, double *x2_start, double *tipVector,
+ unsigned char *tipX1, int n, double *diagptable_start);
+
+
+#endif
+
+
+/* This is the core function for computing the log likelihood at a branch */
+
+void evaluateIterative(tree *tr)
+{
+ /* the branch lengths and node indices of the virtual root branch are always the first ones
+ stored in the very important traversal array data structure that describes a partial or full tree traversal */
+
+ /* get the branch length at the root */
+ double
+ *pz = tr->td[0].ti[0].qz;
+
+ /* get the node number of the node to the left and right of the branch that defines the virtual rooting */
+
+ int
+ pNumber = tr->td[0].ti[0].pNumber,
+ qNumber = tr->td[0].ti[0].qNumber;
+
+ /* before we can compute the likelihood at the virtual root, we need to do a partial or full tree traversal to compute
+ the conditional likelihoods of the vectors as specified in the traversal descriptor. Keeping this traversal descriptor consistent
+ will unfortunately be the responsibility of users. This is tricky, if as planned for here, we use a rooted view (described somewhere in Felsenstein's book)
+ for the conditional vectors with respect to the tree
+ */
+
+ /* iterate over all valid entries in the traversal descriptor */
+ newviewIterative(tr, 1);
+
+ int
+ m;
+
+#ifdef _USE_OMP
+#pragma omp parallel for
+#endif
+ for(m = 0; m < tr->NumberOfModels; m++)
+ {
+ /* check if this partition has to be processed now - otherwise no need to compute P matrix */
+ if(!tr->td[0].executeModel[m] || tr->partitionData[m].width == 0)
+ continue;
+
+ int
+ categories,
+ states = tr->partitionData[m].states;
+
+ double
+ z,
+ *rateCategories,
+ *diagptable = tr->partitionData[m].left;
+
+ /* if we are using a per-partition branch length estimate, the branch has an index, otherwise, for a joint branch length
+ estimate over all partitions we just use the branch length value with index 0 */
+ if(tr->numBranches > 1)
+ z = pz[m];
+ else
+ z = pz[0];
+
+
+ /*
+ figure out if we are using the CAT or GAMMA model of rate heterogeneity
+ and set pointers to the rate heterogeneity rate arrays and also set the
+ number of distinct rate categories appropriately.
+
+ Under GAMMA this is constant and hard-coded as 4, whereas under CAT
+ the number of site-wise rate categories can vary in the course of computations
+ up to a user defined maximum value of site categories (default: 25)
+ */
+ if(tr->rateHetModel == CAT)
+ {
+ rateCategories = tr->partitionData[m].perSiteRates;
+ categories = tr->partitionData[m].numberOfCategories;
+ }
+ else
+ {
+ rateCategories = tr->partitionData[m].gammaRates;
+ categories = 4;
+ }
+
+ if(tr->partitionData[m].protModels == LG4M || tr->partitionData[m].protModels == LG4X)
+ calcDiagptableFlex_LG4(z, 4, tr->partitionData[m].gammaRates, tr->partitionData[m].EIGN_LG4, diagptable, 20);
+ else
+ calcDiagptable(z, states, categories, rateCategories, tr->partitionData[m].EIGN, diagptable);
+ }
+
+ /* after the above call we are sure that we have properly and consistently computed the
+ conditionals to the right and left of the virtual root and we can now invoke
+ the log likelihood computation */
+
+ /* we need to loop over all partitions. Note that we may have a mix of DNA, protein, binary data, etc. partitions */
+#ifdef _USE_OMP
+#pragma omp parallel
+#endif
+ {
+ int
+ m,
+ model,
+ maxModel;
+
+#ifdef _USE_OMP
+ maxModel = tr->maxModelsPerThread;
+#else
+ maxModel = tr->NumberOfModels;
+#endif
+
+ for(m = 0; m < maxModel; m++)
+ {
+ /* just defaults -> if the partition wasn't assigned to this thread, it will be ignored later on */
+ size_t
+ width = 0,
+ offset = 0;
+
+ double
+ *diagptable = (double*)NULL,
+ *perPartitionLH = (double*)NULL;
+
+ unsigned int
+ *globalScaler = (unsigned int*)NULL;
+
+
+#ifdef _USE_OMP
+ int
+ tid = omp_get_thread_num();
+
+ /* check if this thread should process this partition */
+ Assign*
+ pAss = tr->threadPartAssigns[tid * tr->maxModelsPerThread + m];
+
+ if(pAss)
+ {
+ model = pAss->partitionId;
+ width = pAss->width;
+ offset = pAss->offset;
+
+ assert(model < tr->NumberOfModels);
+
+ diagptable = tr->partitionData[model].left;
+ globalScaler = tr->partitionData[model].threadGlobalScaler[tid];
+ perPartitionLH = &tr->partitionData[model].reductionBuffer[tid];
+ }
+ else
+ break;
+
+#else
+ model = m;
+
+ /* number of sites in this partition */
+ width = (size_t)tr->partitionData[model].width;
+ offset = 0;
+
+ /* set this pointer to the memory area where space has been reserved a priori for storing the
+ P matrix at the root */
+ diagptable = tr->partitionData[model].left;
+ globalScaler = tr->partitionData[model].globalScaler;
+ perPartitionLH = &tr->perPartitionLH[model];
+#endif
+
+
+ /*
+ Important part of the traversal descriptor:
+ figure out if we need to recalculate the likelihood of this
+ partition:
+
+ The reasons why this is important in terms of performance are given in this paper
+ here which you should actually read:
+
+ A. Stamatakis, M. Ott: "Load Balance in the Phylogenetic Likelihood Kernel". Proceedings of ICPP 2009, accepted for publication, Vienna, Austria, September 2009
+
+ The width > 0 check is for checking if under the cyclic data distribution of per-partition sites to threads this thread does indeed have a site
+ of the current partition.
+
+ */
+
+ if(tr->td[0].executeModel[model] && width > 0)
+ {
+ int
+ rateHet = (int)discreteRateCategories(tr->rateHetModel),
+
+ /* get the number of states in the partition, e.g.: 4 = DNA, 20 = Protein */
+ states = tr->partitionData[model].states,
+
+ /* span for single alignment site (in doubles!) */
+ span = rateHet * states;
+
+ size_t
+ /* offset for current thread's data in global xVector (in doubles!) */
+ x_offset = offset * (size_t)span;
+
+ int
+ /* integer weight vector with pattern compression weights */
+ *wgt = tr->partitionData[model].wgt + offset,
+
+ /* integer rate category vector (for each pattern, _number_ of PSR category assigned to it, NOT actual rate!) */
+ *rateCategory = tr->partitionData[model].rateCategory + offset;
+
+ double
+ partitionLikelihood = 0.0,
+ *weights = tr->partitionData[model].weights,
+ *x1_start = (double*)NULL,
+ *x2_start = (double*)NULL,
+ *x1_gapColumn = (double*)NULL,
+ *x2_gapColumn = (double*)NULL;
+
+ unsigned int
+ *x1_gap = (unsigned int*)NULL,
+ *x2_gap = (unsigned int*)NULL;
+
+ unsigned char
+ *tip = (unsigned char*)NULL;
+
+ /* figure out if we need to address tip vectors (a char array that indexes into a precomputed tip likelihood
+ value array) or if we need to address inner vectors */
+
+ /* either node p or node q is a tip */
+
+ if(isTip(pNumber, tr->mxtips) || isTip(qNumber, tr->mxtips))
+ {
+ /* q is a tip */
+
+ if(isTip(qNumber, tr->mxtips))
+ {
+ /* get the start address of the inner likelihood vector x2 for partition model,
+ note that inner nodes are enumerated/indexed starting at 0 to save allocating some
+ space for additional pointers */
+
+ x2_start = tr->partitionData[model].xVector[pNumber - tr->mxtips -1] + x_offset;
+
+ /* get the corresponding tip vector */
+
+ tip = tr->partitionData[model].yVector[qNumber] + offset;
+
+ /* memory saving stuff, let's deal with this later or ask Fernando ;-) */
+
+ if(tr->saveMemory)
+ {
+ x2_gap = &(tr->partitionData[model].gapVector[pNumber * tr->partitionData[model].gapVectorLength]);
+ x2_gapColumn = &(tr->partitionData[model].gapColumn[(pNumber - tr->mxtips - 1) * states * rateHet]);
+ }
+ }
+ else
+ {
+ /* p is a tip, same as above */
+
+ x2_start = tr->partitionData[model].xVector[qNumber - tr->mxtips - 1] + x_offset;
+ tip = tr->partitionData[model].yVector[pNumber] + offset;
+
+ if(tr->saveMemory)
+ {
+ x2_gap = &(tr->partitionData[model].gapVector[qNumber * tr->partitionData[model].gapVectorLength]);
+ x2_gapColumn = &(tr->partitionData[model].gapColumn[(qNumber - tr->mxtips - 1) * states * rateHet]);
+ }
+
+ }
+ }
+ else
+ {
+
+ /* neither p nor q is a tip, hence we need to get the addresses of two inner vectors */
+
+ x1_start = tr->partitionData[model].xVector[pNumber - tr->mxtips - 1] + x_offset;
+ x2_start = tr->partitionData[model].xVector[qNumber - tr->mxtips - 1] + x_offset;
+
+ /* memory saving option */
+
+ if(tr->saveMemory)
+ {
+ x1_gap = &(tr->partitionData[model].gapVector[pNumber * tr->partitionData[model].gapVectorLength]);
+ x2_gap = &(tr->partitionData[model].gapVector[qNumber * tr->partitionData[model].gapVectorLength]);
+ x1_gapColumn = &tr->partitionData[model].gapColumn[(pNumber - tr->mxtips - 1) * states * rateHet];
+ x2_gapColumn = &tr->partitionData[model].gapColumn[(qNumber - tr->mxtips - 1) * states * rateHet];
+ }
+
+ }
+
+#ifndef _OPTIMIZED_FUNCTIONS
+
+ /* generic slow functions, memory saving option is not implemented for these */
+
+ assert(!tr->saveMemory);
+
+ /* decide whether CAT or GAMMA is used and compute the log likelihood */
+
+ if(tr->rateHetModel == CAT)
+ partitionLikelihood = evaluateCAT_FLEX(tr->partitionData[model].rateCategory, wgt,
+ x1_start, x2_start, tr->partitionData[model].tipVector,
+ tip, width, diagptable, states);
+ else
+ partitionLikelihood = evaluateGAMMA_FLEX(wgt,
+ x1_start, x2_start, tr->partitionData[model].tipVector,
+ tip, width, diagptable, states);
+#else
+
+ /* for the optimized functions we have a dedicated, optimized function implementation
+ for each rate heterogeneity and data type combination; we switch over the number of states
+ and the rate heterogeneity model */
+
+ switch(states)
+ {
+ case 2:
+#ifdef __MIC_NATIVE
+ assert(0 && "Binary data model is not implemented on Intel MIC");
+#else
+ assert(!tr->saveMemory);
+ if(tr->rateHetModel == CAT)
+ partitionLikelihood = evaluateGTRCAT_BINARY((int *)NULL, (int *)NULL, rateCategory, wgt,
+ x1_start, x2_start, tr->partitionData[model].tipVector,
+ tip, width, diagptable, TRUE);
+ else
+ partitionLikelihood = evaluateGTRGAMMA_BINARY((int *)NULL, (int *)NULL, wgt,
+ x1_start, x2_start,
+ tr->partitionData[model].tipVector,
+ tip, width, diagptable, TRUE);
+#endif
+ break;
+ case 4: /* DNA */
+ {
+ if(tr->rateHetModel == CAT)
+ {
+ if(tr->saveMemory)
+#ifdef __MIC_NATIVE
+ assert(0 && "Neither CAT model of rate heterogeneity nor memory saving are implemented on Intel MIC");
+#else
+ partitionLikelihood = evaluateGTRCAT_SAVE(rateCategory, wgt,
+ x1_start, x2_start, tr->partitionData[model].tipVector,
+ tip, width, diagptable, x1_gapColumn, x2_gapColumn, x1_gap, x2_gap);
+#endif
+ else
+#ifdef __MIC_NATIVE
+ assert(0 && "CAT model of rate heterogeneity is not implemented on Intel MIC");
+#else
+ partitionLikelihood = evaluateGTRCAT(rateCategory, wgt,
+ x1_start, x2_start, tr->partitionData[model].tipVector,
+ tip, width, diagptable);
+#endif
+ }
+ else
+ {
+ if(tr->saveMemory)
+#ifdef __MIC_NATIVE
+ assert(0 && "Memory saving is not implemented on Intel MIC");
+#else
+ partitionLikelihood = evaluateGTRGAMMA_GAPPED_SAVE(wgt,
+ x1_start, x2_start, tr->partitionData[model].tipVector,
+ tip, width, diagptable,
+ x1_gapColumn, x2_gapColumn, x1_gap, x2_gap);
+#endif
+ else
+#ifdef __MIC_NATIVE
+ partitionLikelihood = evaluateGAMMA_MIC(wgt,
+ x1_start, x2_start, tr->partitionData[model].mic_tipVector,
+ tip, width, diagptable);
+#else
+ partitionLikelihood = evaluateGTRGAMMA(wgt,
+ x1_start, x2_start, tr->partitionData[model].tipVector,
+ tip, width, diagptable);
+#endif
+ }
+ }
+ break;
+ case 20: /* proteins */
+ {
+ if(tr->rateHetModel == CAT)
+ {
+ if(tr->saveMemory)
+#ifdef __MIC_NATIVE
+ assert(0 && "Neither CAT model of rate heterogeneity nor memory saving are implemented on Intel MIC");
+#else
+ partitionLikelihood = evaluateGTRCATPROT_SAVE(rateCategory, wgt,
+ x1_start, x2_start, tr->partitionData[model].tipVector,
+ tip, width, diagptable, x1_gapColumn, x2_gapColumn, x1_gap, x2_gap);
+#endif
+ else
+#ifdef __MIC_NATIVE
+ assert(0 && "CAT model of rate heterogeneity is not implemented on Intel MIC");
+#else
+ partitionLikelihood = evaluateGTRCATPROT(rateCategory, wgt,
+ x1_start, x2_start, tr->partitionData[model].tipVector,
+ tip, width, diagptable);
+#endif
+ }
+ else
+ {
+ if(tr->saveMemory)
+#ifdef __MIC_NATIVE
+ assert(0 && "Memory saving is not implemented on Intel MIC");
+#else
+ partitionLikelihood = evaluateGTRGAMMAPROT_GAPPED_SAVE(wgt,
+ x1_start, x2_start, tr->partitionData[model].tipVector,
+ tip, width, diagptable,
+ x1_gapColumn, x2_gapColumn, x1_gap, x2_gap);
+#endif
+ else
+ {
+ if(tr->partitionData[model].protModels == LG4M || tr->partitionData[model].protModels == LG4X)
+#ifdef __MIC_NATIVE
+ partitionLikelihood = evaluateGAMMAPROT_LG4_MIC(wgt,
+ x1_start, x2_start, tr->partitionData[model].mic_tipVector,
+ tip, width, diagptable, weights);
+#else
+ partitionLikelihood = evaluateGTRGAMMAPROT_LG4((int *)NULL, (int *)NULL, wgt,
+ x1_start, x2_start, tr->partitionData[model].tipVector_LG4,
+ tip, width, diagptable, TRUE, weights);
+#endif
+ else
+#ifdef __MIC_NATIVE
+ partitionLikelihood = evaluateGAMMAPROT_MIC(wgt,
+ x1_start, x2_start, tr->partitionData[model].mic_tipVector,
+ tip, width, diagptable);
+#else
+ partitionLikelihood = evaluateGTRGAMMAPROT(wgt,
+ x1_start, x2_start, tr->partitionData[model].tipVector,
+ tip, width, diagptable);
+#endif
+ }
+ }
+ }
+ break;
+ default:
+ assert(0);
+ }
+#endif
+
+ /* now here is a nasty part: for each partition and each node we maintain an integer counter that records
+ how many entries per node were scaled by a constant factor. Here we use this information, generated during Felsenstein's
+ pruning algorithm by the newview() functions, to undo the preceding scaling multiplications at the root. For mathematical details
+ you should actually read:
+
+ A. Stamatakis: "Orchestrating the Phylogenetic Likelihood Function on Emerging Parallel Architectures".
+ In B. Schmidt, editor, Bioinformatics: High Performance Parallel Computer Architectures, 85-115, CRC Press, Taylor & Francis, 2010.
+
+ There's a copy of this book in my office
+ */
+
+ partitionLikelihood += (globalScaler[pNumber] + globalScaler[qNumber]) * LOG(minlikelihood);
+
+ /* check that there was no major numerical screw-up, the log likelihood should be < 0.0 always */
+
+
+
+ assert(partitionLikelihood < 0.0);
+
+ /* now we have the correct log likelihood for the current partition after undoing scaling multiplications */
+
+ /* finally, we also store the per partition log likelihood which is important for optimizing the alpha parameter
+ of this partition for example */
+
+ *perPartitionLH = partitionLikelihood;
+ }
+ else
+ {
+ /* if the current thread does not have a single site of this partition
+ it is important to set the per-partition log likelihood to 0.0 because
+ of the reduction operation that will take place later on.
+ That is, the values of tr->perPartitionLH across all threads
+ need to be in a consistent state at all times!
+ */
+
+ if(width == 0)
+ *perPartitionLH = 0.0;
+ else
+ {
+ assert(tr->td[0].executeModel[model] == FALSE && *perPartitionLH < 0.0);
+ }
+ }
+ } /* for model */
+ } /* OMP parallel */
+
+
+#ifdef _USE_OMP
+ /* perform reduction of per-partition LH scores */
+ int
+ model,
+ t;
+
+ for(model = 0; model < tr->NumberOfModels; model++)
+ {
+ if (!tr->td[0].executeModel[model])
+ continue;
+
+ tr->perPartitionLH[model] = 0.0;
+ for(t = 0; t < tr->maxThreadsPerModel; t++)
+ {
+ Assign*
+ pAss = tr->partThreadAssigns[model * tr->maxThreadsPerModel + t];
+
+ if (pAss)
+ {
+ int
+ tid = pAss->procId;
+
+ tr->perPartitionLH[model] += tr->partitionData[model].reductionBuffer[tid];
+ }
+ }
+ }
+#endif
+}
+
+
+
+
+void evaluateGeneric (tree *tr, nodeptr p, boolean fullTraversal)
+{
+ /* now this may be the entry point of the library to compute
+ the log like at a branch defined by p and p->back == q */
+
+ volatile double
+ result = 0.0;
+
+ nodeptr
+ q = p->back;
+
+ int
+ i,
+ model;
+
+
+ /* set the first entry of the traversal descriptor to contain the indices
+ of nodes p and q */
+
+ tr->td[0].ti[0].pNumber = p->number;
+ tr->td[0].ti[0].qNumber = q->number;
+
+ /* copy the branch lengths of the tree into the first entry of the traversal descriptor.
+ if -M is not used tr->numBranches must be 1 */
+
+ for(i = 0; i < tr->numBranches; i++)
+ tr->td[0].ti[0].qz[i] = q->z[i];
+
+ /* now compute how many conditionals must be re-computed/re-oriented by newview
+ to be able to calculate the likelihood at the root defined by p and q.
+ */
+
+ /* one entry in the traversal descriptor is already used, hence set the traversal length counter to 1 */
+ tr->td[0].count = 1;
+
+ /* do we need to recompute any of the vectors at or below p ? */
+
+ if(fullTraversal)
+ {
+ assert(isTip(p->number, tr->mxtips));
+ computeTraversalInfo(q, &(tr->td[0].ti[0]), &(tr->td[0].count), tr->mxtips, tr->numBranches, FALSE);
+ }
+ else
+ {
+ if(!p->x)
+ computeTraversalInfo(p, &(tr->td[0].ti[0]), &(tr->td[0].count), tr->mxtips, tr->numBranches, TRUE);
+
+ /* recompute/reorient any descriptors at or below q ?
+ computeTraversalInfo computes and stores the newview() to be executed for the traversal descriptor */
+
+ if(!q->x)
+ computeTraversalInfo(q, &(tr->td[0].ti[0]), &(tr->td[0].count), tr->mxtips, tr->numBranches, TRUE);
+ }
+
+ /* now we copy this partition execute mask into the traversal descriptor which must come from the
+ calling program, the logic of this should not form part of the library */
+
+ storeExecuteMaskInTraversalDescriptor(tr);
+
+ /* also store in the traversal descriptor that something has changed i.e., in the parallel case that the
+ traversal descriptor list of nodes needs to be broadcast once again */
+
+ tr->td[0].traversalHasChanged = TRUE;
+
+
+ evaluateIterative(tr);
+
+ {
+ double
+ *recv = (double *)malloc(sizeof(double) * tr->NumberOfModels);
+
+#ifdef _USE_ALLREDUCE
+ MPI_Allreduce(tr->perPartitionLH, recv, tr->NumberOfModels, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
+#else
+ MPI_Reduce(tr->perPartitionLH, recv, tr->NumberOfModels, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
+ MPI_Bcast(recv, tr->NumberOfModels, MPI_DOUBLE, 0, MPI_COMM_WORLD);
+#endif
+
+ memcpy(tr->perPartitionLH, recv, tr->NumberOfModels * sizeof(double));
+
+ for(model = 0; model < tr->NumberOfModels; model++)
+ result += tr->perPartitionLH[model];
+
+ free(recv);
+ }
+
+
+ /* set the tree data structure likelihood value to the total likelihood */
+
+ tr->likelihood = result;
+
+ /*
+ MPI_Barrier(MPI_COMM_WORLD);
+ printf("Process %d likelihood: %f\n", processID, tr->likelihood);
+ MPI_Barrier(MPI_COMM_WORLD);
+ */
+
+ /* do some bookkeeping to have traversalHasChanged in a consistent state */
+
+ tr->td[0].traversalHasChanged = FALSE;
+
+
+
+
+}
+
+
+
+
+
+
+
+/* below are the optimized function versions with geeky intrinsics */
+
+#ifdef _OPTIMIZED_FUNCTIONS
+
+/* binary data */
+
+static double evaluateGTRCAT_BINARY (int *ex1, int *ex2, int *cptr, int *wptr,
+ double *x1_start, double *x2_start, double *tipVector,
+ unsigned char *tipX1, int n, double *diagptable_start, const boolean fastScaling)
+{
+ double sum = 0.0, term;
+ int i;
+ double *diagptable, *x1, *x2;
+
+ if(tipX1)
+ {
+ for (i = 0; i < n; i++)
+ {
+ double
+ t[2] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+ x1 = &(tipVector[2 * tipX1[i]]);
+ x2 = &(x2_start[2 * i]);
+
+ diagptable = &(diagptable_start[2 * cptr[i]]);
+
+
+ _mm_store_pd(t, _mm_mul_pd(_mm_load_pd(x1), _mm_mul_pd(_mm_load_pd(x2), _mm_load_pd(diagptable))));
+
+ if(fastScaling)
+ term = log(fabs(t[0] + t[1]));
+ else
+ term = log(fabs(t[0] + t[1])) + (ex2[i] * log(minlikelihood));
+
+ sum += wptr[i] * term;
+ }
+ }
+ else
+ {
+ for (i = 0; i < n; i++)
+ {
+ double
+ t[2] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+ x1 = &x1_start[2 * i];
+ x2 = &x2_start[2 * i];
+
+ diagptable = &diagptable_start[2 * cptr[i]];
+
+ _mm_store_pd(t, _mm_mul_pd(_mm_load_pd(x1), _mm_mul_pd(_mm_load_pd(x2), _mm_load_pd(diagptable))));
+
+ if(fastScaling)
+ term = log(fabs(t[0] + t[1]));
+ else
+ term = log(fabs(t[0] + t[1])) + ((ex1[i] + ex2[i]) * log(minlikelihood));
+
+
+ sum += wptr[i] * term;
+ }
+ }
+
+ return sum;
+}
+
+
+static double evaluateGTRGAMMA_BINARY(int *ex1, int *ex2, int *wptr,
+ double *x1_start, double *x2_start,
+ double *tipVector,
+ unsigned char *tipX1, const int n, double *diagptable, const boolean fastScaling)
+{
+ double sum = 0.0, term;
+ int i, j;
+ double *x1, *x2;
+
+ if(tipX1)
+ {
+ for (i = 0; i < n; i++)
+ {
+ double t[2] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+ __m128d termv, x1v, x2v, dv;
+
+ x1 = &(tipVector[2 * tipX1[i]]);
+ x2 = &x2_start[8 * i];
+
+ termv = _mm_set1_pd(0.0);
+
+ for(j = 0; j < 4; j++)
+ {
+ x1v = _mm_load_pd(&x1[0]);
+ x2v = _mm_load_pd(&x2[j * 2]);
+ dv = _mm_load_pd(&diagptable[j * 2]);
+
+ x1v = _mm_mul_pd(x1v, x2v);
+ x1v = _mm_mul_pd(x1v, dv);
+
+ termv = _mm_add_pd(termv, x1v);
+ }
+
+ _mm_store_pd(t, termv);
+
+ if(fastScaling)
+ term = log(0.25 * (fabs(t[0] + t[1])));
+ else
+ term = log(0.25 * (fabs(t[0] + t[1]))) + (ex2[i] * log(minlikelihood));
+
+
+ sum += wptr[i] * term;
+ }
+ }
+ else
+ {
+ for (i = 0; i < n; i++)
+ {
+
+ double t[2] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+ __m128d termv, x1v, x2v, dv;
+
+ x1 = &x1_start[8 * i];
+ x2 = &x2_start[8 * i];
+
+
+ termv = _mm_set1_pd(0.0);
+
+ for(j = 0; j < 4; j++)
+ {
+ x1v = _mm_load_pd(&x1[j * 2]);
+ x2v = _mm_load_pd(&x2[j * 2]);
+ dv = _mm_load_pd(&diagptable[j * 2]);
+
+ x1v = _mm_mul_pd(x1v, x2v);
+ x1v = _mm_mul_pd(x1v, dv);
+
+ termv = _mm_add_pd(termv, x1v);
+ }
+
+ _mm_store_pd(t, termv);
+
+
+ if(fastScaling)
+ term = log(0.25 * (fabs(t[0] + t[1])));
+ else
+ term = log(0.25 * (fabs(t[0] + t[1]))) + ((ex1[i] +ex2[i]) * log(minlikelihood));
+
+
+ sum += wptr[i] * term;
+ }
+ }
+
+ return sum;
+}
+
+
+/* binary data end */
+
+
+static double evaluateGTRGAMMAPROT_LG4(int *ex1, int *ex2, int *wptr,
+ double *x1, double *x2,
+ double *tipVector[4],
+ unsigned char *tipX1, int n, double *diagptable, const boolean fastScaling, double *weights)
+{
+ double sum = 0.0, term;
+ int i, j, l;
+ double *left, *right;
+
+ if(tipX1)
+ {
+ for (i = 0; i < n; i++)
+ {
+#ifdef __SIM_SSE3
+ __m128d
+ tv = _mm_setzero_pd();
+
+ for(j = 0, term = 0.0; j < 4; j++)
+ {
+ double
+ *d = &diagptable[j * 20];
+
+ __m128d
+ t = _mm_setzero_pd(),
+ w = _mm_set1_pd(weights[j]);
+
+
+ left = &(tipVector[j][20 * tipX1[i]]);
+ right = &(x2[80 * i + 20 * j]);
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d mul = _mm_mul_pd(_mm_load_pd(&left[l]), _mm_load_pd(&right[l]));
+ t = _mm_add_pd(t, _mm_mul_pd(mul, _mm_load_pd(&d[l])));
+ }
+
+ tv = _mm_add_pd(tv, _mm_mul_pd(t, w));
+ }
+
+ tv = _mm_hadd_pd(tv, tv);
+ _mm_storel_pd(&term, tv);
+
+#else
+ for(j = 0, term = 0.0; j < 4; j++)
+ {
+ double
+ t = 0.0;
+
+ left = &(tipVector[j][20 * tipX1[i]]);
+ right = &(x2[80 * i + 20 * j]);
+ for(l = 0; l < 20; l++)
+ t += left[l] * right[l] * diagptable[j * 20 + l];
+
+ term += weights[j] * t;
+ }
+#endif
+
+ if(fastScaling)
+ term = LOG(FABS(term));
+ else
+ term = LOG(FABS(term)) + (ex2[i] * LOG(minlikelihood));
+
+ sum += wptr[i] * term;
+ }
+ }
+ else
+ {
+ for (i = 0; i < n; i++)
+ {
+#ifdef __SIM_SSE3
+ __m128d
+ tv = _mm_setzero_pd();
+
+ for(j = 0, term = 0.0; j < 4; j++)
+ {
+ double
+ *d = &diagptable[j * 20];
+
+ __m128d
+ t = _mm_setzero_pd(),
+ w = _mm_set1_pd(weights[j]);
+
+ left = &(x1[80 * i + 20 * j]);
+ right = &(x2[80 * i + 20 * j]);
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d mul = _mm_mul_pd(_mm_load_pd(&left[l]), _mm_load_pd(&right[l]));
+ t = _mm_add_pd(t, _mm_mul_pd(mul, _mm_load_pd(&d[l])));
+ }
+
+ tv = _mm_add_pd(tv, _mm_mul_pd(t, w));
+ }
+
+ tv = _mm_hadd_pd(tv, tv);
+ _mm_storel_pd(&term, tv);
+
+#else
+ for(j = 0, term = 0.0; j < 4; j++)
+ {
+ double
+ t = 0.0;
+
+ left = &(x1[80 * i + 20 * j]);
+ right = &(x2[80 * i + 20 * j]);
+
+ for(l = 0; l < 20; l++)
+ t += left[l] * right[l] * diagptable[j * 20 + l];
+
+ term += weights[j] * t;
+ }
+#endif
+
+ if(fastScaling)
+ term = LOG(FABS(term));
+ else
+ term = LOG(FABS(term)) + ((ex1[i] + ex2[i])*LOG(minlikelihood));
+
+ sum += wptr[i] * term;
+ }
+ }
+
+ return sum;
+}
+
+
+
+static double evaluateGTRGAMMAPROT_GAPPED_SAVE (int *wptr,
+ double *x1, double *x2,
+ double *tipVector,
+ unsigned char *tipX1, int n, double *diagptable,
+ double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap)
+{
+ double sum = 0.0, term;
+ int i, j, l;
+ double
+ *left,
+ *right,
+ *x1_ptr = x1,
+ *x2_ptr = x2,
+ *x1v,
+ *x2v;
+
+ if(tipX1)
+ {
+ for (i = 0; i < n; i++)
+ {
+ if(x2_gap[i / 32] & mask32[i % 32])
+ x2v = x2_gapColumn;
+ else
+ {
+ x2v = x2_ptr;
+ x2_ptr += 80;
+ }
+
+ __m128d tv = _mm_setzero_pd();
+ left = &(tipVector[20 * tipX1[i]]);
+
+ for(j = 0, term = 0.0; j < 4; j++)
+ {
+ double *d = &diagptable[j * 20];
+ right = &(x2v[20 * j]);
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d mul = _mm_mul_pd(_mm_load_pd(&left[l]), _mm_load_pd(&right[l]));
+ tv = _mm_add_pd(tv, _mm_mul_pd(mul, _mm_load_pd(&d[l])));
+ }
+ }
+
+ tv = _mm_hadd_pd(tv, tv);
+ _mm_storel_pd(&term, tv);
+
+
+
+ term = LOG(0.25 * FABS(term));
+
+ sum += wptr[i] * term;
+ }
+ }
+ else
+ {
+ for (i = 0; i < n; i++)
+ {
+ if(x1_gap[i / 32] & mask32[i % 32])
+ x1v = x1_gapColumn;
+ else
+ {
+ x1v = x1_ptr;
+ x1_ptr += 80;
+ }
+
+ if(x2_gap[i / 32] & mask32[i % 32])
+ x2v = x2_gapColumn;
+ else
+ {
+ x2v = x2_ptr;
+ x2_ptr += 80;
+ }
+
+ __m128d tv = _mm_setzero_pd();
+
+ for(j = 0, term = 0.0; j < 4; j++)
+ {
+ double *d = &diagptable[j * 20];
+ left = &(x1v[20 * j]);
+ right = &(x2v[20 * j]);
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d mul = _mm_mul_pd(_mm_load_pd(&left[l]), _mm_load_pd(&right[l]));
+ tv = _mm_add_pd(tv, _mm_mul_pd(mul, _mm_load_pd(&d[l])));
+ }
+ }
+ tv = _mm_hadd_pd(tv, tv);
+ _mm_storel_pd(&term, tv);
+
+
+ term = LOG(0.25 * FABS(term));
+
+
+ sum += wptr[i] * term;
+ }
+ }
+
+ return sum;
+}
+
+
+
+static double evaluateGTRGAMMAPROT (int *wptr,
+ double *x1, double *x2,
+ double *tipVector,
+ unsigned char *tipX1, int n, double *diagptable)
+{
+ double sum = 0.0, term;
+ int i, j, l;
+ double *left, *right;
+
+ if(tipX1)
+ {
+ for (i = 0; i < n; i++)
+ {
+
+ __m128d tv = _mm_setzero_pd();
+ left = &(tipVector[20 * tipX1[i]]);
+
+ for(j = 0, term = 0.0; j < 4; j++)
+ {
+ double *d = &diagptable[j * 20];
+ right = &(x2[80 * i + 20 * j]);
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d mul = _mm_mul_pd(_mm_load_pd(&left[l]), _mm_load_pd(&right[l]));
+ tv = _mm_add_pd(tv, _mm_mul_pd(mul, _mm_load_pd(&d[l])));
+ }
+ }
+ tv = _mm_hadd_pd(tv, tv);
+ _mm_storel_pd(&term, tv);
+
+
+
+ term = LOG(0.25 * FABS(term));
+
+
+ sum += wptr[i] * term;
+ }
+ }
+ else
+ {
+ for (i = 0; i < n; i++)
+ {
+ __m128d tv = _mm_setzero_pd();
+
+ for(j = 0, term = 0.0; j < 4; j++)
+ {
+ double *d = &diagptable[j * 20];
+ left = &(x1[80 * i + 20 * j]);
+ right = &(x2[80 * i + 20 * j]);
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d mul = _mm_mul_pd(_mm_load_pd(&left[l]), _mm_load_pd(&right[l]));
+ tv = _mm_add_pd(tv, _mm_mul_pd(mul, _mm_load_pd(&d[l])));
+ }
+ }
+ tv = _mm_hadd_pd(tv, tv);
+ _mm_storel_pd(&term, tv);
+
+
+ term = LOG(0.25 * FABS(term));
+
+
+ sum += wptr[i] * term;
+ }
+ }
+
+ return sum;
+}
+
+
+static double evaluateGTRCATPROT (int *cptr, int *wptr,
+ double *x1, double *x2, double *tipVector,
+ unsigned char *tipX1, int n, double *diagptable_start)
+{
+ double sum = 0.0, term;
+ double *diagptable, *left, *right;
+ int i, l;
+
+ if(tipX1)
+ {
+ for (i = 0; i < n; i++)
+ {
+ left = &(tipVector[20 * tipX1[i]]);
+ right = &(x2[20 * i]);
+
+ diagptable = &diagptable_start[20 * cptr[i]];
+
+ __m128d tv = _mm_setzero_pd();
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d lv = _mm_load_pd(&left[l]);
+ __m128d rv = _mm_load_pd(&right[l]);
+ __m128d mul = _mm_mul_pd(lv, rv);
+ __m128d dv = _mm_load_pd(&diagptable[l]);
+
+ tv = _mm_add_pd(tv, _mm_mul_pd(mul, dv));
+ }
+
+ tv = _mm_hadd_pd(tv, tv);
+ _mm_storel_pd(&term, tv);
+
+
+ term = LOG(FABS(term));
+
+ sum += wptr[i] * term;
+ }
+ }
+ else
+ {
+
+ for (i = 0; i < n; i++)
+ {
+ left = &x1[20 * i];
+ right = &x2[20 * i];
+
+ diagptable = &diagptable_start[20 * cptr[i]];
+
+ __m128d tv = _mm_setzero_pd();
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d lv = _mm_load_pd(&left[l]);
+ __m128d rv = _mm_load_pd(&right[l]);
+ __m128d mul = _mm_mul_pd(lv, rv);
+ __m128d dv = _mm_load_pd(&diagptable[l]);
+
+ tv = _mm_add_pd(tv, _mm_mul_pd(mul, dv));
+ }
+
+ tv = _mm_hadd_pd(tv, tv);
+ _mm_storel_pd(&term, tv);
+
+ term = LOG(FABS(term));
+
+ sum += wptr[i] * term;
+ }
+ }
+
+ return sum;
+}
+
+
+static double evaluateGTRCATPROT_SAVE (int *cptr, int *wptr,
+ double *x1, double *x2, double *tipVector,
+ unsigned char *tipX1, int n, double *diagptable_start,
+ double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap)
+{
+ double
+ sum = 0.0,
+ term,
+ *diagptable,
+ *left,
+ *right,
+ *left_ptr = x1,
+ *right_ptr = x2;
+
+ int
+ i,
+ l;
+
+ if(tipX1)
+ {
+ for (i = 0; i < n; i++)
+ {
+ left = &(tipVector[20 * tipX1[i]]);
+
+ if(isGap(x2_gap, i))
+ right = x2_gapColumn;
+ else
+ {
+ right = right_ptr;
+ right_ptr += 20;
+ }
+
+ diagptable = &diagptable_start[20 * cptr[i]];
+
+ __m128d tv = _mm_setzero_pd();
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d lv = _mm_load_pd(&left[l]);
+ __m128d rv = _mm_load_pd(&right[l]);
+ __m128d mul = _mm_mul_pd(lv, rv);
+ __m128d dv = _mm_load_pd(&diagptable[l]);
+
+ tv = _mm_add_pd(tv, _mm_mul_pd(mul, dv));
+ }
+
+ tv = _mm_hadd_pd(tv, tv);
+ _mm_storel_pd(&term, tv);
+
+
+ term = LOG(FABS(term));
+
+ sum += wptr[i] * term;
+ }
+ }
+ else
+ {
+
+ for (i = 0; i < n; i++)
+ {
+ if(isGap(x1_gap, i))
+ left = x1_gapColumn;
+ else
+ {
+ left = left_ptr;
+ left_ptr += 20;
+ }
+
+ if(isGap(x2_gap, i))
+ right = x2_gapColumn;
+ else
+ {
+ right = right_ptr;
+ right_ptr += 20;
+ }
+
+ diagptable = &diagptable_start[20 * cptr[i]];
+
+ __m128d tv = _mm_setzero_pd();
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d lv = _mm_load_pd(&left[l]);
+ __m128d rv = _mm_load_pd(&right[l]);
+ __m128d mul = _mm_mul_pd(lv, rv);
+ __m128d dv = _mm_load_pd(&diagptable[l]);
+
+ tv = _mm_add_pd(tv, _mm_mul_pd(mul, dv));
+ }
+
+ tv = _mm_hadd_pd(tv, tv);
+ _mm_storel_pd(&term, tv);
+
+ term = LOG(FABS(term));
+
+ sum += wptr[i] * term;
+ }
+ }
+
+ return sum;
+}
+
+
+static double evaluateGTRCAT_SAVE (int *cptr, int *wptr,
+ double *x1_start, double *x2_start, double *tipVector,
+ unsigned char *tipX1, int n, double *diagptable_start,
+ double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap)
+{
+ double sum = 0.0, term;
+ int i;
+
+ double *diagptable,
+ *x1,
+ *x2,
+ *x1_ptr = x1_start,
+ *x2_ptr = x2_start;
+
+ if(tipX1)
+ {
+ for (i = 0; i < n; i++)
+ {
+ double t[2] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+ __m128d x1v1, x1v2, x2v1, x2v2, dv1, dv2;
+
+ x1 = &(tipVector[4 * tipX1[i]]);
+
+ if(isGap(x2_gap, i))
+ x2 = x2_gapColumn;
+ else
+ {
+ x2 = x2_ptr;
+ x2_ptr += 4;
+ }
+
+ diagptable = &diagptable_start[4 * cptr[i]];
+
+ x1v1 = _mm_load_pd(&x1[0]);
+ x1v2 = _mm_load_pd(&x1[2]);
+ x2v1 = _mm_load_pd(&x2[0]);
+ x2v2 = _mm_load_pd(&x2[2]);
+ dv1 = _mm_load_pd(&diagptable[0]);
+ dv2 = _mm_load_pd(&diagptable[2]);
+
+ x1v1 = _mm_mul_pd(x1v1, x2v1);
+ x1v1 = _mm_mul_pd(x1v1, dv1);
+
+ x1v2 = _mm_mul_pd(x1v2, x2v2);
+ x1v2 = _mm_mul_pd(x1v2, dv2);
+
+ x1v1 = _mm_add_pd(x1v1, x1v2);
+
+ _mm_store_pd(t, x1v1);
+
+ term = LOG(FABS(t[0] + t[1]));
+
+
+
+ sum += wptr[i] * term;
+ }
+ }
+ else
+ {
+ for (i = 0; i < n; i++)
+ {
+ double t[2] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+ __m128d x1v1, x1v2, x2v1, x2v2, dv1, dv2;
+
+ if(isGap(x1_gap, i))
+ x1 = x1_gapColumn;
+ else
+ {
+ x1 = x1_ptr;
+ x1_ptr += 4;
+ }
+
+ if(isGap(x2_gap, i))
+ x2 = x2_gapColumn;
+ else
+ {
+ x2 = x2_ptr;
+ x2_ptr += 4;
+ }
+
+ diagptable = &diagptable_start[4 * cptr[i]];
+
+ x1v1 = _mm_load_pd(&x1[0]);
+ x1v2 = _mm_load_pd(&x1[2]);
+ x2v1 = _mm_load_pd(&x2[0]);
+ x2v2 = _mm_load_pd(&x2[2]);
+ dv1 = _mm_load_pd(&diagptable[0]);
+ dv2 = _mm_load_pd(&diagptable[2]);
+
+ x1v1 = _mm_mul_pd(x1v1, x2v1);
+ x1v1 = _mm_mul_pd(x1v1, dv1);
+
+ x1v2 = _mm_mul_pd(x1v2, x2v2);
+ x1v2 = _mm_mul_pd(x1v2, dv2);
+
+ x1v1 = _mm_add_pd(x1v1, x1v2);
+
+ _mm_store_pd(t, x1v1);
+
+
+ term = LOG(FABS(t[0] + t[1]));
+
+ sum += wptr[i] * term;
+ }
+ }
+
+ return sum;
+}
+
+
+static double evaluateGTRGAMMA_GAPPED_SAVE(int *wptr,
+ double *x1_start, double *x2_start,
+ double *tipVector,
+ unsigned char *tipX1, const int n, double *diagptable,
+ double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap)
+{
+ double sum = 0.0, term;
+ int i, j;
+ double
+ *x1,
+ *x2,
+ *x1_ptr = x1_start,
+ *x2_ptr = x2_start;
+
+
+
+ if(tipX1)
+ {
+
+
+ for (i = 0; i < n; i++)
+ {
+ double t[2] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+ __m128d termv, x1v, x2v, dv;
+
+ x1 = &(tipVector[4 * tipX1[i]]);
+ if(x2_gap[i / 32] & mask32[i % 32])
+ x2 = x2_gapColumn;
+ else
+ {
+ x2 = x2_ptr;
+ x2_ptr += 16;
+ }
+
+
+ termv = _mm_set1_pd(0.0);
+
+ for(j = 0; j < 4; j++)
+ {
+ x1v = _mm_load_pd(&x1[0]);
+ x2v = _mm_load_pd(&x2[j * 4]);
+ dv = _mm_load_pd(&diagptable[j * 4]);
+
+ x1v = _mm_mul_pd(x1v, x2v);
+ x1v = _mm_mul_pd(x1v, dv);
+
+ termv = _mm_add_pd(termv, x1v);
+
+ x1v = _mm_load_pd(&x1[2]);
+ x2v = _mm_load_pd(&x2[j * 4 + 2]);
+ dv = _mm_load_pd(&diagptable[j * 4 + 2]);
+
+ x1v = _mm_mul_pd(x1v, x2v);
+ x1v = _mm_mul_pd(x1v, dv);
+
+ termv = _mm_add_pd(termv, x1v);
+ }
+
+ _mm_store_pd(t, termv);
+
+
+ term = LOG(0.25 * FABS(t[0] + t[1]));
+
+
+ sum += wptr[i] * term;
+ }
+ }
+ else
+ {
+
+ for (i = 0; i < n; i++)
+ {
+
+ double t[2] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+ __m128d termv, x1v, x2v, dv;
+
+ if(x1_gap[i / 32] & mask32[i % 32])
+ x1 = x1_gapColumn;
+ else
+ {
+ x1 = x1_ptr;
+ x1_ptr += 16;
+ }
+
+ if(x2_gap[i / 32] & mask32[i % 32])
+ x2 = x2_gapColumn;
+ else
+ {
+ x2 = x2_ptr;
+ x2_ptr += 16;
+ }
+
+ termv = _mm_set1_pd(0.0);
+
+ for(j = 0; j < 4; j++)
+ {
+ x1v = _mm_load_pd(&x1[j * 4]);
+ x2v = _mm_load_pd(&x2[j * 4]);
+ dv = _mm_load_pd(&diagptable[j * 4]);
+
+ x1v = _mm_mul_pd(x1v, x2v);
+ x1v = _mm_mul_pd(x1v, dv);
+
+ termv = _mm_add_pd(termv, x1v);
+
+ x1v = _mm_load_pd(&x1[j * 4 + 2]);
+ x2v = _mm_load_pd(&x2[j * 4 + 2]);
+ dv = _mm_load_pd(&diagptable[j * 4 + 2]);
+
+ x1v = _mm_mul_pd(x1v, x2v);
+ x1v = _mm_mul_pd(x1v, dv);
+
+ termv = _mm_add_pd(termv, x1v);
+ }
+
+ _mm_store_pd(t, termv);
+
+
+ term = LOG(0.25 * FABS(t[0] + t[1]));
+
+
+ sum += wptr[i] * term;
+ }
+ }
+
+ return sum;
+}
+
+
+static double evaluateGTRGAMMA(int *wptr,
+ double *x1_start, double *x2_start,
+ double *tipVector,
+ unsigned char *tipX1, const int n, double *diagptable)
+{
+ double sum = 0.0, term;
+ int i, j;
+
+ double *x1, *x2;
+
+
+
+ if(tipX1)
+ {
+ for (i = 0; i < n; i++)
+ {
+
+ double t[2] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+ __m128d termv, x1v, x2v, dv;
+
+ x1 = &(tipVector[4 * tipX1[i]]);
+ x2 = &x2_start[16 * i];
+
+
+ termv = _mm_set1_pd(0.0);
+
+ for(j = 0; j < 4; j++)
+ {
+ x1v = _mm_load_pd(&x1[0]);
+ x2v = _mm_load_pd(&x2[j * 4]);
+ dv = _mm_load_pd(&diagptable[j * 4]);
+
+ x1v = _mm_mul_pd(x1v, x2v);
+ x1v = _mm_mul_pd(x1v, dv);
+
+ termv = _mm_add_pd(termv, x1v);
+
+ x1v = _mm_load_pd(&x1[2]);
+ x2v = _mm_load_pd(&x2[j * 4 + 2]);
+ dv = _mm_load_pd(&diagptable[j * 4 + 2]);
+
+ x1v = _mm_mul_pd(x1v, x2v);
+ x1v = _mm_mul_pd(x1v, dv);
+
+ termv = _mm_add_pd(termv, x1v);
+ }
+
+ _mm_store_pd(t, termv);
+
+
+
+ term = LOG(0.25 * FABS(t[0] + t[1]));
+
+
+
+ sum += wptr[i] * term;
+ }
+ }
+ else
+ {
+ for (i = 0; i < n; i++)
+ {
+
+ double t[2] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+ __m128d termv, x1v, x2v, dv;
+
+
+ x1 = &x1_start[16 * i];
+ x2 = &x2_start[16 * i];
+
+
+ termv = _mm_set1_pd(0.0);
+
+ for(j = 0; j < 4; j++)
+ {
+ x1v = _mm_load_pd(&x1[j * 4]);
+ x2v = _mm_load_pd(&x2[j * 4]);
+ dv = _mm_load_pd(&diagptable[j * 4]);
+
+ x1v = _mm_mul_pd(x1v, x2v);
+ x1v = _mm_mul_pd(x1v, dv);
+
+ termv = _mm_add_pd(termv, x1v);
+
+ x1v = _mm_load_pd(&x1[j * 4 + 2]);
+ x2v = _mm_load_pd(&x2[j * 4 + 2]);
+ dv = _mm_load_pd(&diagptable[j * 4 + 2]);
+
+ x1v = _mm_mul_pd(x1v, x2v);
+ x1v = _mm_mul_pd(x1v, dv);
+
+ termv = _mm_add_pd(termv, x1v);
+ }
+
+ _mm_store_pd(t, termv);
+
+
+ term = LOG(0.25 * FABS(t[0] + t[1]));
+
+
+
+ sum += wptr[i] * term;
+ }
+ }
+
+ return sum;
+}
+
+
+static double evaluateGTRCAT (int *cptr, int *wptr,
+ double *x1_start, double *x2_start, double *tipVector,
+ unsigned char *tipX1, int n, double *diagptable_start)
+{
+ double sum = 0.0, term;
+ int i;
+
+ double *diagptable, *x1, *x2;
+
+ if(tipX1)
+ {
+ for (i = 0; i < n; i++)
+ {
+
+ double t[2] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+ __m128d x1v1, x1v2, x2v1, x2v2, dv1, dv2;
+
+ x1 = &(tipVector[4 * tipX1[i]]);
+ x2 = &x2_start[4 * i];
+
+ diagptable = &diagptable_start[4 * cptr[i]];
+
+
+ x1v1 = _mm_load_pd(&x1[0]);
+ x1v2 = _mm_load_pd(&x1[2]);
+ x2v1 = _mm_load_pd(&x2[0]);
+ x2v2 = _mm_load_pd(&x2[2]);
+ dv1 = _mm_load_pd(&diagptable[0]);
+ dv2 = _mm_load_pd(&diagptable[2]);
+
+ x1v1 = _mm_mul_pd(x1v1, x2v1);
+ x1v1 = _mm_mul_pd(x1v1, dv1);
+
+ x1v2 = _mm_mul_pd(x1v2, x2v2);
+ x1v2 = _mm_mul_pd(x1v2, dv2);
+
+ x1v1 = _mm_add_pd(x1v1, x1v2);
+
+ _mm_store_pd(t, x1v1);
+
+
+ term = LOG(FABS(t[0] + t[1]));
+
+
+ sum += wptr[i] * term;
+ }
+ }
+ else
+ {
+ for (i = 0; i < n; i++)
+ {
+
+ double t[2] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+ __m128d x1v1, x1v2, x2v1, x2v2, dv1, dv2;
+
+ x1 = &x1_start[4 * i];
+ x2 = &x2_start[4 * i];
+
+ diagptable = &diagptable_start[4 * cptr[i]];
+
+
+ x1v1 = _mm_load_pd(&x1[0]);
+ x1v2 = _mm_load_pd(&x1[2]);
+ x2v1 = _mm_load_pd(&x2[0]);
+ x2v2 = _mm_load_pd(&x2[2]);
+ dv1 = _mm_load_pd(&diagptable[0]);
+ dv2 = _mm_load_pd(&diagptable[2]);
+
+ x1v1 = _mm_mul_pd(x1v1, x2v1);
+ x1v1 = _mm_mul_pd(x1v1, dv1);
+
+ x1v2 = _mm_mul_pd(x1v2, x2v2);
+ x1v2 = _mm_mul_pd(x1v2, dv2);
+
+ x1v1 = _mm_add_pd(x1v1, x1v2);
+
+ _mm_store_pd(t, x1v1);
+
+
+ term = LOG(FABS(t[0] + t[1]));
+
+
+ sum += wptr[i] * term;
+ }
+ }
+
+ return sum;
+}
+
+
+
+
+
+#endif
+
+
diff --git a/examl/evaluatePartialGenericSpecial.c b/examl/evaluatePartialGenericSpecial.c
new file mode 100644
index 0000000..8ea1e1f
--- /dev/null
+++ b/examl/evaluatePartialGenericSpecial.c
@@ -0,0 +1,1058 @@
+/* RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees
+ * Copyright August 2006 by Alexandros Stamatakis
+ *
+ * Partially derived from
+ * fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *
+ * and
+ *
+ * Programs of the PHYLIP package by Joe Felsenstein.
+
+ * This program is free software; you may redistribute it and/or modify its
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ *
+ * For any other enquiries send an Email to Alexandros Stamatakis
+ * Alexandros.Stamatakis at epfl.ch
+ *
+ * When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ * Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models".
+ * Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#ifndef WIN32
+#include <unistd.h>
+#endif
+
+#include <math.h>
+#include <time.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+#include "axml.h"
+
+#ifdef __SIM_SSE3
+#include <xmmintrin.h>
+#include <pmmintrin.h>
+#endif
+
+
+
+#if defined(_OPTIMIZED_FUNCTIONS) && !defined(__MIC_NATIVE)
+static inline void computeVectorGTRCATPROT(double *lVector, int *eVector, double ki, int i, double qz, double rz,
+ traversalInfo *ti, double *EIGN, double *EI, double *EV, double *tipVector,
+ unsigned char **yVector, int mxtips);
+
+static double evaluatePartialGTRCATPROT(int i, double ki, int counter, traversalInfo *ti, double qz,
+ int w, double *EIGN, double *EI, double *EV,
+ double *tipVector, unsigned char **yVector,
+ int branchReference, int mxtips);
+
+static inline void computeVectorGTRGAMMAPROT(double *lVector, int *eVector, double *gammaRates, int i, double qz, double rz,
+ traversalInfo *ti, double *EIGN, double *EI, double *EV, double *tipVector,
+ unsigned char **yVector, int mxtips);
+
+static double evaluatePartialGTRGAMMAPROT(int i, int counter, traversalInfo *ti, double qz,
+ int w, double *EIGN, double *EI, double *EV,
+ double *tipVector, unsigned char **yVector,
+ double *gammaRates,
+ int branchReference, int mxtips);
+
+static inline void computeVectorGTRCAT(double *lVector, int *eVector, double ki, int i, double qz, double rz,
+ traversalInfo *ti, double *EIGN, double *EI, double *EV, double *tipVector,
+ unsigned char **yVector, int mxtips);
+
+static double evaluatePartialGTRCAT(int i, double ki, int counter, traversalInfo *ti, double qz,
+ int w, double *EIGN, double *EI, double *EV,
+ double *tipVector, unsigned char **yVector,
+ int branchReference, int mxtips);
+
+
+static inline void computeVectorGTRCAT_BINARY(double *lVector, int *eVector, double ki, int i, double qz, double rz,
+ traversalInfo *ti, double *EIGN, double *EI, double *EV, double *tipVector,
+ unsigned char **yVector, int mxtips);
+
+static double evaluatePartialGTRCAT_BINARY(int i, double ki, int counter, traversalInfo *ti, double qz,
+ int w, double *EIGN, double *EI, double *EV,
+ double *tipVector, unsigned char **yVector,
+ int branchReference, int mxtips);
+
+
+#else
+
+static inline void computeVectorCAT_FLEX(double *lVector, int *eVector, double ki, int i, double qz, double rz,
+ traversalInfo *ti, double *EIGN, double *EI, double *EV, double *tipVector,
+ unsigned char **yVector, int mxtips, const int states)
+{
+ double
+ *d1 = (double *)malloc(sizeof(double) * states),
+ *d2 = (double *)malloc(sizeof(double) * states),
+ *x1px2 = (double *)malloc(sizeof(double) * states),
+ ump_x1,
+ ump_x2,
+ lz1,
+ lz2,
+ *x1,
+ *x2,
+ *x3;
+
+ int
+ scale,
+ j,
+ k,
+ pNumber = ti->pNumber,
+ rNumber = ti->rNumber,
+ qNumber = ti->qNumber;
+
+ x3 = &lVector[states * (pNumber - mxtips)];
+
+ switch(ti->tipCase)
+ {
+ case TIP_TIP:
+ x1 = &(tipVector[states * yVector[qNumber][i]]);
+ x2 = &(tipVector[states * yVector[rNumber][i]]);
+ break;
+ case TIP_INNER:
+ x1 = &(tipVector[states * yVector[qNumber][i]]);
+ x2 = &(lVector[states * (rNumber - mxtips)]);
+ break;
+ case INNER_INNER:
+ x1 = &(lVector[states * (qNumber - mxtips)]);
+ x2 = &(lVector[states * (rNumber - mxtips)]);
+ break;
+ default:
+ assert(0);
+ }
+
+ lz1 = qz * ki;
+ lz2 = rz * ki;
+
+ d1[0] = x1[0];
+ d2[0] = x2[0];
+
+
+ for(j = 1; j < states; j++)
+ {
+ d1[j] = x1[j] * EXP(EIGN[j] * lz1);
+ d2[j] = x2[j] * EXP(EIGN[j] * lz2);
+ }
+
+
+ for(j = 0; j < states; j++)
+ {
+ ump_x1 = 0.0;
+ ump_x2 = 0.0;
+
+ for(k = 0; k < states; k++)
+ {
+ ump_x1 += d1[k] * EI[j * states + k];
+ ump_x2 += d2[k] * EI[j * states + k];
+ }
+
+ x1px2[j] = ump_x1 * ump_x2;
+ }
+
+ for(j = 0; j < states; j++)
+ x3[j] = 0.0;
+
+ for(j = 0; j < states; j++)
+ for(k = 0; k < states; k++)
+ x3[k] += x1px2[j] * EV[states * j + k];
+
+ scale = 1;
+ for(j = 0; scale && (j < states); j++)
+ scale = ((x3[j] < minlikelihood) && (x3[j] > minusminlikelihood));
+
+ if(scale)
+ {
+ for(j = 0; j < states; j++)
+ x3[j] *= twotothe256;
+ *eVector = *eVector + 1;
+ }
+
+ free(d1);
+ free(d2);
+ free(x1px2);
+
+ return;
+}
+
+
+static double evaluatePartialCAT_FLEX(int i, double ki, int counter, traversalInfo *ti, double qz,
+ int w, double *EIGN, double *EI, double *EV,
+ double *tipVector, unsigned char **yVector,
+ int branchReference, int mxtips, const int states)
+{
+ int
+ scale = 0,
+ k;
+
+ double
+ *lVector = (double *)malloc_aligned(sizeof(double) * states * mxtips),
+ *d = (double *)malloc_aligned(sizeof(double) * states),
+ lz,
+ term,
+ *x1,
+ *x2;
+
+ traversalInfo
+ *trav = &ti[0];
+
+ assert(isTip(trav->pNumber, mxtips));
+
+ x1 = &(tipVector[states * yVector[trav->pNumber][i]]);
+
+ for(k = 1; k < counter; k++)
+ {
+ double
+ qz = ti[k].qz[branchReference],
+ rz = ti[k].rz[branchReference];
+
+ qz = (qz > zmin) ? log(qz) : log(zmin);
+ rz = (rz > zmin) ? log(rz) : log(zmin);
+
+ computeVectorCAT_FLEX(lVector, &scale, ki, i, qz, rz, &ti[k],
+ EIGN, EI, EV,
+ tipVector, yVector, mxtips, states);
+ }
+
+ x2 = &lVector[states * (trav->qNumber - mxtips)];
+
+ assert(0 <= (trav->qNumber - mxtips) && (trav->qNumber - mxtips) < mxtips);
+
+ if(qz < zmin)
+ qz = zmin; /* clamp the branch length before taking the log */
+ lz = log(qz);
+ lz *= ki;
+
+ d[0] = 1.0;
+
+ for(k = 1; k < states; k++)
+ d[k] = EXP (EIGN[k] * lz);
+
+ term = 0.0;
+
+ for(k = 0; k < states; k++)
+ term += x1[k] * x2[k] * d[k];
+
+ term = LOG(FABS(term)) + (scale * LOG(minlikelihood));
+
+ term = term * w;
+
+ free(lVector);
+ free(d);
+
+ return term;
+}
+
+#endif
+
+double evaluatePartialGeneric (tree *tr, int i, double ki, int _model)
+{
+ double result;
+ int
+ branchReference,
+ states = tr->partitionData[_model].states;
+
+ int
+ index;
+
+ index = i;
+
+
+ if(tr->numBranches > 1)
+ branchReference = _model;
+ else
+ branchReference = 0;
+
+#ifndef _OPTIMIZED_FUNCTIONS
+ if(tr->rateHetModel == CAT)
+ result = evaluatePartialCAT_FLEX(index, ki, tr->td[0].count, tr->td[0].ti, tr->td[0].ti[0].qz[branchReference],
+ tr->partitionData[_model].wgt[index],
+ tr->partitionData[_model].EIGN,
+ tr->partitionData[_model].EI,
+ tr->partitionData[_model].EV,
+ tr->partitionData[_model].tipVector,
+ tr->partitionData[_model].yVector, branchReference, tr->mxtips, states);
+ else
+ /*
+ The per-site likelihood function should only be called for the CAT model.
+ Under the GAMMA model it is required only for estimating per-site protein models,
+ a feature that has been removed in this version of the code.
+ */
+ assert(0);
+
+
+#elif defined(__MIC_NATIVE)
+if (tr->rateHetModel == CAT)
+ result = evaluatePartialCAT_FLEX(index, ki, tr->td[0].count, tr->td[0].ti, tr->td[0].ti[0].qz[branchReference],
+ tr->partitionData[_model].wgt[index],
+ tr->partitionData[_model].EIGN,
+ tr->partitionData[_model].EI,
+ tr->partitionData[_model].EV,
+ tr->partitionData[_model].tipVector,
+ tr->partitionData[_model].yVector, branchReference, tr->mxtips, states);
+else
+ assert(0);
+
+#else
+ switch(states)
+ {
+ case 2:
+ assert(!tr->saveMemory);
+ assert(tr->rateHetModel == CAT);
+
+ result = evaluatePartialGTRCAT_BINARY(index, ki, tr->td[0].count, tr->td[0].ti, tr->td[0].ti[0].qz[branchReference],
+ tr->partitionData[_model].wgt[index],
+ tr->partitionData[_model].EIGN,
+ tr->partitionData[_model].EI,
+ tr->partitionData[_model].EV,
+ tr->partitionData[_model].tipVector,
+ tr->partitionData[_model].yVector, branchReference, tr->mxtips);
+
+
+
+ break;
+ case 4: /* DNA */
+ assert(tr->rateHetModel == CAT);
+
+ result = evaluatePartialGTRCAT(index, ki, tr->td[0].count, tr->td[0].ti, tr->td[0].ti[0].qz[branchReference],
+ tr->partitionData[_model].wgt[index],
+ tr->partitionData[_model].EIGN,
+ tr->partitionData[_model].EI,
+ tr->partitionData[_model].EV,
+ tr->partitionData[_model].tipVector,
+ tr->partitionData[_model].yVector, branchReference, tr->mxtips);
+ break;
+ case 20: /* proteins */
+ if(tr->rateHetModel == CAT)
+ result = evaluatePartialGTRCATPROT(index, ki, tr->td[0].count, tr->td[0].ti, tr->td[0].ti[0].qz[branchReference],
+ tr->partitionData[_model].wgt[index],
+ tr->partitionData[_model].EIGN,
+ tr->partitionData[_model].EI,
+ tr->partitionData[_model].EV,
+ tr->partitionData[_model].tipVector,
+ tr->partitionData[_model].yVector, branchReference, tr->mxtips);
+ else
+ result = evaluatePartialGTRGAMMAPROT(index, tr->td[0].count, tr->td[0].ti, tr->td[0].ti[0].qz[branchReference],
+ tr->partitionData[_model].wgt[index],
+ tr->partitionData[_model].EIGN,
+ tr->partitionData[_model].EI,
+ tr->partitionData[_model].EV,
+ tr->partitionData[_model].tipVector,
+ tr->partitionData[_model].yVector,
+ tr->partitionData[_model].gammaRates,
+ branchReference, tr->mxtips);
+ break;
+ default:
+ assert(0);
+ }
+ #endif
+
+
+ return result;
+}
+
+#ifdef _OPTIMIZED_FUNCTIONS
+
+
+static inline void computeVectorGTRCAT_BINARY(double *lVector, int *eVector, double ki, int i, double qz, double rz,
+ traversalInfo *ti, double *EIGN, double *EI, double *EV, double *tipVector,
+ unsigned char **yVector, int mxtips)
+{
+ double d1, d2, ump_x1, ump_x2, x1px2[2], lz1, lz2;
+ double *x1, *x2, *x3;
+ int
+ j, k,
+ pNumber = ti->pNumber,
+ rNumber = ti->rNumber,
+ qNumber = ti->qNumber;
+
+ x3 = &lVector[2 * (pNumber - mxtips)];
+
+ switch(ti->tipCase)
+ {
+ case TIP_TIP:
+ x1 = &(tipVector[2 * yVector[qNumber][i]]);
+ x2 = &(tipVector[2 * yVector[rNumber][i]]);
+ break;
+ case TIP_INNER:
+ x1 = &(tipVector[2 * yVector[qNumber][i]]);
+ x2 = &lVector[2 * (rNumber - mxtips)];
+ break;
+ case INNER_INNER:
+ x1 = &lVector[2 * (qNumber - mxtips)];
+ x2 = &lVector[2 * (rNumber - mxtips)];
+ break;
+ default:
+ assert(0);
+ }
+
+ lz1 = qz * ki;
+ lz2 = rz * ki;
+
+
+ d1 = x1[1] * EXP(EIGN[1] * lz1);
+ d2 = x2[1] * EXP(EIGN[1] * lz2);
+
+ for(j = 0; j < 2; j++)
+ {
+ ump_x1 = x1[0];
+ ump_x2 = x2[0];
+
+ ump_x1 += d1 * EI[j * 2 + 1];
+ ump_x2 += d2 * EI[j * 2 + 1];
+
+ x1px2[j] = ump_x1 * ump_x2;
+ }
+
+ for(j = 0; j < 2; j++)
+ x3[j] = 0.0;
+
+ for(j = 0; j < 2; j++)
+ for(k = 0; k < 2; k++)
+ x3[k] += x1px2[j] * EV[2 * j + k];
+
+
+ if (x3[0] < minlikelihood && x3[0] > minusminlikelihood &&
+ x3[1] < minlikelihood && x3[1] > minusminlikelihood
+ )
+ {
+ x3[0] *= twotothe256;
+ x3[1] *= twotothe256;
+ *eVector = *eVector + 1;
+ }
+
+ return;
+}
+
+static double evaluatePartialGTRCAT_BINARY(int i, double ki, int counter, traversalInfo *ti, double qz,
+ int w, double *EIGN, double *EI, double *EV,
+ double *tipVector, unsigned char **yVector,
+ int branchReference, int mxtips)
+{
+ double lz, term;
+ double d;
+ double *x1, *x2;
+ int scale = 0, k;
+ double *lVector = (double *)malloc(sizeof(double) * 2 * mxtips);
+ traversalInfo *trav = &ti[0];
+
+ assert(isTip(trav->pNumber, mxtips));
+
+ x1 = &(tipVector[2 * yVector[trav->pNumber][i]]);
+
+ for(k = 1; k < counter; k++)
+ {
+ double
+ qz = ti[k].qz[branchReference],
+ rz = ti[k].rz[branchReference];
+
+ qz = (qz > zmin) ? log(qz) : log(zmin);
+ rz = (rz > zmin) ? log(rz) : log(zmin);
+
+ computeVectorGTRCAT_BINARY(lVector, &scale, ki, i, qz, rz, &ti[k],
+ EIGN, EI, EV,
+ tipVector, yVector, mxtips);
+ }
+
+ x2 = &lVector[2 * (trav->qNumber - mxtips)];
+
+ assert(0 <= (trav->qNumber - mxtips) && (trav->qNumber - mxtips) < mxtips);
+
+ if(qz < zmin)
+ qz = zmin; /* clamp the branch length before taking the log */
+ lz = log(qz);
+ lz *= ki;
+
+ d = EXP(EIGN[1] * lz);
+
+ term = x1[0] * x2[0];
+ term += x1[1] * x2[1] * d;
+
+ term = LOG(FABS(term)) + (scale * LOG(minlikelihood));
+
+ term = term * w;
+
+ free(lVector);
+
+ return term;
+}
+
+
+
+static inline void computeVectorGTRCATPROT(double *lVector, int *eVector, double ki, int i, double qz, double rz,
+ traversalInfo *ti, double *EIGN, double *EI, double *EV, double *tipVector,
+ unsigned char **yVector, int mxtips)
+{
+ double *x1, *x2, *x3;
+ int
+ pNumber = ti->pNumber,
+ rNumber = ti->rNumber,
+ qNumber = ti->qNumber;
+
+ x3 = &(lVector[20 * (pNumber - mxtips)]);
+
+ switch(ti->tipCase)
+ {
+ case TIP_TIP:
+ x1 = &(tipVector[20 * yVector[qNumber][i]]);
+ x2 = &(tipVector[20 * yVector[rNumber][i]]);
+ break;
+ case TIP_INNER:
+ x1 = &(tipVector[20 * yVector[qNumber][i]]);
+ x2 = &( lVector[20 * (rNumber - mxtips)]);
+ break;
+ case INNER_INNER:
+ x1 = &(lVector[20 * (qNumber - mxtips)]);
+ x2 = &(lVector[20 * (rNumber - mxtips)]);
+ break;
+ default:
+ assert(0);
+ }
+
+ {
+ double
+ e1[20] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+ e2[20] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+ d1[20] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+ d2[20] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+ lz1,
+ lz2;
+
+ int
+ l,
+ k,
+ scale;
+
+ lz1 = qz * ki;
+ lz2 = rz * ki;
+
+ e1[0] = 1.0;
+ e2[0] = 1.0;
+
+ for(l = 1; l < 20; l++)
+ {
+ e1[l] = EXP(EIGN[l] * lz1);
+ e2[l] = EXP(EIGN[l] * lz2);
+ }
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d d1v = _mm_mul_pd(_mm_load_pd(&x1[l]), _mm_load_pd(&e1[l]));
+ __m128d d2v = _mm_mul_pd(_mm_load_pd(&x2[l]), _mm_load_pd(&e2[l]));
+
+ _mm_store_pd(&d1[l], d1v);
+ _mm_store_pd(&d2[l], d2v);
+ }
+
+ __m128d zero = _mm_setzero_pd();
+
+ for(l = 0; l < 20; l+=2)
+ _mm_store_pd(&x3[l], zero);
+
+ for(l = 0; l < 20; l++)
+ {
+ double *ev = &EV[l * 20];
+ __m128d ump_x1v = _mm_setzero_pd();
+ __m128d ump_x2v = _mm_setzero_pd();
+ __m128d x1px2v;
+
+ for(k = 0; k < 20; k+=2)
+ {
+ __m128d eiv = _mm_load_pd(&EI[20 * l + k]);
+ __m128d d1v = _mm_load_pd(&d1[k]);
+ __m128d d2v = _mm_load_pd(&d2[k]);
+ ump_x1v = _mm_add_pd(ump_x1v, _mm_mul_pd(d1v, eiv));
+ ump_x2v = _mm_add_pd(ump_x2v, _mm_mul_pd(d2v, eiv));
+ }
+
+ ump_x1v = _mm_hadd_pd(ump_x1v, ump_x1v);
+ ump_x2v = _mm_hadd_pd(ump_x2v, ump_x2v);
+
+ x1px2v = _mm_mul_pd(ump_x1v, ump_x2v);
+
+ for(k = 0; k < 20; k+=2)
+ {
+ __m128d ex3v = _mm_load_pd(&x3[k]);
+ __m128d EVV = _mm_load_pd(&ev[k]);
+ ex3v = _mm_add_pd(ex3v, _mm_mul_pd(x1px2v, EVV));
+
+ _mm_store_pd(&x3[k], ex3v);
+ }
+ }
+
+ scale = 1;
+ for(l = 0; scale && (l < 20); l++)
+ scale = ((x3[l] < minlikelihood) && (x3[l] > minusminlikelihood));
+
+ if(scale)
+ {
+ __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d ex3v = _mm_mul_pd(_mm_load_pd(&x3[l]),twoto);
+ _mm_store_pd(&x3[l], ex3v);
+ }
+
+
+
+ *eVector = *eVector + 1;
+ }
+
+ return;
+ }
+}
+
+static double evaluatePartialGTRCATPROT(int i, double ki, int counter, traversalInfo *ti, double qz,
+ int w, double *EIGN, double *EI, double *EV,
+ double *tipVector, unsigned char **yVector,
+ int branchReference, int mxtips)
+{
+ double lz, term;
+ double d[20];
+ double *x1, *x2;
+ int scale = 0, k, l;
+ double
+ *lVector = (double *)malloc_aligned(sizeof(double) * 20 * mxtips),
+ myEI[400] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+ traversalInfo *trav = &ti[0];
+
+
+
+ for(k = 0; k < 20; k++)
+ {
+ for(l = 0; l < 20; l++)
+ myEI[k * 20 + l] = EI[k * 20 + l];
+ }
+
+ assert(isTip(trav->pNumber, mxtips));
+
+ x1 = &(tipVector[20 * yVector[trav->pNumber][i]]);
+
+ for(k = 1; k < counter; k++)
+ {
+ double
+ qz = ti[k].qz[branchReference],
+ rz = ti[k].rz[branchReference];
+
+ qz = (qz > zmin) ? log(qz) : log(zmin);
+ rz = (rz > zmin) ? log(rz) : log(zmin);
+
+ computeVectorGTRCATPROT(lVector, &scale, ki, i, qz, rz,
+ &ti[k], EIGN, myEI, EV,
+ tipVector, yVector, mxtips);
+ }
+
+ x2 = &lVector[20 * (trav->qNumber - mxtips)];
+
+
+
+ assert(0 <= (trav->qNumber - mxtips) && (trav->qNumber - mxtips) < mxtips);
+
+ if(qz < zmin)
+ qz = zmin; /* clamp the branch length before taking the log */
+ lz = log(qz);
+ lz *= ki;
+
+ d[0] = 1.0;
+ for(l = 1; l < 20; l++)
+ d[l] = EXP (EIGN[l] * lz);
+
+ term = 0.0;
+
+ for(l = 0; l < 20; l++)
+ term += x1[l] * x2[l] * d[l];
+
+ term = LOG(FABS(term)) + (scale * LOG(minlikelihood));
+
+ term = term * w;
+
+ free(lVector);
+
+
+ return term;
+}
+static inline void computeVectorGTRGAMMAPROT(double *lVector, int *eVector, double *gammaRates, int i, double qz, double rz,
+ traversalInfo *ti, double *EIGN, double *EI, double *EV, double *tipVector,
+ unsigned char **yVector, int mxtips)
+{
+ double
+ *x1,
+ *x2,
+ *x3;
+
+ int
+ s,
+ pNumber = ti->pNumber,
+ rNumber = ti->rNumber,
+ qNumber = ti->qNumber,
+ index1[4],
+ index2[4];
+
+
+ x3 = &(lVector[80 * (pNumber - mxtips)]);
+
+ switch(ti->tipCase)
+ {
+ case TIP_TIP:
+ x1 = &(tipVector[20 * yVector[qNumber][i]]);
+ x2 = &(tipVector[20 * yVector[rNumber][i]]);
+ for(s = 0; s < 4; s++)
+ {
+ index1[s] = 0;
+ index2[s] = 0;
+ }
+ break;
+ case TIP_INNER:
+ x1 = &(tipVector[20 * yVector[qNumber][i]]);
+ x2 = &( lVector[80 * (rNumber - mxtips)]);
+ for(s = 0; s < 4; s++)
+ index1[s] = 0;
+ for(s = 0; s < 4; s++)
+ index2[s] = s;
+ break;
+ case INNER_INNER:
+ x1 = &(lVector[80 * (qNumber - mxtips)]);
+ x2 = &(lVector[80 * (rNumber - mxtips)]);
+ for(s = 0; s < 4; s++)
+ {
+ index1[s] = s;
+ index2[s] = s;
+ }
+ break;
+ default:
+ assert(0);
+ }
+
+ {
+ double
+ e1[20] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+ e2[20] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+ d1[20] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+ d2[20] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+ lz1, lz2;
+
+ int
+ l,
+ k,
+ scale,
+ j;
+
+ for(j = 0; j < 4; j++)
+ {
+ lz1 = qz * gammaRates[j];
+ lz2 = rz * gammaRates[j];
+
+ e1[0] = 1.0;
+ e2[0] = 1.0;
+
+ for(l = 1; l < 20; l++)
+ {
+ e1[l] = EXP(EIGN[l] * lz1);
+ e2[l] = EXP(EIGN[l] * lz2);
+ }
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d d1v = _mm_mul_pd(_mm_load_pd(&x1[20 * index1[j] + l]), _mm_load_pd(&e1[l]));
+ __m128d d2v = _mm_mul_pd(_mm_load_pd(&x2[20 * index2[j] + l]), _mm_load_pd(&e2[l]));
+
+ _mm_store_pd(&d1[l], d1v);
+ _mm_store_pd(&d2[l], d2v);
+ }
+
+ __m128d zero = _mm_setzero_pd();
+
+ for(l = 0; l < 20; l+=2)
+ _mm_store_pd(&x3[j * 20 + l], zero);
+
+ for(l = 0; l < 20; l++)
+ {
+ double *ev = &EV[l * 20];
+ __m128d ump_x1v = _mm_setzero_pd();
+ __m128d ump_x2v = _mm_setzero_pd();
+ __m128d x1px2v;
+
+ for(k = 0; k < 20; k+=2)
+ {
+ __m128d eiv = _mm_load_pd(&EI[20 * l + k]);
+ __m128d d1v = _mm_load_pd(&d1[k]);
+ __m128d d2v = _mm_load_pd(&d2[k]);
+ ump_x1v = _mm_add_pd(ump_x1v, _mm_mul_pd(d1v, eiv));
+ ump_x2v = _mm_add_pd(ump_x2v, _mm_mul_pd(d2v, eiv));
+ }
+
+ ump_x1v = _mm_hadd_pd(ump_x1v, ump_x1v);
+ ump_x2v = _mm_hadd_pd(ump_x2v, ump_x2v);
+
+ x1px2v = _mm_mul_pd(ump_x1v, ump_x2v);
+
+ for(k = 0; k < 20; k+=2)
+ {
+ __m128d ex3v = _mm_load_pd(&x3[j * 20 + k]);
+ __m128d EVV = _mm_load_pd(&ev[k]);
+ ex3v = _mm_add_pd(ex3v, _mm_mul_pd(x1px2v, EVV));
+
+ _mm_store_pd(&x3[j * 20 + k], ex3v);
+ }
+ }
+ }
+
+ scale = 1;
+ for(l = 0; scale && (l < 80); l++)
+ scale = ((x3[l] < minlikelihood) && (x3[l] > minusminlikelihood));
+
+ if(scale)
+ {
+ __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+
+ for(l = 0; l < 80; l+=2)
+ {
+ __m128d ex3v = _mm_mul_pd(_mm_load_pd(&x3[l]),twoto);
+ _mm_store_pd(&x3[l], ex3v);
+ }
+
+ *eVector = *eVector + 1;
+ }
+
+ return;
+ }
+}
+
+
+static double evaluatePartialGTRGAMMAPROT(int i, int counter, traversalInfo *ti, double qz,
+ int w, double *EIGN, double *EI, double *EV,
+ double *tipVector, unsigned char **yVector,
+ double *gammaRates,
+ int branchReference, int mxtips)
+{
+ double lz, term;
+ double d[80];
+ double *x1, *x2;
+ int scale = 0, k, l, j;
+ double
+ *lVector = (double *)malloc_aligned(sizeof(double) * 80 * mxtips),
+ myEI[400] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+ traversalInfo
+ *trav = &ti[0];
+
+ for(k = 0; k < 20; k++)
+ {
+ for(l = 0; l < 20; l++)
+ myEI[k * 20 + l] = EI[k * 20 + l];
+ }
+
+ assert(isTip(trav->pNumber, mxtips));
+
+ x1 = &(tipVector[20 * yVector[trav->pNumber][i]]);
+
+ for(k = 1; k < counter; k++)
+ {
+ double
+ qz = ti[k].qz[branchReference],
+ rz = ti[k].rz[branchReference];
+
+ qz = (qz > zmin) ? log(qz) : log(zmin);
+ rz = (rz > zmin) ? log(rz) : log(zmin);
+
+ computeVectorGTRGAMMAPROT(lVector, &scale, gammaRates, i, qz, rz,
+ &ti[k], EIGN, myEI, EV,
+ tipVector, yVector, mxtips);
+ }
+
+ x2 = &lVector[80 * (trav->qNumber - mxtips)];
+
+ assert(0 <= (trav->qNumber - mxtips) && (trav->qNumber - mxtips) < mxtips);
+
+ if(qz < zmin)
+ qz = zmin; /* clamp the branch length before taking the log */
+ lz = log(qz);
+
+ for(j = 0; j < 4; j++)
+ {
+ d[20 * j] = 1.0;
+ for(l = 1; l < 20; l++)
+ d[20 * j + l] = EXP(EIGN[l] * lz * gammaRates[j]);
+ }
+
+
+ for(j = 0, term = 0.0; j < 4; j++)
+ {
+ for(l = 0; l < 20; l++)
+ term += x1[l] * x2[20 * j + l] * d[j * 20 + l];
+ }
+
+ term = LOG(0.25 * FABS(term)) + (scale * LOG(minlikelihood));
+
+ term = term * w;
+
+ free(lVector);
+
+
+ return term;
+}
+
+
+
+
+
+static inline void computeVectorGTRCAT(double *lVector, int *eVector, double ki, int i, double qz, double rz,
+ traversalInfo *ti, double *EIGN, double *EI, double *EV, double *tipVector,
+ unsigned char **yVector, int mxtips)
+{
+ double d1[3], d2[3], ump_x1, ump_x2, x1px2[4], lz1, lz2;
+ double *x1, *x2, *x3;
+ int j, k,
+ pNumber = ti->pNumber,
+ rNumber = ti->rNumber,
+ qNumber = ti->qNumber;
+
+ x3 = &lVector[4 * (pNumber - mxtips)];
+
+
+ switch(ti->tipCase)
+ {
+ case TIP_TIP:
+ x1 = &(tipVector[4 * yVector[qNumber][i]]);
+ x2 = &(tipVector[4 * yVector[rNumber][i]]);
+ break;
+ case TIP_INNER:
+ x1 = &(tipVector[4 * yVector[qNumber][i]]);
+ x2 = &lVector[4 * (rNumber - mxtips)];
+ break;
+ case INNER_INNER:
+ x1 = &lVector[4 * (qNumber - mxtips)];
+ x2 = &lVector[4 * (rNumber - mxtips)];
+ break;
+ default:
+ assert(0);
+ }
+
+ lz1 = qz * ki;
+ lz2 = rz * ki;
+
+ for(j = 0; j < 3; j++)
+ {
+ d1[j] =
+ x1[j + 1] *
+ EXP(EIGN[j + 1] * lz1);
+ d2[j] = x2[j + 1] * EXP(EIGN[j + 1] * lz2);
+ }
+
+
+ for(j = 0; j < 4; j++)
+ {
+ ump_x1 = x1[0];
+ ump_x2 = x2[0];
+ for(k = 0; k < 3; k++)
+ {
+ ump_x1 += d1[k] * EI[j * 4 + k + 1];
+ ump_x2 += d2[k] * EI[j * 4 + k + 1];
+ }
+ x1px2[j] = ump_x1 * ump_x2;
+ }
+
+ for(j = 0; j < 4; j++)
+ x3[j] = 0.0;
+
+ for(j = 0; j < 4; j++)
+ for(k = 0; k < 4; k++)
+ x3[k] += x1px2[j] * EV[4 * j + k];
+
+
+ if (x3[0] < minlikelihood && x3[0] > minusminlikelihood &&
+ x3[1] < minlikelihood && x3[1] > minusminlikelihood &&
+ x3[2] < minlikelihood && x3[2] > minusminlikelihood &&
+ x3[3] < minlikelihood && x3[3] > minusminlikelihood)
+ {
+ x3[0] *= twotothe256;
+ x3[1] *= twotothe256;
+ x3[2] *= twotothe256;
+ x3[3] *= twotothe256;
+ *eVector = *eVector + 1;
+ }
+
+ return;
+}
+
+
+
+
+
+
+
+
+static double evaluatePartialGTRCAT(int i, double ki, int counter, traversalInfo *ti, double qz,
+ int w, double *EIGN, double *EI, double *EV,
+ double *tipVector, unsigned char **yVector,
+ int branchReference, int mxtips)
+{
+ double lz, term;
+ double d[3];
+ double *x1, *x2;
+ int scale = 0, k;
+ double *lVector = (double *)malloc_aligned(sizeof(double) * 4 * mxtips);
+
+ traversalInfo *trav = &ti[0];
+
+ assert(isTip(trav->pNumber, mxtips));
+
+ x1 = &(tipVector[4 * yVector[trav->pNumber][i]]);
+
+ for(k = 1; k < counter; k++)
+ {
+ double
+ qz = ti[k].qz[branchReference],
+ rz = ti[k].rz[branchReference];
+
+ qz = (qz > zmin) ? log(qz) : log(zmin);
+ rz = (rz > zmin) ? log(rz) : log(zmin);
+
+ computeVectorGTRCAT(lVector, &scale, ki, i, qz, rz, &ti[k],
+ EIGN, EI, EV,
+ tipVector, yVector, mxtips);
+ }
+
+ x2 = &lVector[4 * (trav->qNumber - mxtips)];
+
+ assert(0 <= (trav->qNumber - mxtips) && (trav->qNumber - mxtips) < mxtips);
+
+ if(qz < zmin)
+ qz = zmin; /* clamp the branch length before taking the log */
+ lz = log(qz);
+ lz *= ki;
+
+ d[0] = EXP (EIGN[1] * lz);
+ d[1] = EXP (EIGN[2] * lz);
+ d[2] = EXP (EIGN[3] * lz);
+
+ term = x1[0] * x2[0];
+ term += x1[1] * x2[1] * d[0];
+ term += x1[2] * x2[2] * d[1];
+ term += x1[3] * x2[3] * d[2];
+
+ term = LOG(FABS(term)) + (scale * LOG(minlikelihood));
+
+ term = term * w;
+
+ free(lVector);
+
+ return term;
+}
+
+
+
+#endif
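Illustrative sketch, not part of the commit: every `computeVector*` function in the file above ends with the same numerical scaling step. If all entries of the freshly computed vector `x3` lie strictly between `minusminlikelihood` and `minlikelihood`, they are multiplied by `twotothe256` and the counter behind `eVector` is incremented. The version below isolates that step. It assumes that `twotothe256` is 2^256 and `minlikelihood` is 2^-256 (neither definition appears in this part of the diff), and the names `rescale_if_tiny`, `SKETCH_TWOTOTHE256` and `SKETCH_MINLIKELIHOOD` are hypothetical.

```c
#include <assert.h>

/* Assumed values of the constants used by the patch:
   twotothe256 = 2^256 and minlikelihood = 2^-256 (C99 hex floats). */
#define SKETCH_TWOTOTHE256   0x1p256
#define SKETCH_MINLIKELIHOOD 0x1p-256

/* If every entry of x lies strictly inside (-2^-256, 2^-256), multiply
   all entries by 2^256 and record one scaling event in *scale_count.
   Returns 1 if the vector was rescaled, 0 otherwise. */
static int rescale_if_tiny(double *x, int n, int *scale_count)
{
  int j, tiny = 1;

  assert(x != 0 && scale_count != 0 && n > 0);

  /* stop at the first entry that is large enough in magnitude */
  for(j = 0; tiny && (j < n); j++)
    tiny = (x[j] < SKETCH_MINLIKELIHOOD) && (x[j] > -SKETCH_MINLIKELIHOOD);

  if(tiny)
    {
      for(j = 0; j < n; j++)
        x[j] *= SKETCH_TWOTOTHE256;
      *scale_count += 1;
    }

  return tiny;
}
```

The `evaluatePartial*` functions later compensate for these multiplications by adding `scale * LOG(minlikelihood)` to the per-site log-likelihood.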
diff --git a/examl/globalVariables.h b/examl/globalVariables.h
new file mode 100644
index 0000000..7080dc9
--- /dev/null
+++ b/examl/globalVariables.h
@@ -0,0 +1,180 @@
+/* RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees
+ * Copyright August 2006 by Alexandros Stamatakis
+ *
+ * Partially derived from
+ * fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *
+ * and
+ *
+ * Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ * This program is free software; you may redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ *
+ * For any other enquiries send an Email to Alexandros Stamatakis
+ * Alexandros.Stamatakis at epfl.ch
+ *
+ * When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ * Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models".
+ * Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+
+
+int processes;
+double *globalResult;
+
+
+int processID;
+infoList iList;
+
+int Thorough = 0;
+
+checkPointState ckp;
+
+char run_id[128] = "",
+ workdir[1024] = "",
+ seq_file[1024] = "",
+ tree_file[1024]="",
+ weightFileName[1024] = "",
+ resultFileName[1024] = "",
+ logFileName[1024] = "",
+ infoFileName[1024] = "",
+ randomFileName[1024] = "",
+ proteinModelFileName[1024] = "",
+ binaryCheckpointName[1024] = "",
+ binaryCheckpointInputName[1024] = "",
+ byteFileName[1024] = "",
+ modelFileName[1024] = "",
+ treeFileName[1024] = "",
+ quartetGroupingFileName[1024],
+ quartetFileName[1024];
+
+char *protModels[NUM_PROT_MODELS] = {"DAYHOFF", "DCMUT", "JTT", "MTREV", "WAG", "RTREV", "CPREV", "VT", "BLOSUM62", "MTMAM", "LG", "MTART", "MTZOA", "PMB",
+ "HIVB", "HIVW", "JTTDCMUT", "FLU", "STMTREV", "AUTO", "LG4M", "LG4X", "GTR"};
+
+const char inverseMeaningBINARY[4] = {'_', '0', '1', '-'};
+const char inverseMeaningDNA[16] = {'_', 'A', 'C', 'M', 'G', 'R', 'S', 'V', 'T', 'W', 'Y', 'H', 'K', 'D', 'B', '-'};
+const char inverseMeaningPROT[23] = {'A','R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 'P', 'S',
+ 'T', 'W', 'Y', 'V', 'B', 'Z', '-'};
+const char inverseMeaningGeneric32[33] = {'0', '1', '2', '3', '4', '5', '6', '7',
+ '8', '9', 'A', 'B', 'C', 'D', 'E', 'F',
+ 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N',
+ 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V',
+ '-'};
+const char inverseMeaningGeneric64[33] = {'0', '1', '2', '3', '4', '5', '6', '7',
+ '8', '9', 'A', 'B', 'C', 'D', 'E', 'F',
+ 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N',
+ 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V',
+ '-'};
+
+const unsigned int bitVectorIdentity[256] = {0 ,1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9 ,10 ,11 ,12 ,13 ,14 ,15 ,16 ,17 ,18 ,19 ,20 ,21 ,22 ,23 ,24 ,25 ,26 ,
+ 27 ,28 ,29 ,30 ,31 ,32 ,33 ,34 ,35 ,36 ,37 ,38 ,39 ,40 ,41 ,42 ,43 ,44 ,45 ,46 ,47 ,48 ,49 ,50 ,51 ,
+ 52 ,53 ,54 ,55 ,56 ,57 ,58 ,59 ,60 ,61 ,62 ,63 ,64 ,65 ,66 ,67 ,68 ,69 ,70 ,71 ,72 ,73 ,74 ,75 ,76 ,
+ 77 ,78 ,79 ,80 ,81 ,82 ,83 ,84 ,85 ,86 ,87 ,88 ,89 ,90 ,91 ,92 ,93 ,94 ,95 ,96 ,97 ,98 ,99 ,100 ,101 ,
+ 102 ,103 ,104 ,105 ,106 ,107 ,108 ,109 ,110 ,111 ,112 ,113 ,114 ,115 ,116 ,117 ,118 ,119 ,120 ,121 ,122 ,
+ 123 ,124 ,125 ,126 ,127 ,128 ,129 ,130 ,131 ,132 ,133 ,134 ,135 ,136 ,137 ,138 ,139 ,140 ,141 ,142 ,143 ,
+ 144 ,145 ,146 ,147 ,148 ,149 ,150 ,151 ,152 ,153 ,154 ,155 ,156 ,157 ,158 ,159 ,160 ,161 ,162 ,163 ,164 ,
+ 165 ,166 ,167 ,168 ,169 ,170 ,171 ,172 ,173 ,174 ,175 ,176 ,177 ,178 ,179 ,180 ,181 ,182 ,183 ,184 ,185 ,
+ 186 ,187 ,188 ,189 ,190 ,191 ,192 ,193 ,194 ,195 ,196 ,197 ,198 ,199 ,200 ,201 ,202 ,203 ,204 ,205 ,206 ,
+ 207 ,208 ,209 ,210 ,211 ,212 ,213 ,214 ,215 ,216 ,217 ,218 ,219 ,220 ,221 ,222 ,223 ,224 ,225 ,226 ,227 ,
+ 228 ,229 ,230 ,231 ,232 ,233 ,234 ,235 ,236 ,237 ,238 ,239 ,240 ,241 ,242 ,243 ,244 ,245 ,246 ,247 ,248 ,
+ 249 ,250 ,251 ,252 ,253 ,254 ,255};
+
+
+
+const unsigned int bitVectorAA[23] = {1, 2, 4, 8, 16, 32, 64, 128,
+ 256, 512, 1024, 2048, 4096,
+ 8192, 16384, 32768, 65536, 131072, 262144,
+ 524288, 12 /* N | D */, 96 /*Q | E*/, 1048575 /* - */};
+
+const unsigned int bitVectorSecondary[256] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
+ 10, 11, 12, 13, 14, 15, 0, 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192,
+ 208, 224, 240, 0, 17, 34, 51, 68, 85, 102, 119, 136, 153, 170, 187, 204, 221, 238,
+ 255, 0, 256, 512, 768, 1024, 1280, 1536, 1792, 2048, 2304, 2560, 2816, 3072, 3328,
+ 3584, 3840, 0, 257, 514, 771, 1028, 1285, 1542, 1799, 2056, 2313, 2570, 2827, 3084,
+ 3341, 3598, 3855, 0, 272, 544, 816, 1088, 1360, 1632, 1904, 2176, 2448, 2720, 2992,
+ 3264, 3536, 3808, 4080, 0, 273, 546, 819, 1092, 1365, 1638, 1911, 2184, 2457, 2730,
+ 3003, 3276, 3549, 3822, 4095, 0, 4096, 8192, 12288, 16384, 20480, 24576, 28672, 32768,
+ 36864, 40960, 45056, 49152, 53248, 57344, 61440, 0, 4097, 8194, 12291, 16388, 20485, 24582,
+ 28679, 32776, 36873, 40970, 45067, 49164, 53261, 57358, 61455, 0, 4112, 8224, 12336, 16448,
+ 20560, 24672, 28784, 32896, 37008, 41120, 45232, 49344, 53456, 57568, 61680, 0, 4113, 8226,
+ 12339, 16452, 20565, 24678, 28791, 32904, 37017, 41130, 45243, 49356, 53469, 57582, 61695,
+ 0, 4352, 8704, 13056, 17408, 21760, 26112, 30464, 34816, 39168, 43520, 47872, 52224, 56576,
+ 60928, 65280, 0, 4353, 8706, 13059, 17412, 21765, 26118, 30471, 34824, 39177, 43530, 47883,
+ 52236, 56589, 60942, 65295, 0, 4368, 8736, 13104, 17472, 21840, 26208, 30576, 34944, 39312,
+ 43680, 48048, 52416, 56784, 61152, 65520, 0, 4369, 8738, 13107, 17476, 21845, 26214, 30583,
+ 34952, 39321, 43690, 48059, 52428, 56797, 61166, 65535};
+
+const unsigned int bitVector32[33] = {1, 2, 4, 8, 16, 32, 64, 128,
+ 256, 512, 1024, 2048, 4096, 8192, 16384, 32768,
+ 65536, 131072, 262144, 524288, 1048576, 2097152, 4194304, 8388608,
+ 16777216, 33554432, 67108864, 134217728, 268435456, 536870912, 1073741824, 2147483648u,
+ 4294967295u};
+
+/*const unsigned int bitVector64[65] = {};*/
+
+const unsigned int mask32[32] = {1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072,
+ 262144, 524288, 1048576, 2097152, 4194304, 8388608, 16777216, 33554432, 67108864, 134217728,
+ 268435456, 536870912, 1073741824, 2147483648U};
+
+const char *secondaryModelList[21] = { "S6A (GTR)", "S6B", "S6C", "S6D", "S6E", "S7A (GTR)", "S7B", "S7C", "S7D", "S7E", "S7F", "S16 (GTR)", "S16A", "S16B", "S16C",
+ "S16D", "S16E", "S16F", "S16I", "S16J", "S16K"};
+
+double masterTime;
+double accumulatedTime;
+int optimizeRateCategoryInvocations = 1;
+
+
+
+
+
+partitionLengths pLengths[MAX_MODEL] = {
+
+ /* BINARY */
+ //{4, 4, 2, 4, 2, 1, 2, 8, 2, 2, FALSE, 3, inverseMeaningBINARY, 2, FALSE, bitVectorIdentity},
+ //eiLength changed from 2 -> 4
+ {4, 4, 2, 4, 4, 1, 2, 8, 2, 2, FALSE, 3, inverseMeaningBINARY, 2, FALSE, bitVectorIdentity},
+
+ /* DNA */
+ {16, 16, 4, 16, 16, 6, 4, 64, 6, 4, FALSE, 15, inverseMeaningDNA, 4, FALSE, bitVectorIdentity},
+
+ /* AA */
+ {400, 400, 20, 400, 400, 190, 20, 460, 190, 20, FALSE, 22, inverseMeaningPROT, 20, TRUE, bitVectorAA},
+
+ /* SECONDARY_DATA */
+
+ {256, 256, 16, 256, 256, 120, 16, 4096, 120, 16, FALSE, 255, (char*)NULL, 16, TRUE, bitVectorSecondary},
+
+
+ /* SECONDARY_DATA_6 */
+ {36, 36, 6, 36, 36, 15, 6, 384, 15, 6, FALSE, 63, (char*)NULL, 6, TRUE, bitVectorIdentity},
+
+
+ /* SECONDARY_DATA_7 */
+ {49, 49, 7, 49, 49, 21, 7, 896, 21, 7, FALSE, 127, (char*)NULL, 7, TRUE, bitVectorIdentity},
+
+ /* 32 states */
+ {1024, 1024, 32, 1024, 1024, 496, 32, 1056, 496, 32, FALSE, 32, inverseMeaningGeneric32, 32, TRUE, bitVector32},
+
+ /* 64 states */
+ {4096, 4096, 64, 4096, 4096, 2016, 64, 4160, 2016, 64, FALSE, 64, (char*)NULL, 64, TRUE, (unsigned int*)NULL}
+};
+
+partitionLengths pLength;
+
+
+
+
+
+
+
diff --git a/examl/makenewzGenericSpecial.c b/examl/makenewzGenericSpecial.c
new file mode 100644
index 0000000..6d80c67
--- /dev/null
+++ b/examl/makenewzGenericSpecial.c
@@ -0,0 +1,2747 @@
+/* RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees
+ * Copyright August 2006 by Alexandros Stamatakis
+ *
+ * Partially derived from
+ * fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *
+ * and
+ *
+ * Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ * This program is free software; you may redistribute it and/or modify its
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ *
+ * For any other enquiries send an Email to Alexandros Stamatakis
+ * Alexandros.Stamatakis at epfl.ch
+ *
+ * When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ * Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with
+ * thousands of taxa and mixed models".
+ * Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#ifndef WIN32
+#include <unistd.h>
+#endif
+
+
+
+#include <math.h>
+#include <time.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+#include "axml.h"
+
+#ifdef __SIM_SSE3
+#include <xmmintrin.h>
+#include <pmmintrin.h>
+/*#include <tmmintrin.h>*/
+#endif
+
+/* includes MIC-optimized functions */
+
+#ifdef __MIC_NATIVE
+#include "mic_native.h"
+#endif
+
+
+/* pointers to reduction buffers for storing and gathering the first and second derivative
+ of the likelihood in Pthreads and MPI */
+
+
+extern int processID;
+extern const unsigned int mask32[32];
+
+/*******************/
+
+
+/* generic function to get the required pointers to the data associated with the left and right node that define a branch */
+
+static void getVects(tree *tr, unsigned char **tipX1, unsigned char **tipX2, double **x1_start, double **x2_start, int *tipCase, int model,
+ double **x1_gapColumn, double **x2_gapColumn, unsigned int **x1_gap, unsigned int **x2_gap, size_t offset)
+{
+ int
+ rateHet = (int)discreteRateCategories(tr->rateHetModel),
+ states = tr->partitionData[model].states,
+ span = rateHet * states;
+
+ size_t
+ x_offset = offset * (size_t)span;
+
+ int
+ pNumber,
+ qNumber;
+
+ /* get the left and right node number of the nodes defining the branch we want to optimize */
+
+ pNumber = tr->td[0].ti[0].pNumber;
+ qNumber = tr->td[0].ti[0].qNumber;
+
+
+ /* initialize to NULL */
+
+ *x1_start = (double*)NULL;
+ *x2_start = (double*)NULL;
+ *tipX1 = (unsigned char*)NULL;
+ *tipX2 = (unsigned char*)NULL;
+
+ /* switch over the different tip cases again here */
+
+ if(isTip(pNumber, tr->mxtips) || isTip(qNumber, tr->mxtips))
+ {
+ if(!( isTip(pNumber, tr->mxtips) && isTip(qNumber, tr->mxtips)) )
+ {
+ *tipCase = TIP_INNER;
+ if(isTip(qNumber, tr->mxtips))
+ {
+ *tipX1 = tr->partitionData[model].yVector[qNumber] + offset;
+ *x2_start = tr->partitionData[model].xVector[pNumber - tr->mxtips - 1] + x_offset;
+
+ if(tr->saveMemory)
+ {
+ *x2_gap = &(tr->partitionData[model].gapVector[pNumber * tr->partitionData[model].gapVectorLength]);
+ *x2_gapColumn = &tr->partitionData[model].gapColumn[(pNumber - tr->mxtips - 1) * states * rateHet];
+ }
+ }
+ else
+ {
+ *tipX1 = tr->partitionData[model].yVector[pNumber] + offset;
+ *x2_start = tr->partitionData[model].xVector[qNumber - tr->mxtips - 1] + x_offset;
+
+ if(tr->saveMemory)
+ {
+ *x2_gap = &(tr->partitionData[model].gapVector[qNumber * tr->partitionData[model].gapVectorLength]);
+ *x2_gapColumn = &tr->partitionData[model].gapColumn[(qNumber - tr->mxtips - 1) * states * rateHet];
+ }
+ }
+ }
+ else
+ {
+ /* note that TIP_TIP should normally not occur, since it would mean that we are trying to optimize
+ a branch in a two-taxon tree. This case was inherited from a RAxML function
+ that optimized pair-wise distances between all taxa in a tree */
+
+ *tipCase = TIP_TIP;
+ *tipX1 = tr->partitionData[model].yVector[pNumber] + offset;
+ *tipX2 = tr->partitionData[model].yVector[qNumber] + offset;
+ }
+ }
+ else
+ {
+ *tipCase = INNER_INNER;
+
+ *x1_start = tr->partitionData[model].xVector[pNumber - tr->mxtips - 1] + x_offset;
+ *x2_start = tr->partitionData[model].xVector[qNumber - tr->mxtips - 1] + x_offset;
+
+ if(tr->saveMemory)
+ {
+ *x1_gap = &(tr->partitionData[model].gapVector[pNumber * tr->partitionData[model].gapVectorLength]);
+ *x1_gapColumn = &tr->partitionData[model].gapColumn[(pNumber - tr->mxtips - 1) * states * rateHet];
+
+ *x2_gap = &(tr->partitionData[model].gapVector[qNumber * tr->partitionData[model].gapVectorLength]);
+ *x2_gapColumn = &tr->partitionData[model].gapColumn[(qNumber - tr->mxtips - 1) * states * rateHet];
+ }
+ }
+
+}
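+
+/* Editorial worked example (not part of the upstream sources) of the index arithmetic in getVects().
+ The numbers are illustrative, and the node numbering (tips 1..mxtips, inner nodes from mxtips + 1)
+ is an assumption inferred from the "pNumber - tr->mxtips - 1" expressions above.
+
+ With mxtips = 5, inner node 6 maps to xVector[0] and inner node 8 maps to xVector[2].
+
+ With rateHet = 4 and states = 4, span = 16 doubles per site. A thread whose slice starts at
+ site offset = 100 therefore reads inner vectors from xVector[...] + 1600 (x_offset = offset * span),
+ while tip sequences, stored as one unsigned char per site, are read from yVector[...] + 100. */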
+
+
+/* this is a pre-computation and storage of values that remain constant while we change the value of the branch length
+ we want to adapt. The target pointer sumtable is a single pre-allocated array that has the same
+ size as a conditional likelihood vector at an inner node.
+
+ So for a Newton-Raphson optimization we execute this function only once, at the beginning, for each new branch we consider.
+*/
+
+#ifndef _OPTIMIZED_FUNCTIONS
+
+static void sumCAT_FLEX(int tipCase, double *sumtable, double *x1, double *x2, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n, const int states)
+{
+ int
+ i,
+ l;
+
+ double
+ *sum,
+ *left,
+ *right;
+
+ switch(tipCase)
+ {
+
+ /* switch over possible configurations of the nodes p and q defining the branch */
+
+ case TIP_TIP:
+ for (i = 0; i < n; i++)
+ {
+ left = &(tipVector[states * tipX1[i]]);
+ right = &(tipVector[states * tipX2[i]]);
+ sum = &sumtable[states * i];
+
+ /* just multiply the values with each other for each site; note the similarity with evaluate().
+ We precompute the product, which remains constant, and then just multiply this pre-computed
+ product with the changing P matrix exponentiations that depend on the branch lengths */
+
+ for(l = 0; l < states; l++)
+ sum[l] = left[l] * right[l];
+ }
+ break;
+ case TIP_INNER:
+
+ /* same as for TIP_TIP, only that
+ we now access one tip vector and one
+ inner vector.
+
+ You may also observe that we do not consider using scaling vectors anywhere here.
+
+ This is because we are interested in the first and second derivatives of the log likelihood, and
+ hence the addition of the log() of the scaling factor times the number of scaling events
+ becomes obsolete through the derivative */
+
+ for (i = 0; i < n; i++)
+ {
+ left = &(tipVector[states * tipX1[i]]);
+ right = &x2[states * i];
+ sum = &sumtable[states * i];
+
+ for(l = 0; l < states; l++)
+ sum[l] = left[l] * right[l];
+ }
+ break;
+ case INNER_INNER:
+ for (i = 0; i < n; i++)
+ {
+ left = &x1[states * i];
+ right = &x2[states * i];
+ sum = &sumtable[states * i];
+
+ for(l = 0; l < states; l++)
+ sum[l] = left[l] * right[l];
+ }
+ break;
+ default:
+ assert(0);
+ }
+}
+
+
+/* same thing for GAMMA models. The only noteworthy thing here is that we have an additional inner loop over the
+ number of discrete gamma rates. The data access pattern is also different since, for tip vector accesses through our
+ lookup table, we do not distinguish between rates.
+
+ Note the different access pattern in TIP_INNER:
+
+ left = &(tipVector[states * tipX1[i]]);
+ right = &(x2[span * i + l * states]);
+
+*/
+
+static void sumGAMMA_FLEX(int tipCase, double *sumtable, double *x1, double *x2, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n, const int states)
+{
+ int
+ i,
+ l,
+ k;
+
+ const int
+ span = 4 * states;
+
+ double
+ *left,
+ *right,
+ *sum;
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ for(i = 0; i < n; i++)
+ {
+ left = &(tipVector[states * tipX1[i]]);
+ right = &(tipVector[states * tipX2[i]]);
+
+ for(l = 0; l < 4; l++)
+ {
+ sum = &sumtable[i * span + l * states];
+
+ for(k = 0; k < states; k++)
+ sum[k] = left[k] * right[k];
+
+ }
+ }
+ break;
+ case TIP_INNER:
+ for(i = 0; i < n; i++)
+ {
+ left = &(tipVector[states * tipX1[i]]);
+
+ for(l = 0; l < 4; l++)
+ {
+ right = &(x2[span * i + l * states]);
+ sum = &sumtable[i * span + l * states];
+
+ for(k = 0; k < states; k++)
+ sum[k] = left[k] * right[k];
+
+ }
+ }
+ break;
+ case INNER_INNER:
+ for(i = 0; i < n; i++)
+ {
+ for(l = 0; l < 4; l++)
+ {
+ left = &(x1[span * i + l * states]);
+ right = &(x2[span * i + l * states]);
+ sum = &(sumtable[i * span + l * states]);
+
+
+ for(k = 0; k < states; k++)
+ sum[k] = left[k] * right[k];
+ }
+ }
+ break;
+ default:
+ assert(0);
+ }
+}
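+
+/* Editorial worked example (not part of the upstream sources) of the sumtable layout used by sumGAMMA_FLEX.
+ The numbers are illustrative.
+
+ With states = 4 (DNA) and 4 discrete rates, span = 4 * states = 16 doubles per site. The entry for
+ site i, rate l, state k lives at
+
+ sumtable[i * span + l * states + k]
+
+ so site 2, rate 3, state 1 is sumtable[2 * 16 + 3 * 4 + 1] = sumtable[45].
+
+ A tip contributes the same states-long slice &tipVector[states * tipX1[i]] for all four rates,
+ whereas an inner vector uses the same i * span + l * states indexing as sumtable. */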
+
+#endif
+
+/* optimized functions for branch length optimization */
+
+
+#ifdef _OPTIMIZED_FUNCTIONS
+
+static void sumGAMMA_BINARY(int tipCase, double *sumtable, double *x1_start, double *x2_start, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n);
+static void coreGTRGAMMA_BINARY(const int upper, double *sumtable,
+ volatile double *d1, volatile double *d2, double *EIGN, double *gammaRates, double lz, int *wrptr);
+static void coreGTRCAT_BINARY(int upper, int numberOfCategories, double *sum,
+ volatile double *d1, volatile double *d2,
+ double *rptr, double *EIGN, int *cptr, double lz, int *wgt);
+static void sumCAT_BINARY(int tipCase, double *sum, double *x1_start, double *x2_start, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n);
+
+static void sumCAT_SAVE(int tipCase, double *sum, double *x1_start, double *x2_start, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n, double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap);
+
+static void sumGAMMA_GAPPED_SAVE(int tipCase, double *sumtable, double *x1_start, double *x2_start, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n,
+ double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap);
+
+static void sumGAMMA(int tipCase, double *sumtable, double *x1_start, double *x2_start, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n);
+
+static void sumCAT(int tipCase, double *sum, double *x1_start, double *x2_start, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n);
+
+static void sumGAMMAPROT_GAPPED_SAVE(int tipCase, double *sumtable, double *x1, double *x2, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n,
+ double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap);
+
+static void sumGAMMAPROT_LG4(int tipCase, double *sumtable, double *x1, double *x2, double *tipVector[4],
+ unsigned char *tipX1, unsigned char *tipX2, int n);
+
+static void sumGAMMAPROT(int tipCase, double *sumtable, double *x1, double *x2, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n);
+
+static void sumGTRCATPROT(int tipCase, double *sumtable, double *x1, double *x2, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n);
+
+static void sumGTRCATPROT_SAVE(int tipCase, double *sumtable, double *x1, double *x2, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n,
+ double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap);
+
+static void coreGTRGAMMAPROT_LG4(double *gammaRates, double *EIGN[4], double *sumtable, int upper, int *wrptr,
+ volatile double *ext_dlnLdlz, volatile double *ext_d2lnLdlz2, double lz, double *weights);
+
+static void coreGTRGAMMA(const int upper, double *sumtable,
+ volatile double *ext_dlnLdlz, volatile double *ext_d2lnLdlz2, double *EIGN, double *gammaRates, double lz, int *wgt);
+
+static void coreGTRCAT(int upper, int numberOfCategories, double *sum,
+ volatile double *d1, volatile double *d2, int *wgt,
+ double *rptr, double *EIGN, int *cptr, double lz);
+
+
+static void coreGTRGAMMAPROT(double *gammaRates, double *EIGN, double *sumtable, int upper, int *wgt,
+ volatile double *ext_dlnLdlz, volatile double *ext_d2lnLdlz2, double lz);
+
+static void coreGTRCATPROT(double *EIGN, double lz, int numberOfCategories, double *rptr, int *cptr, int upper,
+ int *wgt, volatile double *ext_dlnLdlz, volatile double *ext_d2lnLdlz2, double *sumtable);
+
+#endif
+
+
+#ifndef _OPTIMIZED_FUNCTIONS
+
+/* this is the core function of the Newton-Raphson based branch length optimization; it computes
+ the first and second derivative of the likelihood given a new proposed branch length lz */
+
+
+static void coreCAT_FLEX(int upper, int numberOfCategories, double *sum,
+ volatile double *d1, volatile double *d2, int *wgt,
+ double *rptr, double *EIGN, int *cptr, double lz, const int states)
+{
+ int
+ i,
+ l;
+
+ double
+ *d,
+
+ /* arrays to store stuff we can pre-compute */
+
+ *d_start = (double *)malloc_aligned(numberOfCategories * states * sizeof(double)),
+ *e =(double *)malloc_aligned(states * sizeof(double)),
+ *s = (double *)malloc_aligned(states * sizeof(double)),
+ *dd = (double *)malloc_aligned(states * sizeof(double)),
+ inv_Li,
+ dlnLidlz,
+ d2lnLidlz2,
+ dlnLdlz = 0.0,
+ d2lnLdlz2 = 0.0;
+
+ d = d_start;
+
+ e[0] = 0.0;
+ s[0] = 0.0;
+ dd[0] = 0.0;
+
+
+ /* we are pre-computing values for the first and second derivative of P(lz);
+ since P(lz) is an exponential, that is the only thing we really have to differentiate here */
+
+ for(l = 1; l < states; l++)
+ {
+ s[l] = EIGN[l];
+ e[l] = EIGN[l] * EIGN[l];
+ dd[l] = s[l] * lz;
+ }
+
+ /* compute the P matrices and their derivatives for
+ all per-site rate categories */
+
+ for(i = 0; i < numberOfCategories; i++)
+ {
+ d[states * i] = 1.0;
+ for(l = 1; l < states; l++)
+ d[states * i + l] = EXP(dd[l] * rptr[i]);
+ }
+
+
+ /* now loop over the sites in this partition to obtain the per-site 1st and 2nd derivatives */
+
+ for (i = 0; i < upper; i++)
+ {
+ double
+ r = rptr[cptr[i]],
+ wr1 = r * wgt[i],
+ wr2 = r * r * wgt[i];
+
+ /* get the correct p matrix for the rate at the current site i */
+
+ d = &d_start[states * cptr[i]];
+
+ /* this is the likelihood at site i, NOT the log likelihood, we don't need the log
+ likelihood to compute derivatives ! */
+
+ inv_Li = sum[states * i];
+
+ /* those are for storing the first and second derivative of the Likelihood at site i */
+
+ dlnLidlz = 0.0;
+ d2lnLidlz2 = 0.0;
+
+ /* now multiply the likelihood and the first and second derivative with the
+ appropriate derivatives of P(lz) */
+
+ for(l = 1; l < states; l++)
+ {
+ double
+ tmpv = d[l] * sum[states * i + l];
+
+ inv_Li += tmpv;
+ dlnLidlz += tmpv * s[l];
+ d2lnLidlz2 += tmpv * e[l];
+ }
+
+ /* below we are implementing the other mathematical operations that are required
+ to obtain the derivatives */
+
+ inv_Li = 1.0/ FABS(inv_Li);
+
+ dlnLidlz *= inv_Li;
+ d2lnLidlz2 *= inv_Li;
+
+ /* under the CAT model, wr1 and wr2 are extensions of the pattern weight by the per-site rate:
+ wr1 = wgt[i] * rptr[cptr[i]]
+ and
+ wr2 = wgt[i] * rptr[cptr[i]] * rptr[cptr[i]]
+
+ this is required for the derivatives because, when computing the
+ derivative of the exponential(), the rate must be multiplied with the
+ exponential
+
+ wgt is just the pattern site weight
+ */
+
+ /* compute the accumulated first and second derivatives of this site */
+
+ dlnLdlz += wr1 * dlnLidlz;
+ d2lnLdlz2 += wr2 * (d2lnLidlz2 - dlnLidlz * dlnLidlz);
+ }
+
+ /*
+ set the result values, i.e., the sum of the per-site first and second derivatives of the likelihood function
+ for this partition.
+ */
+
+ *d1 = dlnLdlz;
+ *d2 = d2lnLdlz2;
+
+ /* free the temporary arrays */
+
+ free(d_start);
+ free(e);
+ free(s);
+ free(dd);
+}
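+
+/* Editorial sketch (not part of the upstream sources) of the algebra behind coreCAT_FLEX, read off the loop above.
+ The symbols L, A1 and A2 are introduced here for illustration only.
+
+ For site i with rate r = rptr[cptr[i]] and s = &sum[states * i]:
+
+ L = s[0] + SUM_{l >= 1} s[l] * exp(EIGN[l] * r * lz)
+ A1 = SUM_{l >= 1} s[l] * exp(EIGN[l] * r * lz) * EIGN[l]
+ A2 = SUM_{l >= 1} s[l] * exp(EIGN[l] * r * lz) * EIGN[l]^2
+
+ Each differentiation of L with respect to lz pulls out one factor r * EIGN[l], hence
+
+ dL/dlz = r * A1 and d2L/dlz2 = r^2 * A2
+
+ and, for the log likelihood,
+
+ d(ln L)/dlz = r * (A1 / L)
+ d2(ln L)/dlz2 = r^2 * (A2 / L - (A1 / L)^2)
+
+ Multiplied by the pattern weight wgt[i], these are the two terms accumulated into dlnLdlz and d2lnLdlz2.
+ The l = 0 term carries no exponential because the code fixes d[0] = 1.0, i.e. it treats EIGN[0] as 0. */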
+
+static void coreGAMMA_FLEX(int upper, double *sumtable, volatile double *ext_dlnLdlz, volatile double *ext_d2lnLdlz2,
+ double *EIGN, double *gammaRates, double lz, int *wgt, const int states)
+{
+ double
+ *sum,
+ diagptable[1024], /* TODO make this dynamic */
+ dlnLdlz = 0.0,
+ d2lnLdlz2 = 0.0,
+ ki,
+ kisqr,
+ tmp,
+ inv_Li,
+ dlnLidlz,
+ d2lnLidlz2;
+
+ int
+ i,
+ j,
+ l;
+
+ const int
+ gammaStates = 4 * states;
+
+ /* pre-compute the derivatives of the P matrix for all discrete GAMMA rates */
+
+ for(i = 0; i < 4; i++)
+ {
+ ki = gammaRates[i];
+ kisqr = ki * ki;
+
+ for(l = 1; l < states; l++)
+ {
+ diagptable[i * gammaStates + l * 4] = EXP(EIGN[l] * ki * lz);
+ diagptable[i * gammaStates + l * 4 + 1] = EIGN[l] * ki;
+ diagptable[i * gammaStates + l * 4 + 2] = EIGN[l] * EIGN[l] * kisqr;
+ }
+ }
+
+ /* loop over sites in this partition */
+
+ for (i = 0; i < upper; i++)
+ {
+ /* no per-site rate here: rptr/cptr do not exist in this function and the rates are already in diagptable */
+ double
+ wr1 = (double)wgt[i],
+ wr2 = (double)wgt[i];
+
+ /* access the array with pre-computed values */
+ sum = &sumtable[i * gammaStates];
+
+ /* initial per-site likelihood and 1st and 2nd derivatives */
+
+ inv_Li = 0.0;
+ dlnLidlz = 0.0;
+ d2lnLidlz2 = 0.0;
+
+ /* loop over discrete GAMMA rates */
+
+ for(j = 0; j < 4; j++)
+ {
+ inv_Li += sum[j * states];
+
+ for(l = 1; l < states; l++)
+ {
+ inv_Li += (tmp = diagptable[j * gammaStates + l * 4] * sum[j * states + l]);
+ dlnLidlz += tmp * diagptable[j * gammaStates + l * 4 + 1];
+ d2lnLidlz2 += tmp * diagptable[j * gammaStates + l * 4 + 2];
+ }
+ }
+
+ /* finalize derivative computation */
+ /* note that the weight here, unlike in CAT above, is the
+ integer pattern weight of the current site
+
+ The operations:
+
+ EIGN[l] * ki;
+ EIGN[l] * EIGN[l] * kisqr;
+
+ that are hidden in the CAT weights (at least the * ki and * ki * ki parts of them)
+ are done explicitly here
+
+ */
+
+ inv_Li = 1.0 / FABS(inv_Li);
+
+ dlnLidlz *= inv_Li;
+ d2lnLidlz2 *= inv_Li;
+
+ dlnLdlz += wr1 * dlnLidlz;
+ d2lnLdlz2 += wr2 * (d2lnLidlz2 - dlnLidlz * dlnLidlz);
+ }
+
+ *ext_dlnLdlz = dlnLdlz;
+ *ext_d2lnLdlz2 = d2lnLdlz2;
+
+}
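+
+/* Editorial sketch (not part of the upstream sources) of what coreGAMMA_FLEX evaluates, read off the loops above.
+ The symbols L, A1, A2 and k_j are introduced here for illustration only.
+
+ For site i, with k_j = gammaRates[j] and s_j = &sum[j * states]:
+
+ L = SUM_j ( s_j[0] + SUM_{l >= 1} s_j[l] * exp(EIGN[l] * k_j * lz) )
+ A1 = SUM_j SUM_{l >= 1} s_j[l] * exp(EIGN[l] * k_j * lz) * EIGN[l] * k_j
+ A2 = SUM_j SUM_{l >= 1} s_j[l] * exp(EIGN[l] * k_j * lz) * EIGN[l]^2 * k_j^2
+
+ d(ln L)/dlz = A1 / L
+ d2(ln L)/dlz2 = A2 / L - (A1 / L)^2
+
+ No per-category probability is applied; a constant factor common to L, A1 and A2 would cancel in these ratios.
+
+ On the fixed-size buffer: the largest index written is 3 * gammaStates + 4 * (states - 1) + 2 = 16 * states - 2,
+ so diagptable[1024] is large enough as long as states <= 64. */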
+
+#endif
+
+/* the function below is called only once at the very beginning of each Newton-Raphson procedure for optimizing branch lengths.
+ It initially invokes an iterative newview call to get a consistent pair of vectors at the left and the right end of the
+ branch and thereafter invokes the one-time only precomputation of values (sumtable) that can be re-used in each Newton-Raphson
+ iteration. Once this function has been called we can execute the actual NR procedure */
+
+void makenewzIterative(tree *tr)
+{
+ /* call newviewIterative to get the likelihood arrays to the left and right of the branch */
+
+ newviewIterative(tr, 1);
+
+
+ /*
+ loop over all partitions to do the precomputation of the sumTable buffer.
+ This is analogous to the newviewIterative() and evaluateIterative()
+ implementations.
+ */
+#ifdef _USE_OMP
+#pragma omp parallel
+#endif
+ {
+ int
+ m,
+ model,
+ maxModel,
+ tipCase;
+
+#ifdef _USE_OMP
+ maxModel = tr->maxModelsPerThread;
+#else
+ maxModel = tr->NumberOfModels;
+#endif
+
+ double
+ *x1_start = (double*)NULL,
+ *x2_start = (double*)NULL;
+
+ unsigned char
+ *tipX1,
+ *tipX2;
+
+ double
+ *x1_gapColumn = (double*)NULL,
+ *x2_gapColumn = (double*)NULL;
+
+ unsigned int
+ *x1_gap = (unsigned int*)NULL,
+ *x2_gap = (unsigned int*)NULL;
+
+
+ for(m = 0; m < maxModel; m++)
+ {
+ size_t
+ width = 0,
+ offset = 0;
+
+#ifdef _USE_OMP
+ int
+ tid = omp_get_thread_num();
+
+ /* check if this thread should process this partition */
+ Assign*
+ pAss = tr->threadPartAssigns[tid * tr->maxModelsPerThread + m];
+
+ if(pAss)
+ {
+ model = pAss->partitionId;
+ width = pAss->width;
+ offset = pAss->offset;
+ }
+ else
+ break;
+#else
+ model = m;
+
+ /* number of sites in this partition */
+ width = (size_t)tr->partitionData[model].width;
+ offset = 0;
+#endif
+
+
+ if(tr->td[0].executeModel[model] && width > 0)
+ {
+ int
+ rateHet = (int)discreteRateCategories(tr->rateHetModel),
+
+ /* get the number of states in the partition, e.g.: 4 = DNA, 20 = Protein */
+ states = tr->partitionData[model].states,
+
+ /* span for single alignment site (in doubles!) */
+ span = rateHet * states;
+
+ size_t
+ /* offset for current thread's data in global xVector (in doubles!) */
+ x_offset = offset * (size_t)span;
+
+ getVects(tr, &tipX1, &tipX2, &x1_start, &x2_start, &tipCase, model, &x1_gapColumn, &x2_gapColumn, &x1_gap, &x2_gap, offset);
+
+ double
+ *sumBuffer = tr->partitionData[model].sumBuffer + x_offset;
+
+#ifndef _OPTIMIZED_FUNCTIONS
+ assert(!tr->saveMemory);
+ if(tr->rateHetModel == CAT)
+ sumCAT_FLEX(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].tipVector, tipX1, tipX2,
+ width, states);
+ else
+ sumGAMMA_FLEX(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].tipVector, tipX1, tipX2,
+ width, states);
+#else
+ switch(states)
+ {
+ case 2:
+#ifdef __MIC_NATIVE
+ assert(0 && "Binary data model is not implemented on Intel MIC");
+#else
+ assert(!tr->saveMemory);
+ if(tr->rateHetModel == CAT)
+ sumCAT_BINARY(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].tipVector,
+ tipX1, tipX2, width);
+ else
+ sumGAMMA_BINARY(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].tipVector,
+ tipX1, tipX2, width);
+#endif
+ break;
+ case 4: /* DNA */
+ if(tr->rateHetModel == CAT)
+ {
+ if(tr->saveMemory)
+#ifdef __MIC_NATIVE
+ assert(0 && "Neither CAT model of rate heterogeneity nor memory saving are implemented on Intel MIC");
+#else
+ sumCAT_SAVE(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].tipVector, tipX1, tipX2,
+ width, x1_gapColumn, x2_gapColumn, x1_gap, x2_gap);
+#endif
+ else
+#ifdef __MIC_NATIVE
+ assert(0 && "CAT model of rate heterogeneity is not implemented on Intel MIC");
+#else
+ sumCAT(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].tipVector, tipX1, tipX2,
+ width);
+#endif
+ }
+ else
+ {
+ if(tr->saveMemory)
+#ifdef __MIC_NATIVE
+ assert(0 && "Memory saving is not implemented on Intel MIC");
+#else
+ sumGAMMA_GAPPED_SAVE(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].tipVector, tipX1, tipX2,
+ width, x1_gapColumn, x2_gapColumn, x1_gap, x2_gap);
+#endif
+ else
+#ifdef __MIC_NATIVE
+ sumGAMMA_MIC(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].mic_tipVector, tipX1, tipX2,
+ width);
+#else
+ sumGAMMA(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].tipVector, tipX1, tipX2,
+ width);
+#endif
+ }
+ break;
+ case 20: /* proteins */
+ if(tr->rateHetModel == CAT)
+ {
+ if(tr->saveMemory)
+#ifdef __MIC_NATIVE
+ assert(0 && "Neither CAT model of rate heterogeneity nor memory saving are implemented on Intel MIC");
+#else
+ sumGTRCATPROT_SAVE(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].tipVector,
+ tipX1, tipX2, width, x1_gapColumn, x2_gapColumn, x1_gap, x2_gap);
+#endif
+ else
+#ifdef __MIC_NATIVE
+ assert(0 && "CAT model of rate heterogeneity is not implemented on Intel MIC");
+#else
+ sumGTRCATPROT(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].tipVector,
+ tipX1, tipX2, width);
+#endif
+ }
+ else
+ {
+ if(tr->saveMemory)
+#ifdef __MIC_NATIVE
+ assert(0 && "Memory saving is not implemented on Intel MIC");
+#else
+ sumGAMMAPROT_GAPPED_SAVE(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].tipVector, tipX1, tipX2,
+ width, x1_gapColumn, x2_gapColumn, x1_gap, x2_gap);
+#endif
+ else
+ {
+ if(tr->partitionData[model].protModels == LG4M || tr->partitionData[model].protModels == LG4X)
+#ifdef __MIC_NATIVE
+ sumGAMMAPROT_LG4_MIC(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].mic_tipVector, tipX1, tipX2,
+ width);
+#else
+ sumGAMMAPROT_LG4(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].tipVector_LG4,
+ tipX1, tipX2, width);
+#endif
+ else
+#ifdef __MIC_NATIVE
+ sumGAMMAPROT_MIC(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].mic_tipVector, tipX1, tipX2,
+ width);
+#else
+ sumGAMMAPROT(tipCase, sumBuffer, x1_start, x2_start, tr->partitionData[model].tipVector,
+ tipX1, tipX2, width);
+#endif
+ }
+
+ }
+ break;
+ default:
+ assert(0);
+ }
+#endif
+ }
+ } // for model
+ } // omp parallel region
+}
+
+
+
+/* this function actually computes the first and second derivatives of the likelihood for a given branch stored
+ in tr->coreLZ[model]. Note that in the parallel case coreLZ must always be broadcast together with the
+ traversal descriptor, at least for optimizing branch lengths */
+
+void execCore(tree *tr, volatile double *_dlnLdlz, volatile double *_d2lnLdlz2)
+{
+#ifdef _USE_OMP
+#pragma omp parallel
+#endif
+ {
+ int
+ m,
+ model,
+ maxModel,
+ branchIndex;
+
+#ifdef _USE_OMP
+ int
+ tid = omp_get_thread_num(),
+ nModels = (tr->numBranches > 1) ? tr->NumberOfModels : 1,
+ p;
+
+ /* Clear reduction buffers: since in OMP version each thread works only on a subset of partitions,
+ * and their order is arbitrary, it's easier to perform this initialization before the main loop,
+ * just to be on the safe side. */
+ for(p = 0; p < nModels; p++)
+ {
+ tr->partitionData[p].reductionBuffer[tid] = 0.;
+ tr->partitionData[p].reductionBuffer2[tid] = 0.;
+ }
+
+ maxModel = tr->maxModelsPerThread;
+#else
+ maxModel = tr->NumberOfModels;
+#endif
+
+ double lz;
+ /* double
+ buffer_dlnLdlz[NUM_BRANCHES],
+ buffer_d2lnLdlz2[NUM_BRANCHES];*/
+
+ /* loop over partitions */
+ for(m = 0; m < maxModel; m++)
+ {
+ size_t
+ width = 0,
+ offset = 0;
+
+#ifdef _USE_OMP
+ /* check if this thread should process this partition */
+ Assign* pAss = tr->threadPartAssigns[tid * tr->maxModelsPerThread + m];
+
+ if (pAss)
+ {
+ model = pAss->partitionId;
+ width = GET_PADDED_WIDTH(pAss->width);
+ offset = pAss->offset;
+ }
+ else
+ break;
+#else
+ model = m;
+
+ /* number of sites in this partition */
+ width = (size_t)tr->partitionData[model].width;
+ offset = 0;
+#endif
+
+ volatile double
+ *d1acc = (double*) NULL,
+ *d2acc = (double*) NULL;
+
+ /* figure out if we are optimizing branch lengths individually per partition or jointly across
+ all partitions. If we do this on a per partition basis, we also need to compute and store
+ the per-partition derivatives of the likelihood separately, otherwise not */
+
+ if(tr->numBranches > 1)
+ {
+ branchIndex = model;
+ lz = tr->td[0].parameterValues[model];
+ }
+ else
+ {
+ branchIndex = 0;
+ lz = tr->td[0].parameterValues[0];
+ }
+
+#ifdef _USE_OMP
+ d1acc = &tr->partitionData[branchIndex].reductionBuffer[tid];
+ d2acc = &tr->partitionData[branchIndex].reductionBuffer2[tid];
+#else
+ d1acc = &_dlnLdlz[branchIndex];
+ d2acc = &_d2lnLdlz2[branchIndex];
+
+ /* We need to reset accumulated derivative values in two cases: a) per-partition derivatives or
+ * b) joint derivatives AND we're processing the first partition */
+ if (branchIndex == model)
+ {
+ *d1acc = 0.0;
+ *d2acc = 0.0;
+ }
+#endif
+
+ /* check if we (the present thread, for instance) need to compute something at
+ all for the present partition */
+
+ if(tr->td[0].executeModel[model] && width > 0)
+ {
+ int
+ rateHet = (int)discreteRateCategories(tr->rateHetModel),
+
+ /* get the number of states in the partition, e.g.: 4 = DNA, 20 = Protein */
+ states = tr->partitionData[model].states,
+
+ /* span for single alignment site (in doubles!) */
+ span = rateHet * states,
+
+ /* offset for current thread's data in global xVector (in doubles!) */
+ x_offset = offset * span,
+
+ /* integer weight vector with pattern compression weights */
+ *wgt = tr->partitionData[model].wgt + offset,
+
+ /* integer rate category vector (for each pattern, _number_ of PSR category assigned to it, NOT actual rate!) */
+ *rateCategory = tr->partitionData[model].rateCategory + offset;
+
+ /* set a pointer to the part of the pre-computed sumBuffer we are going to access */
+ double
+ *weights = tr->partitionData[model].weights,
+ *sumBuffer = tr->partitionData[model].sumBuffer + x_offset;
+
+ volatile double
+ dlnLdlz = 0.0,
+ d2lnLdlz2 = 0.0;
+
+ #ifndef _OPTIMIZED_FUNCTIONS
+
+ /* compute first and second derivatives with the slow generic functions */
+
+ if(tr->rateHetModel == CAT)
+ coreCAT_FLEX(width, tr->partitionData[model].numberOfCategories, sumBuffer,
+ &dlnLdlz, &d2lnLdlz2, wgt,
+ tr->partitionData[model].perSiteRates, tr->partitionData[model].EIGN, rateCategory, lz, states);
+ else
+ coreGAMMA_FLEX(width, sumBuffer,
+ &dlnLdlz, &d2lnLdlz2, tr->partitionData[model].EIGN, tr->partitionData[model].gammaRates, lz,
+ wgt, states);
+ #else
+ switch(states)
+ {
+ case 2:
+#ifdef __MIC_NATIVE
+ assert(0 && "Binary data model is not implemented on Intel MIC");
+#else
+ assert(!tr->saveMemory);
+ if(tr->rateHetModel == CAT)
+ coreGTRCAT_BINARY(width, tr->partitionData[model].numberOfCategories, sumBuffer,
+ &dlnLdlz, &d2lnLdlz2,
+ tr->partitionData[model].perSiteRates, tr->partitionData[model].EIGN, rateCategory,
+ lz, wgt);
+ else
+ coreGTRGAMMA_BINARY(width, sumBuffer,
+ &dlnLdlz, &d2lnLdlz2,
+ tr->partitionData[model].EIGN,
+ tr->partitionData[model].gammaRates, lz, wgt);
+#endif
+ break;
+ case 4: /* DNA */
+ if(tr->rateHetModel == CAT)
+ #ifdef __MIC_NATIVE
+ assert(0 && "CAT model of rate heterogeneity is not implemented on Intel MIC");
+ #else
+ coreGTRCAT(width, tr->partitionData[model].numberOfCategories, sumBuffer,
+ &dlnLdlz, &d2lnLdlz2, wgt,
+ tr->partitionData[model].perSiteRates, tr->partitionData[model].EIGN, rateCategory, lz);
+ #endif
+ else
+ #ifdef __MIC_NATIVE
+ coreGTRGAMMA_MIC(width, sumBuffer,
+ &dlnLdlz, &d2lnLdlz2, tr->partitionData[model].EIGN, tr->partitionData[model].gammaRates, lz, wgt);
+ #else
+ coreGTRGAMMA(width, sumBuffer,
+ &dlnLdlz, &d2lnLdlz2, tr->partitionData[model].EIGN, tr->partitionData[model].gammaRates, lz, wgt);
+ #endif
+
+ break;
+ case 20: /* proteins */
+ if(tr->rateHetModel == CAT)
+ #ifdef __MIC_NATIVE
+ assert(0 && "CAT model of rate heterogeneity is not implemented on Intel MIC");
+ #else
+ coreGTRCATPROT(tr->partitionData[model].EIGN, lz, tr->partitionData[model].numberOfCategories, tr->partitionData[model].perSiteRates,
+ rateCategory, width,
+ wgt,
+ &dlnLdlz, &d2lnLdlz2,
+ sumBuffer);
+ #endif
+ else
+ {
+ if(tr->partitionData[model].protModels == LG4M || tr->partitionData[model].protModels == LG4X)
+#ifdef __MIC_NATIVE
+ coreGTRGAMMAPROT_LG4_MIC(width, sumBuffer,
+ &dlnLdlz, &d2lnLdlz2, tr->partitionData[model].EIGN_LG4, tr->partitionData[model].gammaRates,
+ lz, wgt, weights);
+#else
+ {
+ //printf("model %d weights %f %f %f %f\n", model, weights[0], weights[1], weights[2], weights[3]);
+ coreGTRGAMMAPROT_LG4(tr->partitionData[model].gammaRates, tr->partitionData[model].EIGN_LG4,
+ sumBuffer, width, wgt,
+ &dlnLdlz, &d2lnLdlz2, lz, weights);
+ }
+#endif
+ else
+#ifdef __MIC_NATIVE
+ coreGTRGAMMAPROT_MIC(width, sumBuffer,
+ &dlnLdlz, &d2lnLdlz2, tr->partitionData[model].EIGN, tr->partitionData[model].gammaRates, lz, wgt);
+#else
+ coreGTRGAMMAPROT(tr->partitionData[model].gammaRates, tr->partitionData[model].EIGN,
+ sumBuffer, width, wgt,
+ &dlnLdlz, &d2lnLdlz2, lz);
+#endif
+ }
+ break;
+ default:
+ assert(0);
+ }
+ #endif
+
+ /* store first and second derivative */
+
+ *d1acc += dlnLdlz;
+ *d2acc += d2lnLdlz2;
+ }
+ else
+ {
+ /* set to 0 to make the reduction operation consistent */
+
+ if(width == 0 && (tr->numBranches > 1))
+ {
+ *d1acc = 0.0;
+ *d2acc = 0.0;
+ }
+
+ if(width > 0 && (tr->numBranches > 1))
+ {
+ assert(tr->td[0].executeModel[model] == FALSE);
+ /* _dlnLdlz[model] = 0.0;
+ _d2lnLdlz2[model] = 0.0;*/
+ }
+
+ }
+ } // for model
+ } // omp parallel section
+
+#ifdef _USE_OMP
+ /* perform reduction of 1st and 2nd derivative values */
+ int
+ model,
+ tid;
+
+ int nModels = (tr->numBranches > 1) ? tr->NumberOfModels : 1;
+ for(model = 0; model < nModels; model++)
+ {
+ _dlnLdlz[model] = 0.0;
+ _d2lnLdlz2[model] = 0.0;
+
+ for(tid = 0; tid < tr->nThreads; tid++)
+ {
+ _dlnLdlz[model] += tr->partitionData[model].reductionBuffer[tid];
+ _d2lnLdlz2[model] += tr->partitionData[model].reductionBuffer2[tid];
+ }
+ }
+#endif
+}
+
+
+/* the function below actually implements the iterative Newton-Raphson procedure.
+ It is particularly messy and hard to read because for the case of per-partition branch length
+ estimates it needs to keep track of whether the Newton-Raphson procedure has
+ converged for each partition individually.
+
+ The rationale for doing it like this is also provided in:
+
+
+ A. Stamatakis, M. Ott: "Load Balance in the Phylogenetic Likelihood Kernel". Proceedings of ICPP 2009,
+
+*/
+
+static void topLevelMakenewz(tree *tr, double *z0, int _maxiter, double *result)
+{
+ double z[NUM_BRANCHES], zprev[NUM_BRANCHES], zstep[NUM_BRANCHES];
+ double dlnLdlz[NUM_BRANCHES], d2lnLdlz2[NUM_BRANCHES];
+ int i, maxiter[NUM_BRANCHES], model;
+ boolean firstIteration = TRUE;
+ boolean outerConverged[NUM_BRANCHES];
+ boolean loopConverged;
+
+
+ /* figure out if this is on a per partition basis or jointly across all partitions */
+
+
+
+ /* initialize loop convergence variables etc.
+ maxiter is the maximum number of NR iterations we are going to do before giving up */
+
+ for(i = 0; i < tr->numBranches; i++)
+ {
+ z[i] = z0[i];
+ maxiter[i] = _maxiter;
+ outerConverged[i] = FALSE;
+ tr->curvatOK[i] = TRUE;
+ }
+
+
+ /* nested do while loops of Newton-Raphson */
+
+ do
+ {
+
+ /* check if we are done for partition i or if we need to adapt the branch length again */
+
+ for(i = 0; i < tr->numBranches; i++)
+ {
+ if(outerConverged[i] == FALSE && tr->curvatOK[i] == TRUE)
+ {
+ tr->curvatOK[i] = FALSE;
+ zprev[i] = z[i];
+ zstep[i] = (1.0 - zmax) * z[i] + zmin;
+ }
+ }
+
+ for(i = 0; i < tr->numBranches; i++)
+ {
+ /* other case, the outer loop hasn't converged but we are trying to approach
+ the maximum from the wrong side */
+
+ if(outerConverged[i] == FALSE && tr->curvatOK[i] == FALSE)
+ {
+ double lz;
+
+ if (z[i] < zmin) z[i] = zmin;
+ else if (z[i] > zmax) z[i] = zmax;
+ lz = log(z[i]);
+
+ tr->coreLZ[i] = lz;
+ }
+ }
+
+
+ /* set the execution mask */
+
+ if(tr->numBranches > 1)
+ {
+ assert(tr->numBranches == tr->NumberOfModels);
+
+ for(model = 0; model < tr->NumberOfModels; model++)
+ {
+ if(tr->executeModel[model])
+ tr->executeModel[model] = !tr->curvatOK[model];
+ }
+ }
+ else
+ {
+ for(model = 0; model < tr->NumberOfModels; model++)
+ tr->executeModel[model] = !tr->curvatOK[0];
+ }
+
+
+ /* store it in traversal descriptor */
+
+ storeExecuteMaskInTraversalDescriptor(tr);
+
+ /* store the new branch length values to be tested in traversal descriptor */
+
+ storeValuesInTraversalDescriptor(tr, &(tr->coreLZ[0]));
+
+ /* sequential part, if this is the first Newton-Raphson iteration,
+ do the precomputations as well, otherwise just execute the computation
+ of the derivatives */
+
+ if(firstIteration)
+ {
+ makenewzIterative(tr);
+ firstIteration = FALSE;
+ }
+
+ execCore(tr, dlnLdlz, d2lnLdlz2);
+
+ {
+ double
+ *send = (double *)malloc(sizeof(double) * tr->numBranches * 2),
+ *recv = (double *)malloc(sizeof(double) * tr->numBranches * 2);
+
+ memcpy(&send[0], dlnLdlz, sizeof(double) * tr->numBranches);
+ memcpy(&send[tr->numBranches], d2lnLdlz2, sizeof(double) * tr->numBranches);
+
+#ifdef _USE_ALLREDUCE
+ /* the MPI_Allreduce implementation is apparently sometimes not deterministic */
+
+ MPI_Allreduce(send, recv, tr->numBranches * 2, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
+#else
+ MPI_Reduce(send, recv, tr->numBranches * 2, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
+ MPI_Bcast(recv, tr->numBranches * 2, MPI_DOUBLE, 0, MPI_COMM_WORLD);
+#endif
+
+ memcpy(dlnLdlz, &recv[0], sizeof(double) * tr->numBranches);
+ memcpy(d2lnLdlz2, &recv[tr->numBranches], sizeof(double) * tr->numBranches);
+
+ free(send);
+ free(recv);
+ }
+
+ /* do a NR step, if we are on the correct side of the maximum that's okay, otherwise
+ shorten branch */
+
+ for(i = 0; i < tr->numBranches; i++)
+ {
+ if(outerConverged[i] == FALSE && tr->curvatOK[i] == FALSE)
+ {
+ if ((d2lnLdlz2[i] >= 0.0) && (z[i] < zmax))
+ zprev[i] = z[i] = 0.37 * z[i] + 0.63; /* Bad curvature, shorten branch */
+ else
+ tr->curvatOK[i] = TRUE;
+ }
+ }
+
+ /* do the standard NR step to obtain the next value, depending on the state for each partition */
+
+ for(i = 0; i < tr->numBranches; i++)
+ {
+ if(tr->curvatOK[i] == TRUE && outerConverged[i] == FALSE)
+ {
+ if (d2lnLdlz2[i] < 0.0)
+ {
+ double tantmp = -dlnLdlz[i] / d2lnLdlz2[i];
+ if (tantmp < 100)
+ {
+ z[i] *= EXP(tantmp);
+ if (z[i] < zmin)
+ z[i] = zmin;
+
+ if (z[i] > 0.25 * zprev[i] + 0.75)
+ z[i] = 0.25 * zprev[i] + 0.75;
+ }
+ else
+ z[i] = 0.25 * zprev[i] + 0.75;
+ }
+ if (z[i] > zmax) z[i] = zmax;
+
+ /* decrement the maximum number of iterations */
+
+ maxiter[i] = maxiter[i] - 1;
+
+ /* check if the outer loop has converged */
+
+ //old code below commented out, integrated new PRELIMINARY BUG FIX !
+ //this needs further work at some point!
+
+ /*
+ if(maxiter[i] > 0 && (ABS(z[i] - zprev[i]) > zstep[i]))
+ outerConverged[i] = FALSE;
+ else
+ outerConverged[i] = TRUE;
+ */
+
+ if((ABS(z[i] - zprev[i]) > zstep[i]))
+ {
+ /* We should make a more informed decision here,
+ based on the log like improvement */
+
+ if(maxiter[i] < -20)
+ {
+ z[i] = z0[i];
+ outerConverged[i] = TRUE;
+ }
+ else
+ outerConverged[i] = FALSE;
+ }
+ else
+ outerConverged[i] = TRUE;
+ }
+ }
+
+ /* check if the loop has converged for all partitions */
+
+ loopConverged = TRUE;
+ for(i = 0; i < tr->numBranches; i++)
+ loopConverged = loopConverged && outerConverged[i];
+ }
+ while (!loopConverged);
+
+
+ /* reset partition execution mask */
+
+ for(model = 0; model < tr->NumberOfModels; model++)
+ tr->executeModel[model] = TRUE;
+
+ /* copy the new branches in the result array of branches.
+ if we don't do a per partition estimate of
+ branches this will only set result[0]
+ */
+
+ for(i = 0; i < tr->numBranches; i++)
+ result[i] = z[i];
+}
+
+/* function called from RAxML to optimize a given branch with current branch lengths z0
+ between nodes p and q.
+ The new branch lengths will be stored in result */
+
+void makenewzGeneric(tree *tr, nodeptr p, nodeptr q, double *z0, int maxiter, double *result, boolean mask)
+{
+ int
+ i;
+
+ /* the first entry of the traversal descriptor stores the node pair that defines
+ the branch */
+
+ tr->td[0].ti[0].pNumber = p->number;
+ tr->td[0].ti[0].qNumber = q->number;
+
+ for(i = 0; i < tr->numBranches; i++)
+ {
+ tr->td[0].ti[0].qz[i] = z0[i];
+
+ if(mask)
+ {
+ if(tr->partitionConverged[i])
+ tr->executeModel[i] = FALSE;
+ else
+ tr->executeModel[i] = TRUE;
+ }
+ else
+ assert(tr->executeModel[i]);
+ }
+
+
+ /* compute the traversal descriptor of the likelihood vectors that need to be re-computed
+ first in makenewzIterative */
+
+ tr->td[0].count = 1;
+
+ if(!p->x)
+ computeTraversalInfo(p, &(tr->td[0].ti[0]), &(tr->td[0].count), tr->mxtips, tr->numBranches, TRUE);
+ if(!q->x)
+ computeTraversalInfo(q, &(tr->td[0].ti[0]), &(tr->td[0].count), tr->mxtips, tr->numBranches, TRUE);
+
+ /* call the Newton-Raphson procedure */
+
+ topLevelMakenewz(tr, z0, maxiter, result);
+
+ /* reset executeModel; this seems to be a bit redundant with topLevelMakenewz */
+
+ for(i = 0; i < tr->numBranches; i++)
+ tr->executeModel[i] = TRUE;
+}
+
+
+/* below are, once again, the optimized functions */
+
+#ifdef _OPTIMIZED_FUNCTIONS
+
+/**** binary ***/
+static void coreGTRCAT_BINARY(int upper, int numberOfCategories, double *sum,
+ volatile double *d1, volatile double *d2,
+ double *rptr, double *EIGN, int *cptr, double lz, int *wgt)
+{
+ int i;
+ double
+ *d, *d_start,
+ tmp_0, inv_Li, dlnLidlz, d2lnLidlz2,
+ dlnLdlz = 0.0,
+ d2lnLdlz2 = 0.0;
+ double e[2];
+ double dd1;
+
+ /*e[0] = EIGN[0];
+ e[1] = EIGN[0] * EIGN[0];*/
+
+ e[0] = EIGN[1];
+ e[1] = EIGN[1] * EIGN[1];
+
+ d = d_start = (double *)malloc((size_t)numberOfCategories * sizeof(double));
+
+ dd1 = e[0] * lz;
+
+ for(i = 0; i < numberOfCategories; i++)
+ d[i] = exp(dd1 * rptr[i]);
+
+ for (i = 0; i < upper; i++)
+ {
+ double
+ r = rptr[cptr[i]],
+ wr1 = r * wgt[i],
+ wr2 = r * r * wgt[i];
+
+ d = &d_start[cptr[i]];
+
+ inv_Li = sum[2 * i];
+ inv_Li += (tmp_0 = d[0] * sum[2 * i + 1]);
+
+ inv_Li = 1.0/fabs(inv_Li);
+
+ dlnLidlz = tmp_0 * e[0];
+ d2lnLidlz2 = tmp_0 * e[1];
+
+ dlnLidlz *= inv_Li;
+ d2lnLidlz2 *= inv_Li;
+
+ dlnLdlz += wr1 * dlnLidlz;
+ d2lnLdlz2 += wr2 * (d2lnLidlz2 - dlnLidlz * dlnLidlz);
+ }
+
+ *d1 = dlnLdlz;
+ *d2 = d2lnLdlz2;
+
+ free(d_start);
+}
+
+static void coreGTRGAMMA_BINARY(const int upper, double *sumtable,
+ volatile double *d1, volatile double *d2, double *EIGN, double *gammaRates, double lz, int *wrptr)
+{
+ double
+ dlnLdlz = 0.0,
+ d2lnLdlz2 = 0.0,
+ ki,
+ kisqr,
+ inv_Li,
+ dlnLidlz,
+ d2lnLidlz2,
+ *sum,
+ diagptable0[8] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+ diagptable1[8] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+ diagptable2[8] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+ int
+ i,
+ j;
+
+ for(i = 0; i < 4; i++)
+ {
+ ki = gammaRates[i];
+ kisqr = ki * ki;
+
+ diagptable0[i * 2] = 1.0;
+ diagptable1[i * 2] = 0.0;
+ diagptable2[i * 2] = 0.0;
+
+ diagptable0[i * 2 + 1] = exp(EIGN[1] * ki * lz);
+ diagptable1[i * 2 + 1] = EIGN[1] * ki;
+ diagptable2[i * 2 + 1] = EIGN[1] * EIGN[1] * kisqr;
+ }
+
+ for (i = 0; i < upper; i++)
+ {
+ __m128d a0 = _mm_setzero_pd();
+ __m128d a1 = _mm_setzero_pd();
+ __m128d a2 = _mm_setzero_pd();
+
+ sum = &sumtable[i * 8];
+
+ for(j = 0; j < 4; j++)
+ {
+ double
+ *d0 = &diagptable0[j * 2],
+ *d1 = &diagptable1[j * 2],
+ *d2 = &diagptable2[j * 2];
+
+ __m128d tmpv = _mm_mul_pd(_mm_load_pd(d0), _mm_load_pd(&sum[j * 2]));
+ a0 = _mm_add_pd(a0, tmpv);
+ a1 = _mm_add_pd(a1, _mm_mul_pd(tmpv, _mm_load_pd(d1)));
+ a2 = _mm_add_pd(a2, _mm_mul_pd(tmpv, _mm_load_pd(d2)));
+
+ }
+
+ a0 = _mm_hadd_pd(a0, a0);
+ a1 = _mm_hadd_pd(a1, a1);
+ a2 = _mm_hadd_pd(a2, a2);
+
+ _mm_storel_pd(&inv_Li, a0);
+ _mm_storel_pd(&dlnLidlz, a1);
+ _mm_storel_pd(&d2lnLidlz2, a2);
+
+ inv_Li = 1.0 / fabs(inv_Li);
+
+ dlnLidlz *= inv_Li;
+ d2lnLidlz2 *= inv_Li;
+
+ dlnLdlz += wrptr[i] * dlnLidlz;
+ d2lnLdlz2 += wrptr[i] * (d2lnLidlz2 - dlnLidlz * dlnLidlz);
+ }
+
+
+ *d1 = dlnLdlz;
+ *d2 = d2lnLdlz2;
+}
+
+
+static void sumGAMMA_BINARY(int tipCase, double *sumtable, double *x1_start, double *x2_start, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n)
+{
+ double
+ *x1,
+ *x2,
+ *sum;
+
+ int
+ i,
+ j;
+
+ /* C-OPT once again switch over possible configurations at inner node */
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ /* C-OPT main for loop over alignment length */
+ for (i = 0; i < n; i++)
+ {
+ x1 = &(tipVector[2 * tipX1[i]]);
+ x2 = &(tipVector[2 * tipX2[i]]);
+ sum = &sumtable[i * 8];
+
+ for(j = 0; j < 4; j++)
+ _mm_store_pd( &sum[j*2], _mm_mul_pd( _mm_load_pd( &x1[0] ), _mm_load_pd( &x2[0] )));
+ }
+ break;
+ case TIP_INNER:
+ for (i = 0; i < n; i++)
+ {
+ x1 = &(tipVector[2 * tipX1[i]]);
+ x2 = &x2_start[8 * i];
+ sum = &sumtable[8 * i];
+
+ for(j = 0; j < 4; j++)
+ _mm_store_pd( &sum[j*2], _mm_mul_pd( _mm_load_pd( &x1[0] ), _mm_load_pd( &x2[j * 2] )));
+ }
+ break;
+ case INNER_INNER:
+ for (i = 0; i < n; i++)
+ {
+ x1 = &x1_start[8 * i];
+ x2 = &x2_start[8 * i];
+ sum = &sumtable[8 * i];
+
+ for(j = 0; j < 4; j++)
+ _mm_store_pd( &sum[j*2], _mm_mul_pd( _mm_load_pd( &x1[j * 2] ), _mm_load_pd( &x2[j * 2] )));
+ }
+ break;
+ default:
+ assert(0);
+ }
+}
+
+
+static void sumCAT_BINARY(int tipCase, double *sum, double *x1_start, double *x2_start, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n)
+
+{
+ int
+ i;
+
+ double
+ *x1,
+ *x2;
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ for (i = 0; i < n; i++)
+ {
+ x1 = &(tipVector[2 * tipX1[i]]);
+ x2 = &(tipVector[2 * tipX2[i]]);
+
+ _mm_store_pd(&sum[i * 2], _mm_mul_pd( _mm_load_pd(x1), _mm_load_pd(x2)));
+ }
+ break;
+ case TIP_INNER:
+ for (i = 0; i < n; i++)
+ {
+ x1 = &(tipVector[2 * tipX1[i]]);
+ x2 = &x2_start[2 * i];
+
+ _mm_store_pd(&sum[i * 2], _mm_mul_pd( _mm_load_pd(x1), _mm_load_pd(x2)));
+ }
+ break;
+ case INNER_INNER:
+ for (i = 0; i < n; i++)
+ {
+ x1 = &x1_start[2 * i];
+ x2 = &x2_start[2 * i];
+
+ _mm_store_pd(&sum[i * 2], _mm_mul_pd( _mm_load_pd(x1), _mm_load_pd(x2)));
+ }
+ break;
+ default:
+ assert(0);
+ }
+}
+
+
+/*** binary end ****/
+
+
+static void sumCAT_SAVE(int tipCase, double *sum, double *x1_start, double *x2_start, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n, double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap)
+{
+ int i;
+ double
+ *x1,
+ *x2,
+ *x1_ptr = x1_start,
+ *x2_ptr = x2_start;
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ for (i = 0; i < n; i++)
+ {
+ x1 = &(tipVector[4 * tipX1[i]]);
+ x2 = &(tipVector[4 * tipX2[i]]);
+
+ _mm_store_pd( &sum[i*4 + 0], _mm_mul_pd( _mm_load_pd( &x1[0] ), _mm_load_pd( &x2[0] )));
+ _mm_store_pd( &sum[i*4 + 2], _mm_mul_pd( _mm_load_pd( &x1[2] ), _mm_load_pd( &x2[2] )));
+ }
+ break;
+ case TIP_INNER:
+ for (i = 0; i < n; i++)
+ {
+ x1 = &(tipVector[4 * tipX1[i]]);
+ if(isGap(x2_gap, i))
+ x2 = x2_gapColumn;
+ else
+ {
+ x2 = x2_ptr;
+ x2_ptr += 4;
+ }
+
+ _mm_store_pd( &sum[i*4 + 0], _mm_mul_pd( _mm_load_pd( &x1[0] ), _mm_load_pd( &x2[0] )));
+ _mm_store_pd( &sum[i*4 + 2], _mm_mul_pd( _mm_load_pd( &x1[2] ), _mm_load_pd( &x2[2] )));
+ }
+ break;
+ case INNER_INNER:
+ for (i = 0; i < n; i++)
+ {
+ if(isGap(x1_gap, i))
+ x1 = x1_gapColumn;
+ else
+ {
+ x1 = x1_ptr;
+ x1_ptr += 4;
+ }
+
+ if(isGap(x2_gap, i))
+ x2 = x2_gapColumn;
+ else
+ {
+ x2 = x2_ptr;
+ x2_ptr += 4;
+ }
+
+ _mm_store_pd( &sum[i*4 + 0], _mm_mul_pd( _mm_load_pd( &x1[0] ), _mm_load_pd( &x2[0] )));
+ _mm_store_pd( &sum[i*4 + 2], _mm_mul_pd( _mm_load_pd( &x1[2] ), _mm_load_pd( &x2[2] )));
+
+ }
+ break;
+ default:
+ assert(0);
+ }
+}
+
+
+static void sumGAMMA_GAPPED_SAVE(int tipCase, double *sumtable, double *x1_start, double *x2_start, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n,
+ double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap)
+{
+ double
+ *x1,
+ *x2,
+ *sum,
+ *x1_ptr = x1_start,
+ *x2_ptr = x2_start;
+
+ int i, j, k;
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ for (i = 0; i < n; i++)
+ {
+ x1 = &(tipVector[4 * tipX1[i]]);
+ x2 = &(tipVector[4 * tipX2[i]]);
+ sum = &sumtable[i * 16];
+
+ for(j = 0; j < 4; j++)
+ for(k = 0; k < 4; k+=2)
+ _mm_store_pd( &sum[j*4 + k], _mm_mul_pd( _mm_load_pd( &x1[k] ), _mm_load_pd( &x2[k] )));
+ }
+ break;
+ case TIP_INNER:
+ for (i = 0; i < n; i++)
+ {
+ x1 = &(tipVector[4 * tipX1[i]]);
+
+ if(x2_gap[i / 32] & mask32[i % 32])
+ x2 = x2_gapColumn;
+ else
+ {
+ x2 = x2_ptr;
+ x2_ptr += 16;
+ }
+
+ sum = &sumtable[16 * i];
+
+ for(j = 0; j < 4; j++)
+ for(k = 0; k < 4; k+=2)
+ _mm_store_pd( &sum[j*4 + k], _mm_mul_pd( _mm_load_pd( &x1[k] ), _mm_load_pd( &x2[j * 4 + k] )));
+ }
+ break;
+ case INNER_INNER:
+ for (i = 0; i < n; i++)
+ {
+ if(x1_gap[i / 32] & mask32[i % 32])
+ x1 = x1_gapColumn;
+ else
+ {
+ x1 = x1_ptr;
+ x1_ptr += 16;
+ }
+
+ if(x2_gap[i / 32] & mask32[i % 32])
+ x2 = x2_gapColumn;
+ else
+ {
+ x2 = x2_ptr;
+ x2_ptr += 16;
+ }
+
+ sum = &sumtable[16 * i];
+
+
+ for(j = 0; j < 4; j++)
+ for(k = 0; k < 4; k+=2)
+ _mm_store_pd( &sum[j*4 + k], _mm_mul_pd( _mm_load_pd( &x1[j * 4 + k] ), _mm_load_pd( &x2[j * 4 + k] )));
+ }
+ break;
+ default:
+ assert(0);
+ }
+}
+
+
+
+
+static void sumGAMMA(int tipCase, double *sumtable, double *x1_start, double *x2_start, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n)
+{
+ double *x1, *x2, *sum;
+ int i, j, k;
+
+ /* C-OPT once again switch over possible configurations at inner node */
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ /* C-OPT main for loop over alignment length */
+ for (i = 0; i < n; i++)
+ {
+ x1 = &(tipVector[4 * tipX1[i]]);
+ x2 = &(tipVector[4 * tipX2[i]]);
+ sum = &sumtable[i * 16];
+
+ for(j = 0; j < 4; j++)
+ for(k = 0; k < 4; k+=2)
+ _mm_store_pd( &sum[j*4 + k], _mm_mul_pd( _mm_load_pd( &x1[k] ), _mm_load_pd( &x2[k] )));
+ }
+ break;
+ case TIP_INNER:
+ for (i = 0; i < n; i++)
+ {
+ x1 = &(tipVector[4 * tipX1[i]]);
+ x2 = &x2_start[16 * i];
+ sum = &sumtable[16 * i];
+
+ for(j = 0; j < 4; j++)
+ for(k = 0; k < 4; k+=2)
+ _mm_store_pd( &sum[j*4 + k], _mm_mul_pd( _mm_load_pd( &x1[k] ), _mm_load_pd( &x2[j * 4 + k] )));
+ }
+ break;
+ case INNER_INNER:
+ for (i = 0; i < n; i++)
+ {
+ x1 = &x1_start[16 * i];
+ x2 = &x2_start[16 * i];
+ sum = &sumtable[16 * i];
+
+ for(j = 0; j < 4; j++)
+ for(k = 0; k < 4; k+=2)
+ _mm_store_pd( &sum[j*4 + k], _mm_mul_pd( _mm_load_pd( &x1[j * 4 + k] ), _mm_load_pd( &x2[j * 4 + k] )));
+ }
+ break;
+ default:
+ assert(0);
+ }
+}
+
+
+static void sumCAT(int tipCase, double *sum, double *x1_start, double *x2_start, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n)
+{
+ int i;
+ double
+ *x1,
+ *x2;
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ for (i = 0; i < n; i++)
+ {
+ x1 = &(tipVector[4 * tipX1[i]]);
+ x2 = &(tipVector[4 * tipX2[i]]);
+
+ _mm_store_pd( &sum[i*4 + 0], _mm_mul_pd( _mm_load_pd( &x1[0] ), _mm_load_pd( &x2[0] )));
+ _mm_store_pd( &sum[i*4 + 2], _mm_mul_pd( _mm_load_pd( &x1[2] ), _mm_load_pd( &x2[2] )));
+ }
+ break;
+ case TIP_INNER:
+ for (i = 0; i < n; i++)
+ {
+ x1 = &(tipVector[4 * tipX1[i]]);
+ x2 = &x2_start[4 * i];
+
+ _mm_store_pd( &sum[i*4 + 0], _mm_mul_pd( _mm_load_pd( &x1[0] ), _mm_load_pd( &x2[0] )));
+ _mm_store_pd( &sum[i*4 + 2], _mm_mul_pd( _mm_load_pd( &x1[2] ), _mm_load_pd( &x2[2] )));
+ }
+ break;
+ case INNER_INNER:
+ for (i = 0; i < n; i++)
+ {
+ x1 = &x1_start[4 * i];
+ x2 = &x2_start[4 * i];
+
+ _mm_store_pd( &sum[i*4 + 0], _mm_mul_pd( _mm_load_pd( &x1[0] ), _mm_load_pd( &x2[0] )));
+ _mm_store_pd( &sum[i*4 + 2], _mm_mul_pd( _mm_load_pd( &x1[2] ), _mm_load_pd( &x2[2] )));
+
+ }
+ break;
+ default:
+ assert(0);
+ }
+}
+static void sumGAMMAPROT_GAPPED_SAVE(int tipCase, double *sumtable, double *x1, double *x2, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n,
+ double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap)
+{
+ int i, l, k;
+ double
+ *left,
+ *right,
+ *sum,
+ *x1_ptr = x1,
+ *x2_ptr = x2,
+ *x1v,
+ *x2v;
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ for(i = 0; i < n; i++)
+ {
+ left = &(tipVector[20 * tipX1[i]]);
+ right = &(tipVector[20 * tipX2[i]]);
+
+ for(l = 0; l < 4; l++)
+ {
+ sum = &sumtable[i * 80 + l * 20];
+
+ for(k = 0; k < 20; k+=2)
+ {
+ __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[k]), _mm_load_pd(&right[k]));
+
+ _mm_store_pd(&sum[k], sumv);
+ }
+
+ }
+ }
+ break;
+ case TIP_INNER:
+ for(i = 0; i < n; i++)
+ {
+ left = &(tipVector[20 * tipX1[i]]);
+
+ if(x2_gap[i / 32] & mask32[i % 32])
+ x2v = x2_gapColumn;
+ else
+ {
+ x2v = x2_ptr;
+ x2_ptr += 80;
+ }
+
+ for(l = 0; l < 4; l++)
+ {
+ right = &(x2v[l * 20]);
+ sum = &sumtable[i * 80 + l * 20];
+
+ for(k = 0; k < 20; k+=2)
+ {
+ __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[k]), _mm_load_pd(&right[k]));
+
+ _mm_store_pd(&sum[k], sumv);
+ }
+ }
+ }
+ break;
+ case INNER_INNER:
+ for(i = 0; i < n; i++)
+ {
+ if(x1_gap[i / 32] & mask32[i % 32])
+ x1v = x1_gapColumn;
+ else
+ {
+ x1v = x1_ptr;
+ x1_ptr += 80;
+ }
+
+ if(x2_gap[i / 32] & mask32[i % 32])
+ x2v = x2_gapColumn;
+ else
+ {
+ x2v = x2_ptr;
+ x2_ptr += 80;
+ }
+
+ for(l = 0; l < 4; l++)
+ {
+ left = &(x1v[l * 20]);
+ right = &(x2v[l * 20]);
+ sum = &(sumtable[i * 80 + l * 20]);
+
+ for(k = 0; k < 20; k+=2)
+ {
+ __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[k]), _mm_load_pd(&right[k]));
+
+ _mm_store_pd(&sum[k], sumv);
+ }
+ }
+ }
+ break;
+ default:
+ assert(0);
+ }
+}
+
+
+static void sumGAMMAPROT_LG4(int tipCase, double *sumtable, double *x1, double *x2, double *tipVector[4],
+ unsigned char *tipX1, unsigned char *tipX2, int n)
+{
+ int i, l, k;
+ double *left, *right, *sum;
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ for(i = 0; i < n; i++)
+ {
+ for(l = 0; l < 4; l++)
+ {
+ left = &(tipVector[l][20 * tipX1[i]]);
+ right = &(tipVector[l][20 * tipX2[i]]);
+
+ sum = &sumtable[i * 80 + l * 20];
+#ifdef __SIM_SSE3
+ for(k = 0; k < 20; k+=2)
+ {
+ __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[k]), _mm_load_pd(&right[k]));
+
+ _mm_store_pd(&sum[k], sumv);
+ }
+#else
+ for(k = 0; k < 20; k++)
+ sum[k] = left[k] * right[k];
+#endif
+ }
+ }
+ break;
+ case TIP_INNER:
+ for(i = 0; i < n; i++)
+ {
+
+
+ for(l = 0; l < 4; l++)
+ {
+ left = &(tipVector[l][20 * tipX1[i]]);
+ right = &(x2[80 * i + l * 20]);
+ sum = &sumtable[i * 80 + l * 20];
+#ifdef __SIM_SSE3
+ for(k = 0; k < 20; k+=2)
+ {
+ __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[k]), _mm_load_pd(&right[k]));
+
+ _mm_store_pd(&sum[k], sumv);
+ }
+#else
+ for(k = 0; k < 20; k++)
+ sum[k] = left[k] * right[k];
+#endif
+ }
+ }
+ break;
+ case INNER_INNER:
+ for(i = 0; i < n; i++)
+ {
+ for(l = 0; l < 4; l++)
+ {
+ left = &(x1[80 * i + l * 20]);
+ right = &(x2[80 * i + l * 20]);
+ sum = &(sumtable[i * 80 + l * 20]);
+
+#ifdef __SIM_SSE3
+ for(k = 0; k < 20; k+=2)
+ {
+ __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[k]), _mm_load_pd(&right[k]));
+
+ _mm_store_pd(&sum[k], sumv);
+ }
+#else
+ for(k = 0; k < 20; k++)
+ sum[k] = left[k] * right[k];
+#endif
+ }
+ }
+ break;
+ default:
+ assert(0);
+ }
+}
+
+
+static void sumGAMMAPROT(int tipCase, double *sumtable, double *x1, double *x2, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n)
+{
+ int i, l, k;
+ double *left, *right, *sum;
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ for(i = 0; i < n; i++)
+ {
+ left = &(tipVector[20 * tipX1[i]]);
+ right = &(tipVector[20 * tipX2[i]]);
+
+ for(l = 0; l < 4; l++)
+ {
+ sum = &sumtable[i * 80 + l * 20];
+
+ for(k = 0; k < 20; k+=2)
+ {
+ __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[k]), _mm_load_pd(&right[k]));
+
+ _mm_store_pd(&sum[k], sumv);
+ }
+
+ }
+ }
+ break;
+ case TIP_INNER:
+ for(i = 0; i < n; i++)
+ {
+ left = &(tipVector[20 * tipX1[i]]);
+
+ for(l = 0; l < 4; l++)
+ {
+ right = &(x2[80 * i + l * 20]);
+ sum = &sumtable[i * 80 + l * 20];
+
+ for(k = 0; k < 20; k+=2)
+ {
+ __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[k]), _mm_load_pd(&right[k]));
+
+ _mm_store_pd(&sum[k], sumv);
+ }
+
+ }
+ }
+ break;
+ case INNER_INNER:
+ for(i = 0; i < n; i++)
+ {
+ for(l = 0; l < 4; l++)
+ {
+ left = &(x1[80 * i + l * 20]);
+ right = &(x2[80 * i + l * 20]);
+ sum = &(sumtable[i * 80 + l * 20]);
+
+
+ for(k = 0; k < 20; k+=2)
+ {
+ __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[k]), _mm_load_pd(&right[k]));
+
+ _mm_store_pd(&sum[k], sumv);
+ }
+ }
+ }
+ break;
+ default:
+ assert(0);
+ }
+}
+
+
+static void sumGTRCATPROT(int tipCase, double *sumtable, double *x1, double *x2, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n)
+{
+ int i, l;
+ double *sum, *left, *right;
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ for (i = 0; i < n; i++)
+ {
+ left = &(tipVector[20 * tipX1[i]]);
+ right = &(tipVector[20 * tipX2[i]]);
+ sum = &sumtable[20 * i];
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[l]), _mm_load_pd(&right[l]));
+
+ _mm_store_pd(&sum[l], sumv);
+ }
+
+ }
+ break;
+ case TIP_INNER:
+ for (i = 0; i < n; i++)
+ {
+ left = &(tipVector[20 * tipX1[i]]);
+ right = &x2[20 * i];
+ sum = &sumtable[20 * i];
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[l]), _mm_load_pd(&right[l]));
+
+ _mm_store_pd(&sum[l], sumv);
+ }
+
+ }
+ break;
+ case INNER_INNER:
+ for (i = 0; i < n; i++)
+ {
+ left = &x1[20 * i];
+ right = &x2[20 * i];
+ sum = &sumtable[20 * i];
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[l]), _mm_load_pd(&right[l]));
+
+ _mm_store_pd(&sum[l], sumv);
+ }
+
+ }
+ break;
+ default:
+ assert(0);
+ }
+}
+
+
+static void sumGTRCATPROT_SAVE(int tipCase, double *sumtable, double *x1, double *x2, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n,
+ double *x1_gapColumn, double *x2_gapColumn, unsigned int *x1_gap, unsigned int *x2_gap)
+{
+ int
+ i,
+ l;
+
+ double
+ *sum,
+ *left,
+ *right,
+ *left_ptr = x1,
+ *right_ptr = x2;
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ for (i = 0; i < n; i++)
+ {
+ left = &(tipVector[20 * tipX1[i]]);
+ right = &(tipVector[20 * tipX2[i]]);
+ sum = &sumtable[20 * i];
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[l]), _mm_load_pd(&right[l]));
+
+ _mm_store_pd(&sum[l], sumv);
+ }
+
+ }
+ break;
+ case TIP_INNER:
+ for (i = 0; i < n; i++)
+ {
+ left = &(tipVector[20 * tipX1[i]]);
+
+ if(isGap(x2_gap, i))
+ right = x2_gapColumn;
+ else
+ {
+ right = right_ptr;
+ right_ptr += 20;
+ }
+
+ sum = &sumtable[20 * i];
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[l]), _mm_load_pd(&right[l]));
+
+ _mm_store_pd(&sum[l], sumv);
+ }
+
+ }
+ break;
+ case INNER_INNER:
+ for (i = 0; i < n; i++)
+ {
+ if(isGap(x1_gap, i))
+ left = x1_gapColumn;
+ else
+ {
+ left = left_ptr;
+ left_ptr += 20;
+ }
+
+ if(isGap(x2_gap, i))
+ right = x2_gapColumn;
+ else
+ {
+ right = right_ptr;
+ right_ptr += 20;
+ }
+
+ sum = &sumtable[20 * i];
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d sumv = _mm_mul_pd(_mm_load_pd(&left[l]), _mm_load_pd(&right[l]));
+
+ _mm_store_pd(&sum[l], sumv);
+ }
+ }
+ break;
+ default:
+ assert(0);
+ }
+}
+
+static void coreGTRGAMMA(const int upper, double *sumtable,
+ volatile double *ext_dlnLdlz, volatile double *ext_d2lnLdlz2, double *EIGN, double *gammaRates, double lz, int *wgt)
+{
+ double
+ dlnLdlz = 0.0,
+ d2lnLdlz2 = 0.0,
+ ki,
+ kisqr,
+ inv_Li,
+ dlnLidlz,
+ d2lnLidlz2,
+ *sum,
+ diagptable0[16] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+ diagptable1[16] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+ diagptable2[16] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+ int
+ i,
+ j,
+ l;
+
+ for(i = 0; i < 4; i++)
+ {
+ ki = gammaRates[i];
+ kisqr = ki * ki;
+
+ diagptable0[i * 4] = 1.0;
+ diagptable1[i * 4] = 0.0;
+ diagptable2[i * 4] = 0.0;
+
+ for(l = 1; l < 4; l++)
+ {
+ diagptable0[i * 4 + l] = EXP(EIGN[l] * ki * lz);
+ diagptable1[i * 4 + l] = EIGN[l] * ki;
+ diagptable2[i * 4 + l] = EIGN[l] * EIGN[l] * kisqr;
+ }
+ }
+
+ for (i = 0; i < upper; i++)
+ {
+ __m128d a0 = _mm_setzero_pd();
+ __m128d a1 = _mm_setzero_pd();
+ __m128d a2 = _mm_setzero_pd();
+
+
+
+ sum = &sumtable[i * 16];
+
+ for(j = 0; j < 4; j++)
+ {
+ double
+ *d0 = &diagptable0[j * 4],
+ *d1 = &diagptable1[j * 4],
+ *d2 = &diagptable2[j * 4];
+
+ for(l = 0; l < 4; l+=2)
+ {
+ __m128d tmpv = _mm_mul_pd(_mm_load_pd(&d0[l]), _mm_load_pd(&sum[j * 4 + l]));
+ a0 = _mm_add_pd(a0, tmpv);
+ a1 = _mm_add_pd(a1, _mm_mul_pd(tmpv, _mm_load_pd(&d1[l])));
+ a2 = _mm_add_pd(a2, _mm_mul_pd(tmpv, _mm_load_pd(&d2[l])));
+ }
+ }
+
+ a0 = _mm_hadd_pd(a0, a0);
+ a1 = _mm_hadd_pd(a1, a1);
+ a2 = _mm_hadd_pd(a2, a2);
+
+ _mm_storel_pd(&inv_Li, a0);
+ _mm_storel_pd(&dlnLidlz, a1);
+ _mm_storel_pd(&d2lnLidlz2, a2);
+
+ inv_Li = 1.0 / FABS(inv_Li);
+
+ dlnLidlz *= inv_Li;
+ d2lnLidlz2 *= inv_Li;
+
+ dlnLdlz += wgt[i] * dlnLidlz;
+ d2lnLdlz2 += wgt[i] * (d2lnLidlz2 - dlnLidlz * dlnLidlz);
+ }
+
+
+ *ext_dlnLdlz = dlnLdlz;
+ *ext_d2lnLdlz2 = d2lnLdlz2;
+}
+
+
+
+static void coreGTRCAT(int upper, int numberOfCategories, double *sum,
+ volatile double *d1, volatile double *d2, int *wgt,
+ double *rptr, double *EIGN, int *cptr, double lz)
+{
+ int i;
+ double
+ *d, *d_start,
+ inv_Li, dlnLidlz, d2lnLidlz2,
+ dlnLdlz = 0.0,
+ d2lnLdlz2 = 0.0;
+ double e1[4] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+ double e2[4] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+ double dd1, dd2, dd3;
+
+ __m128d
+ e1v[2],
+ e2v[2];
+
+ e1[0] = 0.0;
+ e2[0] = 0.0;
+ e1[1] = EIGN[1];
+ e2[1] = EIGN[1] * EIGN[1];
+ e1[2] = EIGN[2];
+ e2[2] = EIGN[2] * EIGN[2];
+ e1[3] = EIGN[3];
+ e2[3] = EIGN[3] * EIGN[3];
+
+ e1v[0]= _mm_load_pd(&e1[0]);
+ e1v[1]= _mm_load_pd(&e1[2]);
+
+ e2v[0]= _mm_load_pd(&e2[0]);
+ e2v[1]= _mm_load_pd(&e2[2]);
+
+ d = d_start = (double *)malloc_aligned(numberOfCategories * 4 * sizeof(double));
+
+ dd1 = EIGN[1] * lz;
+ dd2 = EIGN[2] * lz;
+ dd3 = EIGN[3] * lz;
+
+ for(i = 0; i < numberOfCategories; i++)
+ {
+ d[i * 4 + 0] = 1.0;
+ d[i * 4 + 1] = EXP(dd1 * rptr[i]);
+ d[i * 4 + 2] = EXP(dd2 * rptr[i]);
+ d[i * 4 + 3] = EXP(dd3 * rptr[i]);
+ }
+
+ for (i = 0; i < upper; i++)
+ {
+ double
+ *s = &sum[4 * i];
+
+ double
+ r = rptr[cptr[i]],
+ wr1 = r * wgt[i],
+ wr2 = r * r * wgt[i];
+
+ d = &d_start[4 * cptr[i]];
+
+ __m128d tmp_0v =_mm_mul_pd(_mm_load_pd(&d[0]),_mm_load_pd(&s[0]));
+ __m128d tmp_1v =_mm_mul_pd(_mm_load_pd(&d[2]),_mm_load_pd(&s[2]));
+
+ __m128d inv_Liv = _mm_add_pd(tmp_0v, tmp_1v);
+
+ __m128d dlnLidlzv = _mm_add_pd(_mm_mul_pd(tmp_0v, e1v[0]), _mm_mul_pd(tmp_1v, e1v[1]));
+ __m128d d2lnLidlz2v = _mm_add_pd(_mm_mul_pd(tmp_0v, e2v[0]), _mm_mul_pd(tmp_1v, e2v[1]));
+
+
+ inv_Liv = _mm_hadd_pd(inv_Liv, inv_Liv);
+ dlnLidlzv = _mm_hadd_pd(dlnLidlzv, dlnLidlzv);
+ d2lnLidlz2v = _mm_hadd_pd(d2lnLidlz2v, d2lnLidlz2v);
+
+ _mm_storel_pd(&inv_Li, inv_Liv);
+ _mm_storel_pd(&dlnLidlz, dlnLidlzv);
+ _mm_storel_pd(&d2lnLidlz2, d2lnLidlz2v);
+
+ inv_Li = 1.0/FABS(inv_Li);
+
+ dlnLidlz *= inv_Li;
+ d2lnLidlz2 *= inv_Li;
+
+ dlnLdlz += wr1 * dlnLidlz;
+ d2lnLdlz2 += wr2 * (d2lnLidlz2 - dlnLidlz * dlnLidlz);
+ }
+
+ *d1 = dlnLdlz;
+ *d2 = d2lnLdlz2;
+
+ free(d_start);
+}
+
+
+static void coreGTRGAMMAPROT_LG4(double *gammaRates, double *EIGN[4], double *sumtable, int upper, int *wrptr,
+ volatile double *ext_dlnLdlz, volatile double *ext_d2lnLdlz2, double lz, double *weights)
+{
+ double *sum,
+ diagptable0[80] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+ diagptable1[80] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+ diagptable2[80] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+ int i, j, l;
+ double dlnLdlz = 0;
+ double d2lnLdlz2 = 0;
+ double ki, kisqr;
+
+
+ for(i = 0; i < 4; i++)
+ {
+ ki = gammaRates[i];
+ kisqr = ki * ki;
+
+ diagptable0[i * 20] = 1.0;
+ diagptable1[i * 20] = 0.0;
+ diagptable2[i * 20] = 0.0;
+
+ for(l = 1; l < 20; l++)
+ {
+ diagptable0[i * 20 + l] = EXP(EIGN[i][l] * ki * lz);
+ diagptable1[i * 20 + l] = EIGN[i][l] * ki;
+ diagptable2[i * 20 + l] = EIGN[i][l] * EIGN[i][l] * kisqr;
+ }
+ }
+
+ for (i = 0; i < upper; i++)
+ {
+ double
+ inv_Li = 0.0,
+ dlnLidlz = 0.0,
+ d2lnLidlz2 = 0.0;
+
+
+ sum = &sumtable[i * 80];
+
+ for(j = 0; j < 4; j++)
+ {
+ double
+ l0,
+ l1,
+ l2,
+ *d0 = &diagptable0[j * 20],
+ *d1 = &diagptable1[j * 20],
+ *d2 = &diagptable2[j * 20];
+
+ __m128d
+ a0 = _mm_setzero_pd(),
+ a1 = _mm_setzero_pd(),
+ a2 = _mm_setzero_pd();
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d tmpv = _mm_mul_pd(_mm_load_pd(&d0[l]), _mm_load_pd(&sum[j * 20 +l]));
+ a0 = _mm_add_pd(a0, tmpv);
+ a1 = _mm_add_pd(a1, _mm_mul_pd(tmpv, _mm_load_pd(&d1[l])));
+ a2 = _mm_add_pd(a2, _mm_mul_pd(tmpv, _mm_load_pd(&d2[l])));
+ }
+
+ a0 = _mm_hadd_pd(a0, a0);
+ a1 = _mm_hadd_pd(a1, a1);
+ a2 = _mm_hadd_pd(a2, a2);
+
+ _mm_storel_pd(&l0, a0);
+ _mm_storel_pd(&l1, a1);
+ _mm_storel_pd(&l2, a2);
+
+ inv_Li += weights[j] * l0;
+ dlnLidlz += weights[j] * l1;
+ d2lnLidlz2 += weights[j] * l2;
+ }
+
+
+
+ inv_Li = 1.0 / FABS(inv_Li);
+
+ dlnLidlz *= inv_Li;
+ d2lnLidlz2 *= inv_Li;
+
+ dlnLdlz += wrptr[i] * dlnLidlz;
+ d2lnLdlz2 += wrptr[i] * (d2lnLidlz2 - dlnLidlz * dlnLidlz);
+ }
+
+ *ext_dlnLdlz = dlnLdlz;
+ *ext_d2lnLdlz2 = d2lnLdlz2;
+}
+
+
+static void coreGTRGAMMAPROT(double *gammaRates, double *EIGN, double *sumtable, int upper, int *wgt,
+ volatile double *ext_dlnLdlz, volatile double *ext_d2lnLdlz2, double lz)
+{
+ double *sum,
+ diagptable0[80] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+ diagptable1[80] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+ diagptable2[80] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+ int i, j, l;
+ double dlnLdlz = 0;
+ double d2lnLdlz2 = 0;
+ double ki, kisqr;
+ double inv_Li, dlnLidlz, d2lnLidlz2;
+
+ for(i = 0; i < 4; i++)
+ {
+ ki = gammaRates[i];
+ kisqr = ki * ki;
+
+ diagptable0[i * 20] = 1.0;
+ diagptable1[i * 20] = 0.0;
+ diagptable2[i * 20] = 0.0;
+
+ for(l = 1; l < 20; l++)
+ {
+ diagptable0[i * 20 + l] = EXP(EIGN[l] * ki * lz);
+ diagptable1[i * 20 + l] = EIGN[l] * ki;
+ diagptable2[i * 20 + l] = EIGN[l] * EIGN[l] * kisqr;
+ }
+ }
+
+ for (i = 0; i < upper; i++)
+ {
+ __m128d a0 = _mm_setzero_pd();
+ __m128d a1 = _mm_setzero_pd();
+ __m128d a2 = _mm_setzero_pd();
+
+
+ sum = &sumtable[i * 80];
+
+ for(j = 0; j < 4; j++)
+ {
+ double
+ *d0 = &diagptable0[j * 20],
+ *d1 = &diagptable1[j * 20],
+ *d2 = &diagptable2[j * 20];
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d tmpv = _mm_mul_pd(_mm_load_pd(&d0[l]), _mm_load_pd(&sum[j * 20 +l]));
+ a0 = _mm_add_pd(a0, tmpv);
+ a1 = _mm_add_pd(a1, _mm_mul_pd(tmpv, _mm_load_pd(&d1[l])));
+ a2 = _mm_add_pd(a2, _mm_mul_pd(tmpv, _mm_load_pd(&d2[l])));
+ }
+ }
+
+ a0 = _mm_hadd_pd(a0, a0);
+ a1 = _mm_hadd_pd(a1, a1);
+ a2 = _mm_hadd_pd(a2, a2);
+
+ _mm_storel_pd(&inv_Li, a0);
+ _mm_storel_pd(&dlnLidlz, a1);
+ _mm_storel_pd(&d2lnLidlz2, a2);
+
+ inv_Li = 1.0 / FABS(inv_Li);
+
+ dlnLidlz *= inv_Li;
+ d2lnLidlz2 *= inv_Li;
+
+ dlnLdlz += wgt[i] * dlnLidlz;
+ d2lnLdlz2 += wgt[i] * (d2lnLidlz2 - dlnLidlz * dlnLidlz);
+ }
+
+ *ext_dlnLdlz = dlnLdlz;
+ *ext_d2lnLdlz2 = d2lnLdlz2;
+}
+
+
+
+static void coreGTRCATPROT(double *EIGN, double lz, int numberOfCategories, double *rptr, int *cptr, int upper,
+ int *wgt, volatile double *ext_dlnLdlz, volatile double *ext_d2lnLdlz2, double *sumtable)
+{
+ int i, l;
+ double *d1, *d_start, *sum;
+ double
+ e[20] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+ s[20] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+ dd[20] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+ double inv_Li, dlnLidlz, d2lnLidlz2;
+ double dlnLdlz = 0.0;
+ double d2lnLdlz2 = 0.0;
+
+ d1 = d_start = (double *)malloc_aligned(numberOfCategories * 20 * sizeof(double));
+
+ e[0] = 0.0;
+ s[0] = 0.0;
+
+ for(l = 1; l < 20; l++)
+ {
+ e[l] = EIGN[l] * EIGN[l];
+ s[l] = EIGN[l];
+ dd[l] = s[l] * lz;
+ }
+
+ for(i = 0; i < numberOfCategories; i++)
+ {
+ d1[20 * i] = 1.0;
+ for(l = 1; l < 20; l++)
+ d1[20 * i + l] = EXP(dd[l] * rptr[i]);
+ }
+
+ for (i = 0; i < upper; i++)
+ {
+ __m128d a0 = _mm_setzero_pd();
+ __m128d a1 = _mm_setzero_pd();
+ __m128d a2 = _mm_setzero_pd();
+
+ double
+ r = rptr[cptr[i]],
+ wr1 = r * wgt[i],
+ wr2 = r * r * wgt[i];
+
+ d1 = &d_start[20 * cptr[i]];
+ sum = &sumtable[20 * i];
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d tmpv = _mm_mul_pd(_mm_load_pd(&d1[l]), _mm_load_pd(&sum[l]));
+
+ a0 = _mm_add_pd(a0, tmpv);
+ __m128d sv = _mm_load_pd(&s[l]);
+
+ a1 = _mm_add_pd(a1, _mm_mul_pd(tmpv, sv));
+ __m128d ev = _mm_load_pd(&e[l]);
+
+ a2 = _mm_add_pd(a2, _mm_mul_pd(tmpv, ev));
+ }
+
+ a0 = _mm_hadd_pd(a0, a0);
+ a1 = _mm_hadd_pd(a1, a1);
+ a2 = _mm_hadd_pd(a2, a2);
+
+ _mm_storel_pd(&inv_Li, a0);
+ _mm_storel_pd(&dlnLidlz, a1);
+ _mm_storel_pd(&d2lnLidlz2, a2);
+
+ inv_Li = 1.0/FABS(inv_Li);
+
+ dlnLidlz *= inv_Li;
+ d2lnLidlz2 *= inv_Li;
+
+ dlnLdlz += wr1 * dlnLidlz;
+ d2lnLdlz2 += wr2 * (d2lnLidlz2 - dlnLidlz * dlnLidlz);
+ }
+
+ *ext_dlnLdlz = dlnLdlz;
+ *ext_d2lnLdlz2 = d2lnLdlz2;
+
+ free(d_start);
+}
+
+
+
+
+#endif
+
+
+
diff --git a/examl/mic_native.h b/examl/mic_native.h
new file mode 100644
index 0000000..05b4775
--- /dev/null
+++ b/examl/mic_native.h
@@ -0,0 +1,96 @@
+#ifndef MIC_NATIVE_H_
+#define MIC_NATIVE_H_
+
+//#define VECTOR_PADDING 8
+//#define GET_PADDED_WIDTH(w) w % VECTOR_PADDING == 0 ? w : w + (VECTOR_PADDING - (w % VECTOR_PADDING))
+
+// general functions
+void updateModel_MIC(pInfo* part);
+
+// DNA data
+
+void makeP_DNA_MIC(double z1, double z2, double *rptr, double *EI, double *EIGN, int numberOfCategories,
+ double *left, double *right, boolean saveMem, int maxCat);
+
+void precomputeTips_DNA_MIC(int tipCase, double *tipVector, double *left, double *right,
+ double *umpLeft, double *umpRight,
+ int numberOfCategories);
+
+void newviewGTRGAMMA_MIC(int tipCase,
+ double *x1, double *x2, double *x3, double *extEV, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement,
+ double *umpLeft, double *umpRight);
+
+double evaluateGAMMA_MIC(int *wptr,
+ double *x1_start, double *x2_start,
+ double *tipVector,
+ unsigned char *tipX1, const int n, double *diagptable);
+
+void sumGAMMA_MIC(int tipCase, double *sumtable, double *x1_start, double *x2_start, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n);
+
+void coreGTRGAMMA_MIC(const int upper, double *sumtable,
+ volatile double *ext_dlnLdlz, volatile double *ext_d2lnLdlz2, double *EIGN, double *gammaRates, double lz, int *wrptr);
+
+void sumcoreGTRGAMMA_MIC(int tipCase, double *x1_start, double *x2_start, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, const int n,
+ volatile double *ext_dlnLdlz, volatile double *ext_d2lnLdlz2, double *EIGN, double *gammaRates, double lz, int *wgt);
+
+// protein data - single matrix
+
+void makeP_PROT_MIC(double z1, double z2, double *rptr, double *EI, double *EIGN, int numberOfCategories,
+ double *left, double *right, boolean saveMem, int maxCat);
+
+void precomputeTips_PROT_MIC(int tipCase, double *tipVector, double *left, double *right,
+ double *umpLeft, double *umpRight,
+ int numberOfCategories);
+
+void newviewGTRGAMMAPROT_MIC(int tipCase,
+ double *x1, double *x2, double *x3, double *extEV, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement,
+ double *umpLeft, double *umpRight);
+
+double evaluateGAMMAPROT_MIC(int *wptr,
+ double *x1_start, double *x2_start,
+ double *tipVector,
+ unsigned char *tipX1, const int n, double *diagptable);
+
+void sumGAMMAPROT_MIC(int tipCase, double *sumtable, double *x1_start, double *x2_start, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n);
+
+void coreGTRGAMMAPROT_MIC(const int upper, double *sumtable,
+ volatile double *ext_dlnLdlz, volatile double *ext_d2lnLdlz2, double *EIGN, double *gammaRates, double lz, int *wrptr);
+
+
+// protein data - LG4
+
+void updateModel_LG4_MIC(pInfo* part);
+
+void makeP_PROT_LG4_MIC(double z1, double z2, double *rptr, double *EI[4], double *EIGN[4], int numberOfCategories, double *left, double *right);
+
+void precomputeTips_PROT_LG4_MIC(int tipCase, double *tipVector[4], double *left, double *right,
+ double *umpLeft, double *umpRight,
+ int numberOfCategories);
+
+void newviewGTRGAMMAPROT_LG4_MIC(int tipCase,
+ double *x1, double *x2, double *x3, double *extEV, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement,
+ double *umpLeft, double *umpRight);
+
+double evaluateGAMMAPROT_LG4_MIC(int *wptr,
+ double *x1_start, double *x2_start,
+ double *tipVector,
+ unsigned char *tipX1, const int n, double *diagptable, double* weights);
+
+void sumGAMMAPROT_LG4_MIC(int tipCase, double *sumtable, double *x1_start, double *x2_start, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n);
+
+void coreGTRGAMMAPROT_LG4_MIC(const int upper, double *sumtable,
+ volatile double *ext_dlnLdlz, volatile double *ext_d2lnLdlz2, double *EIGN[4], double *gammaRates, double lz, int *wrptr, double* weights);
+
+
+
+#endif /* MIC_NATIVE_H_ */
diff --git a/examl/mic_native_aa.c b/examl/mic_native_aa.c
new file mode 100644
index 0000000..6e64105
--- /dev/null
+++ b/examl/mic_native_aa.c
@@ -0,0 +1,1323 @@
+#include <immintrin.h>
+#include <string.h>
+#include <math.h>
+
+#include "axml.h"
+#include "mic_native.h"
+
+static const int states = 20;
+static const int statesSquare = 20 * 20;
+static const int span = 20 * 4;
+static const int maxStateValue = 23;
+
+void makeP_PROT_MIC(double z1, double z2, double *rptr, double *EI, double *EIGN, int numberOfCategories, double *left, double *right,
+ boolean saveMem, int maxCat)
+{
+ int
+ i,
+ j,
+ k,
+ span = states * numberOfCategories;
+
+ /* scratch space for values that are pre-computed once and re-used below */
+
+ double lz1[20] __attribute__((align(BYTE_ALIGNMENT)));
+ double lz2[20] __attribute__((align(BYTE_ALIGNMENT)));
+ double d1[20] __attribute__((align(BYTE_ALIGNMENT)));
+ double d2[20] __attribute__((align(BYTE_ALIGNMENT)));
+
+
+ /* multiply branch lengths with eigenvalues */
+ for(i = 1; i < states; i++)
+ {
+ lz1[i] = EIGN[i] * z1;
+ lz2[i] = EIGN[i] * z2;
+ }
+
+
+ /* loop over the rate categories: there are 4 for the GAMMA model and a
+ variable number for the CAT model */
+
+ for(i = 0; i < numberOfCategories; i++)
+ {
+ /* exponentiate rate * eigenvalue * branch length for both branches */
+
+ for(j = 1; j < states; j++)
+ {
+ d1[j] = EXP(rptr[i] * lz1[j]);
+ d2[j] = EXP(rptr[i] * lz2[j]);
+ }
+
+ /* now fill the P matrices for the two branch length values */
+
+ for(j = 0; j < states; j++)
+ {
+ /* left and right are pre-allocated arrays */
+
+ left[i * states + j] = 1.0;
+ right[i * states + j] = 1.0;
+
+ for(k = 1; k < states; k++)
+ {
+ left[k * span + i * states + j] = d1[k] * EI[states * j + k];
+ right[k * span + i * states + j] = d2[k] * EI[states * j + k];
+ }
+ }
+ }
+
+
+ /* if memory saving is enabled and we are using CAT, we need one additional P matrix
+ calculation at a rate of 1.0 to compute the entries of a column/tree site comprising only gaps */
+
+
+ if(saveMem)
+ {
+ i = maxCat;
+
+ for(j = 1; j < states; j++)
+ {
+ d1[j] = EXP (lz1[j]);
+ d2[j] = EXP (lz2[j]);
+ }
+
+ for(j = 0; j < states; j++)
+ {
+ left[statesSquare * i + states * j] = 1.0;
+ right[statesSquare * i + states * j] = 1.0;
+
+ for(k = 1; k < states; k++)
+ {
+ left[statesSquare * i + states * j + k] = d1[k] * EI[states * j + k];
+ right[statesSquare * i + states * j + k] = d2[k] * EI[states * j + k];
+ }
+ }
+ }
+}
+
+void precomputeTips_PROT_MIC(int tipCase, double *tipVector, double *left, double *right,
+ double *umpLeft, double *umpRight,
+ int numberOfCategories)
+{
+ /* no precomputation needed if both children are inner nodes */
+ if (tipCase == INNER_INNER)
+ return;
+
+ const int
+ span = states * numberOfCategories,
+ umpSize = span * maxStateValue;
+
+ for(int k = 0; k < umpSize; ++k)
+ {
+ umpLeft[k] = 0.0;
+ umpRight[k] = 0.0;
+ }
+
+ for(int i = 0; i < maxStateValue; ++i)
+ {
+ for(int l = 0; l < states; ++l)
+ {
+ #pragma ivdep
+ #pragma vector aligned
+ for(int k = 0; k < span; ++k)
+ {
+ umpLeft[span * i + k] += tipVector[i * states + l] * left[l * span + k];
+ if (tipCase == TIP_TIP)
+ umpRight[span * i + k] += tipVector[i * states + l] * right[l * span + k];
+ }
+ }
+ }
+}
+
+inline void mic_fma4x80(const double* inv, double* outv, double* mulv)
+{
+ __mmask8 k1 = _mm512_int2mask(0x0F);
+ __mmask8 k2 = _mm512_int2mask(0xF0);
+ for(int l = 0; l < 80; l += 40)
+ {
+ __m512d t = _mm512_setzero_pd();
+
+ t = _mm512_extload_pd(&inv[l], _MM_UPCONV_PD_NONE, _MM_BROADCAST_1X8, _MM_HINT_NONE);
+ __m512d m = _mm512_load_pd(&mulv[l]);
+ __m512d acc = _mm512_load_pd(&outv[l]);
+ __m512d r = _mm512_fmadd_pd(t, m, acc);
+ _mm512_store_pd(&outv[l], r);
+
+ m = _mm512_load_pd(&mulv[l + 8]);
+ acc = _mm512_load_pd(&outv[l + 8]);
+ r = _mm512_fmadd_pd(t, m, acc);
+ _mm512_store_pd(&outv[l + 8], r);
+
+ t = _mm512_mask_extload_pd(t, k1, &inv[l], _MM_UPCONV_PD_NONE, _MM_BROADCAST_1X8, _MM_HINT_NONE);
+ t = _mm512_mask_extload_pd(t, k2, &inv[l+20], _MM_UPCONV_PD_NONE, _MM_BROADCAST_1X8, _MM_HINT_NONE);
+
+ m = _mm512_load_pd(&mulv[l + 16]);
+ acc = _mm512_load_pd(&outv[l + 16]);
+ r = _mm512_fmadd_pd(t, m, acc);
+ _mm512_store_pd(&outv[l + 16], r);
+
+ t = _mm512_extload_pd(&inv[l+20], _MM_UPCONV_PD_NONE, _MM_BROADCAST_1X8, _MM_HINT_NONE);
+ m = _mm512_load_pd(&mulv[l + 24]);
+ acc = _mm512_load_pd(&outv[l + 24]);
+ r = _mm512_fmadd_pd(t, m, acc);
+ _mm512_store_pd(&outv[l + 24], r);
+
+ m = _mm512_load_pd(&mulv[l + 32]);
+ acc = _mm512_load_pd(&outv[l + 32]);
+ r = _mm512_fmadd_pd(t, m, acc);
+ _mm512_store_pd(&outv[l + 32], r);
+ }
+}
+
+
+void newviewGTRGAMMAPROT_MIC(int tipCase,
+ double *x1, double *x2, double *x3, double *extEV, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement,
+ double *umpLeft, double *umpRight)
+{
+ __m512d minlikelihood_MIC = _mm512_set1_pd(minlikelihood);
+ __m512d twotothe256_MIC = _mm512_set1_pd(twotothe256);
+ __m512i absMask_MIC = _mm512_set1_epi64(0x7fffffffffffffffULL);
+
+ int addScale = 0;
+
+ double
+ *aEV = extEV,
+ *aRight = right,
+ *aLeft = left,
+ *umpX1 = umpLeft,
+ *umpX2 = umpRight;
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ {
+ /* multiply all possible tip state vectors with the respective P-matrices
+ */
+
+ for (int i = 0; i < n; i++)
+ {
+ const double *uX1 = &umpX1[span * tipX1[i]];
+ const double *uX2 = &umpX2[span * tipX2[i]];
+
+ double uX[span] __attribute__((align(BYTE_ALIGNMENT)));
+ double* v3 = &x3[i * span];
+
+ #pragma ivdep
+ #pragma vector aligned
+ for(int l = 0; l < span; ++l)
+ {
+ uX[l] = uX1[l] * uX2[l];
+ v3[l] = 0.;
+ }
+
+ for(int k = 0; k < states; ++k)
+ {
+ for (int j = 0; j < span; j += 8)
+ {
+ _mm_prefetch((const char *)&aEV[span*(k+1) + j], _MM_HINT_T0);
+ }
+
+ mic_fma4x80(&uX[k], v3, &aEV[k * span]);
+ }
+
+ } // sites loop
+ }
+ break;
+ case TIP_INNER:
+ {
+ for (int i = 0; i < n; i++)
+ {
+ #pragma unroll(10)
+ for (int j = 0; j < span; j += 8)
+ {
+ _mm_prefetch((const char *)&x2[span*(i+1) + j], _MM_HINT_T1);
+// _mm_prefetch((const char *)&x2[span*(i+1) + j], _MM_HINT_T0);
+ }
+
+ /* access pre-computed value based on the raw sequence data tipX1 that is used as an index */
+ double* uX1 = &umpX1[span * tipX1[i]];
+ double uX2[span] __attribute__((align(BYTE_ALIGNMENT)));
+ double uX[span] __attribute__((align(BYTE_ALIGNMENT)));
+
+ double* v3 = &(x3[span * i]);
+
+ const double* v2 = &(x2[span * i]);
+
+ #pragma vector aligned
+ for(int l = 0; l < span; ++l)
+ {
+ uX2[l] = 0.;
+ }
+
+ for(int k = 0; k < states; ++k)
+ {
+ #pragma unroll(10)
+ for (int j = 0; j < span; j += 8)
+ {
+ _mm_prefetch((const char *)&aRight[span*(k+1) + j], _MM_HINT_T0);
+ }
+
+ mic_fma4x80(&v2[k], uX2, &aRight[k * span]);
+ }
+
+ #pragma ivdep
+ #pragma vector aligned
+ for(int l = 0; l < span; ++l)
+ {
+ uX[l] = uX1[l] * uX2[l];
+ v3[l] = 0.;
+ }
+
+ for(int k = 0; k < states; ++k)
+ {
+ #pragma unroll(10)
+ for (int j = 0; j < span; j += 8)
+ {
+ _mm_prefetch((const char *)&aEV[span*(k+1) + j], _MM_HINT_T0);
+ }
+
+ mic_fma4x80(&uX[k], v3, &aEV[k * span]);
+ }
+
+ __m512d t1 = _mm512_load_pd(&v3[0]);
+ t1 = _mm512_castsi512_pd(_mm512_and_epi64(_mm512_castpd_si512(t1), absMask_MIC));
+ double vmax = _mm512_reduce_gmax_pd(t1);
+ for (int l = 8; l < span; l += 8)
+ {
+ __m512d t = _mm512_load_pd(&v3[l]);
+ t = _mm512_castsi512_pd(_mm512_and_epi64(_mm512_castpd_si512(t), absMask_MIC));
+ double vmax2 = _mm512_reduce_gmax_pd(t);
+ vmax = MAX(vmax, vmax2);
+ }
+
+ if (vmax < minlikelihood)
+ {
+ #pragma vector aligned nontemporal
+ for(int l = 0; l < span; l++)
+ v3[l] *= twotothe256;
+
+ addScale += wgt[i];
+ }
+
+ } // site loop
+
+ }
+ break;
+ case INNER_INNER:
+ {
+ /* same as above, without pre-computations */
+
+ for (int i = 0; i < n; i++)
+ {
+
+ #pragma unroll(10)
+ for (int j = 0; j < span; j += 8)
+ {
+ _mm_prefetch((const char *)&x1[span*(i+1) + j], _MM_HINT_T1);
+ _mm_prefetch((const char *)&x2[span*(i+1) + j], _MM_HINT_T1);
+// _mm_prefetch((const char *)&x1[span*(i+1) + j], _MM_HINT_T0);
+// _mm_prefetch((const char *)&x2[span*(i+1) + j], _MM_HINT_T0);
+ }
+
+
+ double uX1[span] __attribute__((align(BYTE_ALIGNMENT)));
+ double uX2[span] __attribute__((align(BYTE_ALIGNMENT)));
+ double uX[span] __attribute__((align(BYTE_ALIGNMENT)));
+
+ double* v3 = &(x3[span * i]);
+
+ const double* v1 = &(x1[span * i]);
+ const double* v2 = &(x2[span * i]);
+
+ #pragma vector aligned
+ for(int l = 0; l < span; ++l)
+ {
+ uX1[l] = 0.;
+ uX2[l] = 0.;
+ }
+
+ for(int k = 0; k < states; ++k)
+ {
+ #pragma unroll(10)
+ for (int j = 0; j < span; j += 8)
+ {
+ _mm_prefetch((const char *)&aRight[span*(k+1) + j], _MM_HINT_T0);
+ _mm_prefetch((const char *)&aLeft[span*(k+1) + j], _MM_HINT_T0);
+ }
+
+ mic_fma4x80(&v1[k], uX1, &aLeft[k * span]);
+ mic_fma4x80(&v2[k], uX2, &aRight[k * span]);
+ }
+
+ #pragma ivdep
+ #pragma vector aligned
+ for(int l = 0; l < span; ++l)
+ {
+ uX[l] = uX1[l] * uX2[l];
+ v3[l] = 0.;
+ }
+
+ for(int k = 0; k < states; ++k)
+ {
+ #pragma unroll(10)
+ for (int j = 0; j < span; j += 8)
+ {
+ _mm_prefetch((const char *)&aEV[span*(k+1) + j], _MM_HINT_T0);
+ }
+
+ mic_fma4x80(&uX[k], v3, &aEV[k * span]);
+ }
+
+ __m512d t1 = _mm512_load_pd(&v3[0]);
+ t1 = _mm512_castsi512_pd(_mm512_and_epi64(_mm512_castpd_si512(t1), absMask_MIC));
+ double vmax = _mm512_reduce_gmax_pd(t1);
+ for (int l = 8; l < span; l += 8)
+ {
+ __m512d t = _mm512_load_pd(&v3[l]);
+ t = _mm512_castsi512_pd(_mm512_and_epi64(_mm512_castpd_si512(t), absMask_MIC));
+ double vmax2 = _mm512_reduce_gmax_pd(t);
+ vmax = MAX(vmax, vmax2);
+ }
+
+ if (vmax < minlikelihood)
+ {
+ #pragma vector aligned nontemporal
+ for(int l = 0; l < span; l++)
+ v3[l] *= twotothe256;
+
+ addScale += wgt[i];
+ }
+ }
+ } break;
+ default:
+// assert(0);
+ break;
+ }
+
+ *scalerIncrement = addScale;
+
+}
+
+
+
+double evaluateGAMMAPROT_MIC(int *wgt, double *x1_start, double *x2_start, double *tipVector,
+ unsigned char *tipX1, const int n, double *diagptable)
+{
+ double sum = 0.0;
+
+ /* the left node is a tip */
+ if(tipX1)
+ {
+ double
+ *aTipVec = tipVector;
+
+ /* loop over the sites of this partition */
+ for (int i = 0; i < n; i++)
+ {
+ /* access pre-computed tip vector values via a lookup table */
+ const double *x1 = &(aTipVec[span * tipX1[i]]);
+ /* access the other (inner) node at the other end of the branch */
+ const double *x2 = &(x2_start[span * i]);
+
+ #pragma unroll(10)
+ for (int k = 0; k < span; k += 8)
+ {
+ _mm_prefetch((const char *) &x2_start[span*(i+2) + k], _MM_HINT_T1);
+ _mm_prefetch((const char *) &x2_start[span*(i+1) + k], _MM_HINT_T0);
+ }
+
+ double term = 0.;
+
+ #pragma ivdep
+ #pragma vector aligned
+ #pragma noprefetch x2
+ for(int j = 0; j < span; j++) {
+ term += x1[j] * x2[j] * diagptable[j];
+ }
+
+ term = log(0.25 * fabs(term));
+
+ sum += wgt[i] * term;
+ }
+ }
+ else
+ {
+ for (int i = 0; i < n; i++)
+ {
+ #pragma unroll(10)
+ for (int k = 0; k < span; k += 8)
+ {
+ _mm_prefetch((const char *) &x1_start[span*(i+2) + k], _MM_HINT_T1);
+ _mm_prefetch((const char *) &x1_start[span*(i+1) + k], _MM_HINT_T0);
+
+ _mm_prefetch((const char *) &x2_start[span*(i+2) + k], _MM_HINT_T1);
+ _mm_prefetch((const char *) &x2_start[span*(i+1) + k], _MM_HINT_T0);
+ }
+
+ const double *x1 = &(x1_start[span * i]);
+ const double *x2 = &(x2_start[span * i]);
+
+ double term = 0.;
+
+ #pragma ivdep
+ #pragma vector aligned
+ #pragma noprefetch x1 x2
+ for(int j = 0; j < span; j++)
+ term += x1[j] * x2[j] * diagptable[j];
+
+ term = log(0.25 * fabs(term));
+
+ sum += wgt[i] * term;
+ }
+ }
+
+ return sum;
+}
+
+void sumGAMMAPROT_MIC(int tipCase, double *sumtable, double *x1_start, double *x2_start, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n)
+{
+ double
+ *aTipVec = tipVector;
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ {
+ for(int i = 0; i < n; i++)
+ {
+ const double *left = &(aTipVec[span * tipX1[i]]);
+ const double *right = &(aTipVec[span * tipX2[i]]);
+
+ #pragma ivdep
+ #pragma vector aligned nontemporal
+ for(int l = 0; l < span; l++)
+ {
+ sumtable[i * span + l] = left[l] * right[l];
+ }
+ }
+ } break;
+ case TIP_INNER:
+ {
+ for(int i = 0; i < n; i++)
+ {
+ #pragma unroll(10)
+ for (int k = 0; k < span; k += 8)
+ {
+ _mm_prefetch((const char *) &x2_start[span*(i+2) + k], _MM_HINT_T1);
+ _mm_prefetch((const char *) &x2_start[span*(i+1) + k], _MM_HINT_T0);
+ }
+
+ const double *left = &(aTipVec[span * tipX1[i]]);
+ const double *right = &(x2_start[span * i]);
+
+ #pragma ivdep
+ #pragma vector aligned nontemporal
+ #pragma noprefetch right
+ for(int l = 0; l < span; l++)
+ {
+ sumtable[i * span + l] = left[l] * right[l];
+ }
+ }
+ } break;
+ case INNER_INNER:
+ {
+ for(int i = 0; i < n; i++)
+ {
+ #pragma unroll(10)
+ for (int k = 0; k < span; k += 8)
+ {
+ _mm_prefetch((const char *) &x1_start[span*(i+2) + k], _MM_HINT_T1);
+ _mm_prefetch((const char *) &x1_start[span*(i+1) + k], _MM_HINT_T0);
+
+ _mm_prefetch((const char *) &x2_start[span*(i+2) + k], _MM_HINT_T1);
+ _mm_prefetch((const char *) &x2_start[span*(i+1) + k], _MM_HINT_T0);
+ }
+
+ const double *left = &(x1_start[span * i]);
+ const double *right = &(x2_start[span * i]);
+
+ #pragma ivdep
+ #pragma vector aligned nontemporal
+ #pragma noprefetch left right
+ for(int l = 0; l < span; l++)
+ {
+ sumtable[i * span + l] = left[l] * right[l];
+ }
+ }
+ } break;
+ // default:
+ // assert(0);
+ }
+}
+
+void coreGTRGAMMAPROT_MIC(const int upper, double *sumtable,
+ volatile double *ext_dlnLdlz, volatile double *ext_d2lnLdlz2, double *EIGN, double *gammaRates, double lz, int *wgt)
+{
+ static const int states = 20;
+ static const int span = 20 * 4;
+
+ double diagptable0[span] __attribute__((align(BYTE_ALIGNMENT)));
+ double diagptable1[span] __attribute__((align(BYTE_ALIGNMENT)));
+ double diagptable2[span] __attribute__((align(BYTE_ALIGNMENT)));
+ double diagptable01[span] __attribute__((align(BYTE_ALIGNMENT)));
+ double diagptable02[span] __attribute__((align(BYTE_ALIGNMENT)));
+
+ /* pre-compute the derivatives of the P matrix for all discrete GAMMA rates */
+
+ for(int i = 0; i < 4; i++)
+ {
+ const double ki = gammaRates[i];
+ const double kisqr = ki * ki;
+
+ diagptable0[i*states] = 1.;
+ diagptable1[i*states] = 0.;
+ diagptable2[i*states] = 0.;
+
+ for(int l = 1; l < states; l++)
+ {
+ diagptable0[i * states + l] = exp(EIGN[l] * ki * lz);
+ diagptable1[i * states + l] = EIGN[l] * ki;
+ diagptable2[i * states + l] = EIGN[l] * EIGN[l] * kisqr;
+ }
+ }
+
+ #pragma ivdep
+ for(int i = 0; i < span; i++)
+ {
+ diagptable01[i] = diagptable0[i] * diagptable1[i];
+ diagptable02[i] = diagptable0[i] * diagptable2[i];
+ }
+
+ /* loop over sites in this partition */
+
+ const int aligned_width = upper % 8 == 0 ? upper / 8 : upper / 8 + 1;
+
+ double dlnLBuf[8] __attribute__((align(BYTE_ALIGNMENT)));
+ double d2lnLBuf[8] __attribute__((align(BYTE_ALIGNMENT)));
+ for (int j = 0; j < 8; ++j)
+ {
+ dlnLBuf[j] = 0.;
+ d2lnLBuf[j] = 0.;
+ }
+
+ __mmask16 k1 = _mm512_int2mask(0x000000FF);
+
+ for (int i = 0; i < aligned_width; i++)
+ {
+ /* access the array with pre-computed values */
+ const double *sum = &sumtable[i * span * 8];
+
+ /* initial per-site likelihood and 1st and 2nd derivatives */
+
+ double invBuf[8] __attribute__((align(BYTE_ALIGNMENT)));
+ double d1Buf[8] __attribute__((align(BYTE_ALIGNMENT)));
+ double d2Buf[8] __attribute__((align(BYTE_ALIGNMENT)));
+
+ __m512d invVec;
+ __m512d d1Vec;
+ __m512d d2Vec;
+ int mask = 0x01;
+
+ #pragma noprefetch sum
+ #pragma unroll(8)
+ for(int j = 0; j < 8; j++)
+ {
+ #pragma unroll(10)
+ for (int k = 0; k < span; k += 8)
+ {
+ _mm_prefetch((const char *) &sum[span*(j+2) + k], _MM_HINT_T1);
+ _mm_prefetch((const char *) &sum[span*(j+1) + k], _MM_HINT_T0);
+ }
+
+ __m512d inv_1 = _mm512_setzero_pd();
+ __m512d d1_1 = _mm512_setzero_pd();
+ __m512d d2_1 = _mm512_setzero_pd();
+
+ for (int offset = 0; offset < span; offset += 8)
+ {
+ __m512d d0_1 = _mm512_load_pd(&diagptable0[offset]);
+ __m512d d01_1 = _mm512_load_pd(&diagptable01[offset]);
+ __m512d d02_1 = _mm512_load_pd(&diagptable02[offset]);
+ __m512d s_1 = _mm512_load_pd(&sum[j*span + offset]);
+
+ inv_1 = _mm512_fmadd_pd(d0_1, s_1, inv_1);
+ d1_1 = _mm512_fmadd_pd(d01_1, s_1, d1_1);
+ d2_1 = _mm512_fmadd_pd(d02_1, s_1, d2_1);
+ }
+
+ __mmask8 k1 = _mm512_int2mask(mask);
+ mask <<= 1;
+
+ // reduce
+ inv_1 = _mm512_add_pd (inv_1, _mm512_swizzle_pd(inv_1, _MM_SWIZ_REG_CDAB));
+ inv_1 = _mm512_add_pd (inv_1, _mm512_swizzle_pd(inv_1, _MM_SWIZ_REG_BADC));
+ inv_1 = _mm512_add_pd (inv_1, _mm512_castsi512_pd(_mm512_permute4f128_epi32(_mm512_castpd_si512(inv_1), _MM_PERM_BADC)));
+ invVec = _mm512_mask_mov_pd(invVec, k1, inv_1);
+
+ d1_1 = _mm512_add_pd (d1_1, _mm512_swizzle_pd(d1_1, _MM_SWIZ_REG_CDAB));
+ d1_1 = _mm512_add_pd (d1_1, _mm512_swizzle_pd(d1_1, _MM_SWIZ_REG_BADC));
+ d1_1 = _mm512_add_pd (d1_1, _mm512_castsi512_pd(_mm512_permute4f128_epi32(_mm512_castpd_si512(d1_1), _MM_PERM_BADC)));
+ d1Vec = _mm512_mask_mov_pd(d1Vec, k1, d1_1);
+
+ d2_1 = _mm512_add_pd (d2_1, _mm512_swizzle_pd(d2_1, _MM_SWIZ_REG_CDAB));
+ d2_1 = _mm512_add_pd (d2_1, _mm512_swizzle_pd(d2_1, _MM_SWIZ_REG_BADC));
+ d2_1 = _mm512_add_pd (d2_1, _mm512_castsi512_pd(_mm512_permute4f128_epi32(_mm512_castpd_si512(d2_1), _MM_PERM_BADC)));
+ d2Vec = _mm512_mask_mov_pd(d2Vec, k1, d2_1);
+ }
+
+ _mm512_store_pd(&invBuf[0], invVec);
+ _mm512_store_pd(&d1Buf[0], d1Vec);
+ _mm512_store_pd(&d2Buf[0], d2Vec);
+
+ #pragma ivdep
+ #pragma vector aligned
+ for (int j = 0; j < 8; ++j)
+ {
+ const double inv_Li = 1.0 / invBuf[j];
+
+ const double d1 = d1Buf[j] * inv_Li;
+ const double d2 = d2Buf[j] * inv_Li;
+
+ dlnLBuf[j] += wgt[i * 8 + j] * d1;
+ d2lnLBuf[j] += wgt[i * 8 + j] * (d2 - d1 * d1);
+ }
+ } // site loop
+
+ double dlnLdlz = 0.;
+ double d2lnLdlz2 = 0.;
+ for (int j = 0; j < 8; ++j)
+ {
+ dlnLdlz += dlnLBuf[j];
+ d2lnLdlz2 += d2lnLBuf[j];
+ }
+
+ *ext_dlnLdlz = dlnLdlz;
+ *ext_d2lnLdlz2 = d2lnLdlz2;
+}
+
+
+/****
+ * PROTEIN - LG4
+ */
+void updateModel_LG4_MIC(pInfo* part)
+{
+ double
+ **EV = part->EV_LG4,
+ **tipVector = part->tipVector_LG4,
+ *aEV = part->mic_EV,
+ *aTipVector = part->mic_tipVector;
+
+ const int
+ states = part->states,
+ span = 4 * states,
+ maxState = getUndetermined(part->dataType) + 1;
+
+ int
+ k, l;
+
+ #pragma ivdep
+ for (l = 0; l < 4 * states * states; ++l)
+ {
+ aEV[l] = EV[(l % span) / states][(l / span) * states + (l % states)];
+ }
+
+ for(int k = 0; k < maxState; k++)
+ {
+ for(int j = 0; j < 4; j++)
+ {
+ for(int l = 0; l < states; l++)
+ {
+ aTipVector[k*span + j*states + l] = tipVector[j][k*states + l];
+ }
+ }
+ }
+}
+
+void makeP_PROT_LG4_MIC(double z1, double z2, double *rptr, double *EI[4], double *EIGN[4], int numberOfCategories, double *left, double *right)
+{
+ int
+ i,
+ j,
+ k;
+
+ double
+ d1[64],
+ d2[64];
+
+ for(i = 0; i < numberOfCategories; i++)
+ {
+ for(j = 1; j < states; j++)
+ {
+ d1[j] = EXP (rptr[i] * EIGN[i][j] * z1);
+ d2[j] = EXP (rptr[i] * EIGN[i][j] * z2);
+ }
+
+ for(j = 0; j < states; j++)
+ {
+ left[i * states + j] = 1.0;
+ right[i * states + j] = 1.0;
+
+ for(k = 1; k < states; k++)
+ {
+ left[k * span + i * states + j] = d1[k] * EI[i][states * j + k];
+ right[k * span + i * states + j] = d2[k] * EI[i][states * j + k];
+ }
+ }
+ }
+}
+
+void precomputeTips_PROT_LG4_MIC(int tipCase, double *tipVector[4], double *left, double *right,
+ double *umpLeft, double *umpRight,
+ int numberOfCategories)
+{
+ /* no precomputation needed if both children are inner nodes */
+ if (tipCase == INNER_INNER)
+ return;
+
+ const int
+ span = states * numberOfCategories,
+ umpSize = span * maxStateValue;
+
+ for(int k = 0; k < umpSize; ++k)
+ {
+ umpLeft[k] = 0.0;
+ umpRight[k] = 0.0;
+ }
+
+ for(int i = 0; i < maxStateValue; ++i)
+ {
+ for(int l = 0; l < states; ++l)
+ {
+ #pragma ivdep
+ #pragma vector aligned
+ for(int k = 0; k < span; ++k)
+ {
+ umpLeft[span * i + k] += tipVector[k/20][i * states + l] * left[l * span + k];
+ if (tipCase == TIP_TIP)
+ umpRight[span * i + k] += tipVector[k/20][i * states + l] * right[l * span + k];
+ }
+ }
+ }
+}
+
+void newviewGTRGAMMAPROT_LG4_MIC(int tipCase,
+ double *x1, double *x2, double *x3, double *extEV, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement,
+ double *umpLeft, double *umpRight)
+{
+ __m512d minlikelihood_MIC = _mm512_set1_pd(minlikelihood);
+ __m512d twotothe256_MIC = _mm512_set1_pd(twotothe256);
+ __m512i absMask_MIC = _mm512_set1_epi64(0x7fffffffffffffffULL);
+
+ int addScale = 0;
+
+ /* we assume that the P-matrix and eigenvectors are already in the correct layout */
+ double
+ *aEV = extEV,
+ *aRight = right,
+ *aLeft = left,
+ *umpX1 = umpLeft,
+ *umpX2 = umpRight;
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ {
+ for (int i = 0; i < n; i++)
+ {
+ const double *uX1 = &umpX1[span * tipX1[i]];
+ const double *uX2 = &umpX2[span * tipX2[i]];
+
+ double uX[span] __attribute__((align(BYTE_ALIGNMENT)));
+ double* v3 = &x3[i * span];
+
+ #pragma ivdep
+ #pragma vector aligned
+ for(int l = 0; l < span; ++l)
+ {
+ uX[l] = uX1[l] * uX2[l];
+ v3[l] = 0.;
+ }
+
+ for(int k = 0; k < states; ++k)
+ {
+ for (int j = 0; j < span; j += 8)
+ {
+ _mm_prefetch((const char *)&aEV[span*(k+1) + j], _MM_HINT_T0);
+ }
+
+ mic_fma4x80(&uX[k], v3, &aEV[k * span]);
+ }
+
+ } // sites loop
+ }
+ break;
+ case TIP_INNER:
+ {
+ for (int i = 0; i < n; i++)
+ {
+ #pragma unroll(10)
+ for (int j = 0; j < span; j += 8)
+ {
+ _mm_prefetch((const char *)&x2[span*(i+1) + j], _MM_HINT_T1);
+// _mm_prefetch((const char *)&x2[span*(i+1) + j], _MM_HINT_T0);
+ }
+
+ /* access pre-computed value based on the raw sequence data tipX1 that is used as an index */
+ double* uX1 = &umpX1[span * tipX1[i]];
+ double uX2[span] __attribute__((align(BYTE_ALIGNMENT)));
+ double uX[span] __attribute__((align(BYTE_ALIGNMENT)));
+
+ double* v3 = &(x3[span * i]);
+
+ const double* v2 = &(x2[span * i]);
+
+ #pragma vector aligned
+ for(int l = 0; l < span; ++l)
+ {
+ uX2[l] = 0.;
+ }
+
+ for(int k = 0; k < states; ++k)
+ {
+ #pragma unroll(10)
+ for (int j = 0; j < span; j += 8)
+ {
+ _mm_prefetch((const char *)&aRight[span*(k+1) + j], _MM_HINT_T0);
+ }
+
+ mic_fma4x80(&v2[k], uX2, &aRight[k * span]);
+ }
+
+ #pragma ivdep
+ #pragma vector aligned
+ for(int l = 0; l < span; ++l)
+ {
+ uX[l] = uX1[l] * uX2[l];
+ v3[l] = 0.;
+ }
+
+ for(int k = 0; k < states; ++k)
+ {
+ #pragma unroll(10)
+ for (int j = 0; j < span; j += 8)
+ {
+ _mm_prefetch((const char *)&aEV[span*(k+1) + j], _MM_HINT_T0);
+ }
+
+ mic_fma4x80(&uX[k], v3, &aEV[k * span]);
+ }
+
+ __m512d t1 = _mm512_load_pd(&v3[0]);
+ t1 = _mm512_castsi512_pd(_mm512_and_epi64(_mm512_castpd_si512(t1), absMask_MIC));
+ double vmax = _mm512_reduce_gmax_pd(t1);
+ for (int l = 8; l < span; l += 8)
+ {
+ __m512d t = _mm512_load_pd(&v3[l]);
+ t = _mm512_castsi512_pd(_mm512_and_epi64(_mm512_castpd_si512(t), absMask_MIC));
+ double vmax2 = _mm512_reduce_gmax_pd(t);
+ vmax = MAX(vmax, vmax2);
+ }
+
+ if (vmax < minlikelihood)
+ {
+ #pragma vector aligned nontemporal
+ for(int l = 0; l < span; l++)
+ v3[l] *= twotothe256;
+
+ addScale += wgt[i];
+ }
+
+ } // site loop
+
+ }
+ break;
+ case INNER_INNER:
+ {
+ for (int i = 0; i < n; i++)
+ {
+
+ #pragma unroll(10)
+ for (int j = 0; j < span; j += 8)
+ {
+ _mm_prefetch((const char *)&x1[span*(i+1) + j], _MM_HINT_T1);
+ _mm_prefetch((const char *)&x2[span*(i+1) + j], _MM_HINT_T1);
+// _mm_prefetch((const char *)&x1[span*(i+1) + j], _MM_HINT_T0);
+// _mm_prefetch((const char *)&x2[span*(i+1) + j], _MM_HINT_T0);
+ }
+
+
+ double uX1[span] __attribute__((align(BYTE_ALIGNMENT)));
+ double uX2[span] __attribute__((align(BYTE_ALIGNMENT)));
+ double uX[span] __attribute__((align(BYTE_ALIGNMENT)));
+
+ double* v3 = &(x3[span * i]);
+
+ const double* v1 = &(x1[span * i]);
+ const double* v2 = &(x2[span * i]);
+
+ #pragma vector aligned
+ for(int l = 0; l < span; ++l)
+ {
+ uX1[l] = 0.;
+ uX2[l] = 0.;
+ }
+
+ for(int k = 0; k < states; ++k)
+ {
+ #pragma unroll(10)
+ for (int j = 0; j < span; j += 8)
+ {
+ _mm_prefetch((const char *)&aRight[span*(k+1) + j], _MM_HINT_T0);
+ _mm_prefetch((const char *)&aLeft[span*(k+1) + j], _MM_HINT_T0);
+ }
+
+ mic_fma4x80(&v1[k], uX1, &aLeft[k * span]);
+ mic_fma4x80(&v2[k], uX2, &aRight[k * span]);
+ }
+
+ #pragma ivdep
+ #pragma vector aligned
+ for(int l = 0; l < span; ++l)
+ {
+ uX[l] = uX1[l] * uX2[l];
+ v3[l] = 0.;
+ }
+
+ for(int k = 0; k < states; ++k)
+ {
+ #pragma unroll(10)
+ for (int j = 0; j < span; j += 8)
+ {
+ _mm_prefetch((const char *)&aEV[span*(k+1) + j], _MM_HINT_T0);
+ }
+
+ mic_fma4x80(&uX[k], v3, &aEV[k * span]);
+ }
+
+ __m512d t1 = _mm512_load_pd(&v3[0]);
+ t1 = _mm512_castsi512_pd(_mm512_and_epi64(_mm512_castpd_si512(t1), absMask_MIC));
+ double vmax = _mm512_reduce_gmax_pd(t1);
+ for (int l = 8; l < span; l += 8)
+ {
+ __m512d t = _mm512_load_pd(&v3[l]);
+ t = _mm512_castsi512_pd(_mm512_and_epi64(_mm512_castpd_si512(t), absMask_MIC));
+ double vmax2 = _mm512_reduce_gmax_pd(t);
+ vmax = MAX(vmax, vmax2);
+ }
+
+ if (vmax < minlikelihood)
+ {
+ #pragma vector aligned nontemporal
+ for(int l = 0; l < span; l++)
+ v3[l] *= twotothe256;
+
+ addScale += wgt[i];
+ }
+ }
+ } break;
+ default:
+// assert(0);
+ break;
+ }
+
+ *scalerIncrement = addScale;
+
+}
+
+
+
+double evaluateGAMMAPROT_LG4_MIC(int *wgt, double *x1_start, double *x2_start, double *tipVector,
+ unsigned char *tipX1, const int n, double *diagptable, double *weights)
+{
+ double wtable[span] __attribute__((align(BYTE_ALIGNMENT)));
+
+ /* pre-multiply diagptable entries with the corresponding weights */
+ for(int j = 0; j < 4; j++)
+ for(int k = 0; k < states; k++)
+ {
+ wtable[j * states + k] = diagptable[j * states + k] * weights[j];
+ }
+
+ double sum = 0.0;
+
+ /* the left node is a tip */
+ if(tipX1)
+ {
+ /* loop over the sites of this partition */
+ for (int i = 0; i < n; i++)
+ {
+ const double
+ *aTipVec = tipVector;
+
+ /* access pre-computed tip vector values via a lookup table */
+ const double *x1 = &(aTipVec[span * tipX1[i]]);
+ /* access the other (inner) node at the other end of the branch */
+ const double *x2 = &(x2_start[span * i]);
+
+ #pragma unroll(10)
+ for (int k = 0; k < span; k += 8)
+ {
+ _mm_prefetch((const char *) &x2_start[span*(i+2) + k], _MM_HINT_T1);
+ _mm_prefetch((const char *) &x2_start[span*(i+1) + k], _MM_HINT_T0);
+ }
+
+ double term = 0.;
+
+ #pragma ivdep
+ #pragma vector aligned
+ #pragma noprefetch x2
+ for(int j = 0; j < span; j++)
+ {
+ term += x1[j] * x2[j] * wtable[j];
+ }
+
+ term = log(fabs(term));
+
+ sum += wgt[i] * term;
+ }
+ }
+ else
+ {
+ for (int i = 0; i < n; i++)
+ {
+ #pragma unroll(10)
+ for (int k = 0; k < span; k += 8)
+ {
+ _mm_prefetch((const char *) &x1_start[span*(i+2) + k], _MM_HINT_T1);
+ _mm_prefetch((const char *) &x1_start[span*(i+1) + k], _MM_HINT_T0);
+
+ _mm_prefetch((const char *) &x2_start[span*(i+2) + k], _MM_HINT_T1);
+ _mm_prefetch((const char *) &x2_start[span*(i+1) + k], _MM_HINT_T0);
+ }
+
+ const double *x1 = &(x1_start[span * i]);
+ const double *x2 = &(x2_start[span * i]);
+
+ double term = 0.;
+
+ #pragma ivdep
+ #pragma vector aligned
+ #pragma noprefetch x1 x2
+ for(int j = 0; j < span; j++)
+ term += x1[j] * x2[j] * wtable[j];
+
+ term = log(fabs(term));
+
+ sum += wgt[i] * term;
+ }
+ }
+
+ return sum;
+}
+
+void sumGAMMAPROT_LG4_MIC(int tipCase, double *sumtable, double *x1_start, double *x2_start, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n)
+{
+ const double
+ *aTipVec = tipVector;
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ {
+ for(int i = 0; i < n; i++)
+ {
+ const double *left = &(aTipVec[span * tipX1[i]]);
+ const double *right = &(aTipVec[span * tipX2[i]]);
+
+ #pragma ivdep
+ #pragma vector aligned nontemporal
+ for(int l = 0; l < span; l++)
+ {
+ sumtable[i * span + l] = left[l] * right[l];
+ }
+ }
+ } break;
+ case TIP_INNER:
+ {
+ for(int i = 0; i < n; i++)
+ {
+ #pragma unroll(10)
+ for (int k = 0; k < span; k += 8)
+ {
+ _mm_prefetch((const char *) &x2_start[span*(i+2) + k], _MM_HINT_T1);
+ _mm_prefetch((const char *) &x2_start[span*(i+1) + k], _MM_HINT_T0);
+ }
+
+ const double *left = &(aTipVec[span * tipX1[i]]);
+ const double *right = &(x2_start[span * i]);
+
+ #pragma ivdep
+ #pragma vector aligned nontemporal
+ #pragma noprefetch right
+ for(int l = 0; l < span; l++)
+ {
+ sumtable[i * span + l] = left[l] * right[l];
+ }
+ }
+ } break;
+ case INNER_INNER:
+ {
+ for(int i = 0; i < n; i++)
+ {
+ #pragma unroll(10)
+ for (int k = 0; k < span; k += 8)
+ {
+ _mm_prefetch((const char *) &x1_start[span*(i+2) + k], _MM_HINT_T1);
+ _mm_prefetch((const char *) &x1_start[span*(i+1) + k], _MM_HINT_T0);
+
+ _mm_prefetch((const char *) &x2_start[span*(i+2) + k], _MM_HINT_T1);
+ _mm_prefetch((const char *) &x2_start[span*(i+1) + k], _MM_HINT_T0);
+ }
+
+ const double *left = &(x1_start[span * i]);
+ const double *right = &(x2_start[span * i]);
+
+ #pragma ivdep
+ #pragma vector aligned nontemporal
+ #pragma noprefetch left right
+ for(int l = 0; l < span; l++)
+ {
+ sumtable[i * span + l] = left[l] * right[l];
+ }
+ }
+ } break;
+ // default:
+ // assert(0);
+ }
+}
+
+void coreGTRGAMMAPROT_LG4_MIC(const int upper, double *sumtable,
+ volatile double *ext_dlnLdlz, volatile double *ext_d2lnLdlz2, double *EIGN[4], double *gammaRates,
+ double lz, int *wgt, double *weights)
+{
+ double diagptable0[span] __attribute__((align(BYTE_ALIGNMENT)));
+ double diagptable1[span] __attribute__((align(BYTE_ALIGNMENT)));
+ double diagptable2[span] __attribute__((align(BYTE_ALIGNMENT)));
+ double diagptable01[span] __attribute__((align(BYTE_ALIGNMENT)));
+ double diagptable02[span] __attribute__((align(BYTE_ALIGNMENT)));
+
+ /* pre-compute the derivatives of the P matrix for all discrete GAMMA rates */
+
+ for(int i = 0; i < 4; i++)
+ {
+ const double ki = gammaRates[i];
+ const double kisqr = ki * ki;
+
+ diagptable0[i*states] = 1. * weights[i];
+ diagptable1[i*states] = 0.;
+ diagptable2[i*states] = 0.;
+
+ for(int l = 1; l < states; l++)
+ {
+ diagptable0[i * states + l] = exp(EIGN[i][l] * ki * lz) * weights[i];
+ diagptable1[i * states + l] = EIGN[i][l] * ki;
+ diagptable2[i * states + l] = EIGN[i][l] * EIGN[i][l] * kisqr;
+ }
+ }
+
+ #pragma ivdep
+ for(int i = 0; i < span; i++)
+ {
+ diagptable01[i] = diagptable0[i] * diagptable1[i];
+ diagptable02[i] = diagptable0[i] * diagptable2[i];
+ }
+
+ /* loop over sites in this partition */
+
+ const int aligned_width = upper % 8 == 0 ? upper / 8 : upper / 8 + 1;
+
+ double dlnLdlz = 0.;
+ double d2lnLdlz2 = 0.;
+
+ __mmask16 k1 = _mm512_int2mask(0x000000FF);
+
+ for (int i = 0; i < aligned_width; i++)
+ {
+ /* access the array with pre-computed values */
+ const double *sum = &sumtable[i * span * 8];
+
+ /* initial per-site likelihood and 1st and 2nd derivatives */
+
+ double invBuf[8] __attribute__((align(BYTE_ALIGNMENT)));
+ double d1Buf[8] __attribute__((align(BYTE_ALIGNMENT)));
+ double d2Buf[8] __attribute__((align(BYTE_ALIGNMENT)));
+
+ __m512d invVec;
+ __m512d d1Vec;
+ __m512d d2Vec;
+ int mask = 0x01;
+
+ #pragma noprefetch sum
+ #pragma unroll(8)
+ for(int j = 0; j < 8; j++)
+ {
+ #pragma unroll(10)
+ for (int k = 0; k < span; k += 8)
+ {
+ _mm_prefetch((const char *) &sum[span*(j+2) + k], _MM_HINT_T1);
+ _mm_prefetch((const char *) &sum[span*(j+1) + k], _MM_HINT_T0);
+ }
+
+ __m512d inv_1 = _mm512_setzero_pd();
+ __m512d d1_1 = _mm512_setzero_pd();
+ __m512d d2_1 = _mm512_setzero_pd();
+
+ for (int offset = 0; offset < span; offset += 8)
+ {
+ __m512d d0_1 = _mm512_load_pd(&diagptable0[offset]);
+ __m512d d01_1 = _mm512_load_pd(&diagptable01[offset]);
+ __m512d d02_1 = _mm512_load_pd(&diagptable02[offset]);
+ __m512d s_1 = _mm512_load_pd(&sum[j*span + offset]);
+
+ inv_1 = _mm512_fmadd_pd(d0_1, s_1, inv_1);
+ d1_1 = _mm512_fmadd_pd(d01_1, s_1, d1_1);
+ d2_1 = _mm512_fmadd_pd(d02_1, s_1, d2_1);
+ }
+
+ __mmask8 k1 = _mm512_int2mask(mask);
+ mask <<= 1;
+
+ // reduce
+ inv_1 = _mm512_add_pd (inv_1, _mm512_swizzle_pd(inv_1, _MM_SWIZ_REG_CDAB));
+ inv_1 = _mm512_add_pd (inv_1, _mm512_swizzle_pd(inv_1, _MM_SWIZ_REG_BADC));
+ inv_1 = _mm512_add_pd (inv_1, _mm512_castsi512_pd(_mm512_permute4f128_epi32(_mm512_castpd_si512(inv_1), _MM_PERM_BADC)));
+ invVec = _mm512_mask_mov_pd(invVec, k1, inv_1);
+
+ d1_1 = _mm512_add_pd (d1_1, _mm512_swizzle_pd(d1_1, _MM_SWIZ_REG_CDAB));
+ d1_1 = _mm512_add_pd (d1_1, _mm512_swizzle_pd(d1_1, _MM_SWIZ_REG_BADC));
+ d1_1 = _mm512_add_pd (d1_1, _mm512_castsi512_pd(_mm512_permute4f128_epi32(_mm512_castpd_si512(d1_1), _MM_PERM_BADC)));
+ d1Vec = _mm512_mask_mov_pd(d1Vec, k1, d1_1);
+
+ d2_1 = _mm512_add_pd (d2_1, _mm512_swizzle_pd(d2_1, _MM_SWIZ_REG_CDAB));
+ d2_1 = _mm512_add_pd (d2_1, _mm512_swizzle_pd(d2_1, _MM_SWIZ_REG_BADC));
+ d2_1 = _mm512_add_pd (d2_1, _mm512_castsi512_pd(_mm512_permute4f128_epi32(_mm512_castpd_si512(d2_1), _MM_PERM_BADC)));
+ d2Vec = _mm512_mask_mov_pd(d2Vec, k1, d2_1);
+ }
+
+ _mm512_store_pd(&invBuf[0], invVec);
+ _mm512_store_pd(&d1Buf[0], d1Vec);
+ _mm512_store_pd(&d2Buf[0], d2Vec);
+
+ #pragma ivdep
+ #pragma vector aligned
+ for (int j = 0; j < 8; ++j)
+ {
+ const double inv_Li = 1.0 / invBuf[j];
+
+ const double d1 = d1Buf[j] * inv_Li;
+ const double d2 = d2Buf[j] * inv_Li;
+
+ dlnLdlz += wgt[i * 8 + j] * d1;
+ d2lnLdlz2 += wgt[i * 8 + j] * (d2 - d1 * d1);
+ }
+ } // site loop
+
+ *ext_dlnLdlz = dlnLdlz;
+ *ext_d2lnLdlz2 = d2lnLdlz2;
+}
+
diff --git a/examl/mic_native_dna.c b/examl/mic_native_dna.c
new file mode 100644
index 0000000..9bbc47a
--- /dev/null
+++ b/examl/mic_native_dna.c
@@ -0,0 +1,661 @@
+#include <immintrin.h>
+#include <string.h>
+#include <math.h>
+
+#include "axml.h"
+#include "mic_native.h"
+
+static const int states = 4;
+static const int statesSquare = 16;
+static const int span = 4 * 4;
+static const int maxStateValue = 16;
+
+/* Common functions */
+
+void updateModel_MIC(pInfo* part)
+{
+ double
+ *EV = part->EV,
+ *tipVector = part->tipVector,
+ *aEV = part->mic_EV,
+ *aTipVector = part->mic_tipVector;
+
+ const int
+ states = part->states,
+ span = 4 * states,
+ maxState = getUndetermined(part->dataType) + 1;
+
+ int
+ k, l;
+ #pragma ivdep
+ for (l = 0; l < 4 * states * states; ++l)
+ {
+ aEV[l] = EV[(l / span) * states + (l % states)];
+ }
+
+ for(int k = 0; k < maxState; k++)
+ {
+ #pragma ivdep
+ for(int l = 0; l < states; l++)
+ {
+ aTipVector[k*span + l] = aTipVector[k*span + states + l] = aTipVector[k*span + 2*states + l] = aTipVector[k*span + 3*states + l] = tipVector[k*states + l];
+ }
+ }
+}
+
+/* DNA */
+
+void makeP_DNA_MIC(double z1, double z2, double *rptr, double *EI, double *EIGN, int numberOfCategories, double *left, double *right,
+ boolean saveMem, int maxCat)
+{
+ int
+ i,
+ j,
+ k,
+ span = states * numberOfCategories;
+
+ /* assign some space for pre-computing and later re-using function values */
+
+ double lz1[4] __attribute__((align(BYTE_ALIGNMENT)));
+ double lz2[4] __attribute__((align(BYTE_ALIGNMENT)));
+ double d1[4] __attribute__((align(BYTE_ALIGNMENT)));
+ double d2[4] __attribute__((align(BYTE_ALIGNMENT)));
+
+
+ /* multiply branch lengths with eigenvalues */
+ for(i = 1; i < states; i++)
+ {
+ lz1[i] = EIGN[i] * z1;
+ lz2[i] = EIGN[i] * z2;
+ }
+
+
+ /* loop over the number of rate categories; this will be 4 for the GAMMA model and
+ variable for the CAT model */
+
+ for(i = 0; i < numberOfCategories; i++)
+ {
+ /* exponentiate the rate multiplied by the branch */
+
+ for(j = 1; j < states; j++)
+ {
+ d1[j] = EXP(rptr[i] * lz1[j]);
+ d2[j] = EXP(rptr[i] * lz2[j]);
+ }
+
+ /* now fill the P matrices for the two branch length values */
+
+ for(j = 0; j < states; j++)
+ {
+ /* left and right are pre-allocated arrays */
+
+ left[i * states + j] = 1.0;
+ right[i * states + j] = 1.0;
+
+ for(k = 1; k < states; k++)
+ {
+ left[k * span + i * states + j] = d1[k] * EI[states * j + k];
+ right[k * span + i * states + j] = d2[k] * EI[states * j + k];
+ }
+ }
+ }
+
+
+ /* if memory saving is enabled and we are using CAT we need to do one additional P matrix
+ calculation for a rate of 1.0 to compute the entries of a column/tree site comprising only gaps */
+
+
+ if(saveMem)
+ {
+ i = maxCat;
+
+ for(j = 1; j < states; j++)
+ {
+ d1[j] = EXP (lz1[j]);
+ d2[j] = EXP (lz2[j]);
+ }
+
+ for(j = 0; j < states; j++)
+ {
+ left[statesSquare * i + states * j] = 1.0;
+ right[statesSquare * i + states * j] = 1.0;
+
+ for(k = 1; k < states; k++)
+ {
+ left[statesSquare * i + states * j + k] = d1[k] * EI[states * j + k];
+ right[statesSquare * i + states * j + k] = d2[k] * EI[states * j + k];
+ }
+ }
+ }
+}
+
+void precomputeTips_DNA_MIC(int tipCase, double *tipVector, double *left, double *right,
+ double *umpLeft, double *umpRight,
+ int numberOfCategories)
+{
+ /* no precomputation needed if both children are inner nodes */
+ if (tipCase == INNER_INNER)
+ return;
+
+ const int
+ span = states * 4,
+ umpSize = span * 16;
+
+ for(int k = 0; k < umpSize; ++k)
+ {
+ umpLeft[k] = 0.0;
+ umpRight[k] = 0.0;
+ }
+
+ for(int i = 0; i < maxStateValue; ++i)
+ {
+ for(int l = 0; l < states; ++l)
+ {
+ #pragma ivdep
+ #pragma vector aligned
+ for(int k = 0; k < span; ++k)
+ {
+ umpLeft[span * i + k] += tipVector[i * states + l] * left[l * span + k];
+ if (tipCase == TIP_TIP)
+ umpRight[span * i + k] += tipVector[i * states + l] * right[l * span + k];
+ }
+ }
+ }
+}
+
+inline void mic_fma4x16(const double* inv, double* outv, double* mulv)
+{
+ __mmask8 k1 = _mm512_int2mask(0x0F);
+ __mmask8 k2 = _mm512_int2mask(0xF0);
+
+ __m512d acc1 = _mm512_setzero_pd();
+ __m512d acc2 = _mm512_setzero_pd();
+
+ __m512d t;
+
+ for(int k = 0; k < 4; k++)
+ {
+ t = _mm512_mask_extload_pd(t, k1, &inv[0 + k], _MM_UPCONV_PD_NONE, _MM_BROADCAST_1X8, _MM_HINT_NONE);
+ t = _mm512_mask_extload_pd(t, k2, &inv[4 + k], _MM_UPCONV_PD_NONE, _MM_BROADCAST_1X8, _MM_HINT_NONE);
+
+ __m512d m = _mm512_load_pd(&mulv[k * 16]);
+ acc1 = _mm512_fmadd_pd(t, m, acc1);
+
+ t = _mm512_mask_extload_pd(t, k1, &inv[8 + k], _MM_UPCONV_PD_NONE, _MM_BROADCAST_1X8, _MM_HINT_NONE);
+ t = _mm512_mask_extload_pd(t, k2, &inv[12 + k], _MM_UPCONV_PD_NONE, _MM_BROADCAST_1X8, _MM_HINT_NONE);
+
+ m = _mm512_load_pd(&mulv[k * 16 + 8]);
+ acc2 = _mm512_fmadd_pd(t, m, acc2);
+ }
+
+ _mm512_store_pd(&outv[0], acc1);
+ _mm512_store_pd(&outv[8], acc2);
+}
+
+void newviewGTRGAMMA_MIC(int tipCase,
+ double *x1, double *x2, double *x3, double *extEV, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement,
+ double *umpLeft, double *umpRight)
+{
+ __m512d minlikelihood_MIC = _mm512_set1_pd(minlikelihood);
+ __m512d twotothe256_MIC = _mm512_set1_pd(twotothe256);
+ __m512i absMask_MIC = _mm512_set1_epi64(0x7fffffffffffffffULL);
+
+ int addScale = 0;
+
+ /* we assume that the P-matrix and eigenvectors are already in the correct layout */
+ double
+ *aEV = extEV,
+ *aRight = right,
+ *aLeft = left,
+ *umpX1 = umpLeft,
+ *umpX2 = umpRight;
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ {
+ #pragma noprefetch umpX1,umpX2
+ for (int i = 0; i < n; i++)
+ {
+ _mm_prefetch((const char *)&x3[span*(i+8)], _MM_HINT_ET1);
+ _mm_prefetch((const char *)&x3[span*(i+8) + 8], _MM_HINT_ET1);
+
+ _mm_prefetch((const char *)&x3[span*(i+1)], _MM_HINT_ET0);
+ _mm_prefetch((const char *)&x3[span*(i+1) + 8], _MM_HINT_ET0);
+
+ const double *uX1 = &umpX1[16 * tipX1[i]];
+ const double *uX2 = &umpX2[16 * tipX2[i]];
+
+ double uX[16] __attribute__((align(64)));
+
+ #pragma ivdep
+ #pragma vector aligned
+ for(int l = 0; l < 16; ++l)
+ {
+ uX[l] = uX1[l] * uX2[l];
+ }
+
+ double* v3 = &x3[i * 16];
+
+ mic_fma4x16(uX, v3, aEV);
+ } // sites loop
+ }
+ break;
+ case TIP_INNER:
+ {
+ #pragma noprefetch umpX1
+ for (int i = 0; i < n; i++)
+ {
+ _mm_prefetch((const char *)&x2[span*(i+16)], _MM_HINT_T1);
+ _mm_prefetch((const char *)&x2[span*(i+16) + 8], _MM_HINT_T1);
+ _mm_prefetch((const char *)&x3[span*(i+16)], _MM_HINT_ET1);
+ _mm_prefetch((const char *)&x3[span*(i+16) + 8], _MM_HINT_ET1);
+
+ _mm_prefetch((const char *)&x2[span*(i+1)], _MM_HINT_T0);
+ _mm_prefetch((const char *)&x2[span*(i+1) + 8], _MM_HINT_T0);
+ _mm_prefetch((const char *)&x3[span*(i+1)], _MM_HINT_ET0);
+ _mm_prefetch((const char *)&x3[span*(i+1) + 8], _MM_HINT_ET0);
+
+ /* access pre-computed value based on the raw sequence data tipX1 that is used as an index */
+ double* uX1 = &umpX1[span * tipX1[i]];
+ double uX2[16] __attribute__((align(64)));
+ double uX[16] __attribute__((align(64)));
+
+ const double* v2 = &(x2[16 * i]);
+
+ mic_fma4x16(v2, uX2, aRight);
+
+ #pragma ivdep
+ #pragma vector aligned
+ for(int l = 0; l < 16; ++l)
+ {
+ uX[l] = uX1[l] * uX2[l];
+ }
+
+ double* v3 = &(x3[span * i]);
+
+ mic_fma4x16(uX, v3, aEV);
+
+ __m512d t1 = _mm512_load_pd(&v3[0]);
+ t1 = _mm512_castsi512_pd(_mm512_and_epi64(_mm512_castpd_si512(t1), absMask_MIC));
+ double vmax1 = _mm512_reduce_gmax_pd(t1);
+ __m512d t2 = _mm512_load_pd(&v3[8]);
+ t2 = _mm512_castsi512_pd(_mm512_and_epi64(_mm512_castpd_si512(t2), absMask_MIC));
+ double vmax2 = _mm512_reduce_gmax_pd(t2);
+
+ if(vmax1 < minlikelihood && vmax2 < minlikelihood)
+ {
+ /* t1 = _mm512_mul_pd(t1, twotothe256_MIC);
+ _mm512_store_pd(&v3[0], t1);
+ t2 = _mm512_mul_pd(t2, twotothe256_MIC);
+ _mm512_store_pd(&v3[8], t2);*/
+
+#pragma vector aligned nontemporal
+ for(int l = 0; l < span; l++)
+ v3[l] *= twotothe256;
+
+ addScale += wgt[i];
+ }
+ } // site loop
+ }
+ break;
+ case INNER_INNER:
+ {
+ for (int i = 0; i < n; i++)
+ {
+ _mm_prefetch((const char *) &x1[span*(i+8)], _MM_HINT_T1);
+ _mm_prefetch((const char *) &x1[span*(i+8) + 8], _MM_HINT_T1);
+ _mm_prefetch((const char *) &x2[span*(i+8)], _MM_HINT_T1);
+ _mm_prefetch((const char *) &x2[span*(i+8) + 8], _MM_HINT_T1);
+ _mm_prefetch((const char *) &x3[span*(i+8)], _MM_HINT_ET1);
+ _mm_prefetch((const char *) &x3[span*(i+8) + 8], _MM_HINT_ET1);
+
+ _mm_prefetch((const char *) &x1[span*(i+1)], _MM_HINT_T0);
+ _mm_prefetch((const char *) &x1[span*(i+1) + 8], _MM_HINT_T0);
+ _mm_prefetch((const char *) &x2[span*(i+1)], _MM_HINT_T0);
+ _mm_prefetch((const char *) &x2[span*(i+1) + 8], _MM_HINT_T0);
+ _mm_prefetch((const char *) &x3[span*(i+1)], _MM_HINT_ET0);
+ _mm_prefetch((const char *) &x3[span*(i+1) + 8], _MM_HINT_ET0);
+
+ double uX1[16] __attribute__((align(64)));
+ double uX2[16] __attribute__((align(64)));
+ double uX[16] __attribute__((align(64)));
+
+ const double* v1 = &(x1[span * i]);
+ const double* v2 = &(x2[span * i]);
+
+ mic_fma4x16(v1, uX1, aLeft);
+ mic_fma4x16(v2, uX2, aRight);
+
+ #pragma ivdep
+ #pragma vector aligned
+ for(int l = 0; l < 16; ++l)
+ {
+ uX[l] = uX1[l] * uX2[l];
+ }
+
+ double* v3 = &(x3[span * i]);
+
+ mic_fma4x16(uX, v3, aEV);
+
+ __m512d t1 = _mm512_load_pd(&v3[0]);
+ t1 = _mm512_castsi512_pd(_mm512_and_epi64(_mm512_castpd_si512(t1), absMask_MIC));
+ double vmax1 = _mm512_reduce_gmax_pd(t1);
+ __m512d t2 = _mm512_load_pd(&v3[8]);
+ t2 = _mm512_castsi512_pd(_mm512_and_epi64(_mm512_castpd_si512(t2), absMask_MIC));
+ double vmax2 = _mm512_reduce_gmax_pd(t2);
+
+ if(vmax1 < minlikelihood && vmax2 < minlikelihood)
+ {
+ /* t1 = _mm512_mul_pd(t1, twotothe256_MIC);
+ _mm512_store_pd(&v3[0], t1);
+ t2 = _mm512_mul_pd(t2, twotothe256_MIC);
+ _mm512_store_pd(&v3[8], t2);
+ */
+
+#pragma vector aligned nontemporal
+ for(int l = 0; l < span; l++)
+ v3[l] *= twotothe256;
+
+ addScale += wgt[i];
+ }
+ }
+ } break;
+ default:
+// assert(0);
+ break;
+ }
+
+ *scalerIncrement = addScale;
+
+}
+
+double evaluateGAMMA_MIC(int *wgt, double *x1_start, double *x2_start, double *tipVector,
+ unsigned char *tipX1, const int n, double *diagptable)
+{
+ double sum = 0.0;
+
+ /* the left node is a tip */
+ if(tipX1)
+ {
+ double
+ *aTipVec = tipVector;
+
+ /* loop over the sites of this partition */
+ for (int i = 0; i < n; i++)
+ {
+ /* access pre-computed tip vector values via a lookup table */
+ const double *x1 = &(aTipVec[16 * tipX1[i]]);
+ /* access the other (inner) node at the other end of the branch */
+ const double *x2 = &(x2_start[span * i]);
+
+ double term = 0.;
+
+ #pragma ivdep
+ #pragma vector aligned
+ for(int j = 0; j < 16; j++)
+ term += x1[j] * x2[j] * diagptable[j];
+
+ term = log(0.25 * fabs(term));
+
+ sum += wgt[i] * term;
+ }
+ }
+ else
+ {
+ for (int i = 0; i < n; i++)
+ {
+ _mm_prefetch((const char *) &x1_start[span*(i+8)], _MM_HINT_T1);
+ _mm_prefetch((const char *) &x1_start[span*(i+8) + 8], _MM_HINT_T1);
+ _mm_prefetch((const char *) &x2_start[span*(i+8)], _MM_HINT_T1);
+ _mm_prefetch((const char *) &x2_start[span*(i+8) + 8], _MM_HINT_T1);
+
+ _mm_prefetch((const char *) &x1_start[span*(i+1)], _MM_HINT_T0);
+ _mm_prefetch((const char *) &x1_start[span*(i+1) + 8], _MM_HINT_T0);
+ _mm_prefetch((const char *) &x2_start[span*(i+1)], _MM_HINT_T0);
+ _mm_prefetch((const char *) &x2_start[span*(i+1) + 8], _MM_HINT_T0);
+
+ const double *x1 = &(x1_start[span * i]);
+ const double *x2 = &(x2_start[span * i]);
+
+ double term = 0.;
+
+ #pragma ivdep
+ #pragma vector aligned
+ for(int j = 0; j < 16; j++)
+ term += x1[j] * x2[j] * diagptable[j];
+
+ term = log(0.25 * fabs(term));
+
+ sum += wgt[i] * term;
+ }
+ }
+
+ return sum;
+}
+
+void sumGAMMA_MIC(int tipCase, double *sumtable, double *x1_start, double *x2_start, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2, int n)
+{
+ const double
+ *aTipVec = tipVector;
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ {
+ #pragma unroll(8)
+ for(int i = 0; i < n; i++)
+ {
+ const double *left = &(aTipVec[16 * tipX1[i]]);
+ const double *right = &(aTipVec[16 * tipX2[i]]);
+
+ #pragma ivdep
+ #pragma vector aligned nontemporal
+ for(int l = 0; l < 16; l++)
+ {
+ sumtable[i * span + l] = left[l] * right[l];
+ }
+ }
+ } break;
+ case TIP_INNER:
+ {
+ #pragma unroll(8)
+ for(int i = 0; i < n; i++)
+ {
+ _mm_prefetch((const char *) &x2_start[span*(i+32)], _MM_HINT_T1);
+ _mm_prefetch((const char *) &x2_start[span*(i+32) + 8], _MM_HINT_T1);
+
+ _mm_prefetch((const char *) &x2_start[span*(i+8)], _MM_HINT_T0);
+ _mm_prefetch((const char *) &x2_start[span*(i+8) + 8], _MM_HINT_T0);
+
+ const double *left = &(aTipVec[16 * tipX1[i]]);
+ const double *right = &(x2_start[span * i]);
+
+ #pragma ivdep
+ #pragma vector aligned nontemporal
+ for(int l = 0; l < 16; l++)
+ {
+ sumtable[i * span + l] = left[l] * right[l];
+ }
+ }
+ } break;
+ case INNER_INNER:
+ {
+ #pragma unroll(8)
+ for(int i = 0; i < n; i++)
+ {
+ _mm_prefetch((const char *) &x1_start[span*(i+16)], _MM_HINT_T1);
+ _mm_prefetch((const char *) &x1_start[span*(i+16) + 8], _MM_HINT_T1);
+ _mm_prefetch((const char *) &x2_start[span*(i+16)], _MM_HINT_T1);
+ _mm_prefetch((const char *) &x2_start[span*(i+16) + 8], _MM_HINT_T1);
+
+ _mm_prefetch((const char *) &x1_start[span*(i+4)], _MM_HINT_T0);
+ _mm_prefetch((const char *) &x1_start[span*(i+4) + 8], _MM_HINT_T0);
+ _mm_prefetch((const char *) &x2_start[span*(i+4)], _MM_HINT_T0);
+ _mm_prefetch((const char *) &x2_start[span*(i+4) + 8], _MM_HINT_T0);
+
+ const double *left = &(x1_start[span * i]);
+ const double *right = &(x2_start[span * i]);
+
+ #pragma ivdep
+ #pragma vector aligned nontemporal
+ for(int l = 0; l < 16; l++)
+ {
+ sumtable[i * span + l] = left[l] * right[l];
+ }
+ }
+ } break;
+ // default:
+ // assert(0);
+ }
+}
+
+void coreGTRGAMMA_MIC(const int upper, double *sumtable,
+ volatile double *ext_dlnLdlz, volatile double *ext_d2lnLdlz2, double *EIGN, double *gammaRates, double lz, int *wgt)
+{
+ double diagptable0[16] __attribute__((align(64)));
+ double diagptable1[16] __attribute__((align(64)));
+ double diagptable2[16] __attribute__((align(64)));
+ double diagptable01[16] __attribute__((align(64)));
+ double diagptable02[16] __attribute__((align(64)));
+
+ /* pre-compute the derivatives of the P matrix for all discrete GAMMA rates */
+
+ for(int i = 0; i < 4; i++)
+ {
+ const double ki = gammaRates[i];
+ const double kisqr = ki * ki;
+
+ diagptable0[i*4] = 1.;
+ diagptable1[i*4] = 0.;
+ diagptable2[i*4] = 0.;
+
+ for(int l = 1; l < 4; l++)
+ {
+ diagptable0[i * 4 + l] = exp(EIGN[l] * ki * lz);
+ diagptable1[i * 4 + l] = EIGN[l] * ki;
+ diagptable2[i * 4 + l] = EIGN[l] * EIGN[l] * kisqr;
+ }
+ }
+
+ #pragma ivdep
+ for(int i = 0; i < 16; i++)
+ {
+ diagptable01[i] = diagptable0[i] * diagptable1[i];
+ diagptable02[i] = diagptable0[i] * diagptable2[i];
+ }
+
+ /* loop over sites in this partition */
+
+ const int aligned_width = upper % 8 == 0 ? upper / 8 : upper / 8 + 1;
+
+ double dlnLBuf[8] __attribute__((align(64)));
+ double d2lnLBuf[8] __attribute__((align(64)));
+ for (int j = 0; j < 8; ++j)
+ {
+ dlnLBuf[j] = 0.;
+ d2lnLBuf[j] = 0.;
+ }
+
+ __mmask16 k1 = _mm512_int2mask(0x000000FF);
+
+ for (int i = 0; i < aligned_width; i++)
+ {
+ _mm_prefetch((const char *) &sumtable[i * span * 8], _MM_HINT_T0);
+ _mm_prefetch((const char *) &sumtable[i * span * 8 + 8], _MM_HINT_T0);
+
+ /* access the array with pre-computed values */
+ const double *sum = &sumtable[i * span * 8];
+
+ /* initial per-site likelihood and 1st and 2nd derivatives */
+
+ double invBuf[8] __attribute__((align(64)));
+ double d1Buf[8] __attribute__((align(64)));
+ double d2Buf[8] __attribute__((align(64)));
+
+ __m512d invVec;
+ __m512d d1Vec;
+ __m512d d2Vec;
+ int mask = 0x01;
+
+ #pragma noprefetch sum
+ #pragma unroll(8)
+ for(int j = 0; j < 8; j++)
+ {
+ _mm_prefetch((const char *) &sum[span*(j+8)], _MM_HINT_T1);
+ _mm_prefetch((const char *) &sum[span*(j+8) + 8], _MM_HINT_T1);
+
+ _mm_prefetch((const char *) &sum[span*(j+1)], _MM_HINT_T0);
+ _mm_prefetch((const char *) &sum[span*(j+1) + 8], _MM_HINT_T0);
+
+ __m512d d0_1 = _mm512_load_pd(&diagptable0[0]);
+ __m512d d0_2 = _mm512_load_pd(&diagptable0[8]);
+
+ __m512d d01_1 = _mm512_load_pd(&diagptable01[0]);
+ __m512d d01_2 = _mm512_load_pd(&diagptable01[8]);
+
+ __m512d d02_1 = _mm512_load_pd(&diagptable02[0]);
+ __m512d d02_2 = _mm512_load_pd(&diagptable02[8]);
+
+ __m512d s_1 = _mm512_load_pd(&sum[j*16]);
+ __m512d s_2 = _mm512_load_pd(&sum[j*16 + 8]);
+ __m512d inv_1 = _mm512_mul_pd(d0_1, s_1);
+ __m512d d1_1 = _mm512_mul_pd(d01_1, s_1);
+ __m512d d2_1 = _mm512_mul_pd(d02_1, s_1);
+
+ __m512d inv_2 = _mm512_fmadd_pd(d0_2, s_2, inv_1);
+ __m512d d1_2 = _mm512_fmadd_pd(d01_2, s_2, d1_1);
+ __m512d d2_2 = _mm512_fmadd_pd(d02_2, s_2, d2_1);
+
+ __mmask8 k1 = _mm512_int2mask(mask);
+ mask <<= 1;
+
+ // reduce
+ inv_2 = _mm512_add_pd (inv_2, _mm512_swizzle_pd(inv_2, _MM_SWIZ_REG_CDAB));
+ inv_2 = _mm512_add_pd (inv_2, _mm512_swizzle_pd(inv_2, _MM_SWIZ_REG_BADC));
+ inv_2 = _mm512_add_pd (inv_2, _mm512_castsi512_pd(_mm512_permute4f128_epi32(_mm512_castpd_si512(inv_2), _MM_PERM_BADC)));
+ invVec = _mm512_mask_mov_pd(invVec, k1, inv_2);
+
+ d1_2 = _mm512_add_pd (d1_2, _mm512_swizzle_pd(d1_2, _MM_SWIZ_REG_CDAB));
+ d1_2 = _mm512_add_pd (d1_2, _mm512_swizzle_pd(d1_2, _MM_SWIZ_REG_BADC));
+ d1_2 = _mm512_add_pd (d1_2, _mm512_castsi512_pd(_mm512_permute4f128_epi32(_mm512_castpd_si512(d1_2), _MM_PERM_BADC)));
+ d1Vec = _mm512_mask_mov_pd(d1Vec, k1, d1_2);
+
+ d2_2 = _mm512_add_pd (d2_2, _mm512_swizzle_pd(d2_2, _MM_SWIZ_REG_CDAB));
+ d2_2 = _mm512_add_pd (d2_2, _mm512_swizzle_pd(d2_2, _MM_SWIZ_REG_BADC));
+ d2_2 = _mm512_add_pd (d2_2, _mm512_castsi512_pd(_mm512_permute4f128_epi32(_mm512_castpd_si512(d2_2), _MM_PERM_BADC)));
+ d2Vec = _mm512_mask_mov_pd(d2Vec, k1, d2_2);
+ }
+
+ _mm512_store_pd(&invBuf[0], invVec);
+ _mm512_store_pd(&d1Buf[0], d1Vec);
+ _mm512_store_pd(&d2Buf[0], d2Vec);
+
+ #pragma ivdep
+ #pragma vector aligned
+ for (int j = 0; j < 8; ++j)
+ {
+ const double inv_Li = 1.0 / invBuf[j];
+
+ const double d1 = d1Buf[j] * inv_Li;
+ const double d2 = d2Buf[j] * inv_Li;
+
+ dlnLBuf[j] += wgt[i * 8 + j] * d1;
+ d2lnLBuf[j] += wgt[i * 8 + j] * (d2 - d1 * d1);
+ }
+ } // site loop
+
+ double dlnLdlz = 0.;
+ double d2lnLdlz2 = 0.;
+ for (int j = 0; j < 8; ++j)
+ {
+ dlnLdlz += dlnLBuf[j];
+ d2lnLdlz2 += d2lnLBuf[j];
+ }
+
+ *ext_dlnLdlz = dlnLdlz;
+ *ext_d2lnLdlz2 = d2lnLdlz2;
+}
diff --git a/examl/models.c b/examl/models.c
new file mode 100644
index 0000000..02cd746
--- /dev/null
+++ b/examl/models.c
@@ -0,0 +1,4243 @@
+/* RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees
+ * Copyright August 2006 by Alexandros Stamatakis
+ *
+ * Partially derived from
+ * fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *
+ * and
+ *
+ * Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ * This program is free software; you may redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ *
+ * For any other enquiries send an Email to Alexandros Stamatakis
+ * Alexandros.Stamatakis at epfl.ch
+ *
+ * When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ * Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands
+ * of taxa and mixed models".
+ * Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#ifndef WIN32
+#include <sys/times.h>
+#include <sys/types.h>
+#include <sys/time.h>
+#include <unistd.h>
+#endif
+
+#include <math.h>
+#include <time.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+
+#include "axml.h"
+
+#ifdef __MIC_NATIVE
+#include "mic_native.h"
+#endif
+
+extern int optimizeRatesInvocations;
+extern int optimizeRateCategoryInvocations;
+extern int optimizeAlphaInvocations;
+extern int optimizeTTRatioInvocations;
+extern int optimizeInvarInvocations;
+
+extern const unsigned int bitVectorSecondary[256];
+extern const unsigned int bitVector32[33];
+extern const unsigned int bitVectorAA[23];
+extern const unsigned int bitVectorIdentity[256];
+
+extern const partitionLengths pLengths[MAX_MODEL];
+
+
+
+
+
+//extern FILE *byteFile;
+
+
+
+
+
+
+
+
+
+
+
+void putWAG(double *ext_initialRates)
+{
+ double
+ scaler,
+ q[20][20],
+ daa[400];
+
+ int
+ i,
+ j,
+ r;
+
+ daa[ 1*20+ 0] = 55.15710; daa[ 2*20+ 0] = 50.98480; daa[ 2*20+ 1] = 63.53460;
+ daa[ 3*20+ 0] = 73.89980; daa[ 3*20+ 1] = 14.73040; daa[ 3*20+ 2] = 542.94200;
+ daa[ 4*20+ 0] = 102.70400; daa[ 4*20+ 1] = 52.81910; daa[ 4*20+ 2] = 26.52560;
+ daa[ 4*20+ 3] = 3.02949; daa[ 5*20+ 0] = 90.85980; daa[ 5*20+ 1] = 303.55000;
+ daa[ 5*20+ 2] = 154.36400; daa[ 5*20+ 3] = 61.67830; daa[ 5*20+ 4] = 9.88179;
+ daa[ 6*20+ 0] = 158.28500; daa[ 6*20+ 1] = 43.91570; daa[ 6*20+ 2] = 94.71980;
+ daa[ 6*20+ 3] = 617.41600; daa[ 6*20+ 4] = 2.13520; daa[ 6*20+ 5] = 546.94700;
+ daa[ 7*20+ 0] = 141.67200; daa[ 7*20+ 1] = 58.46650; daa[ 7*20+ 2] = 112.55600;
+ daa[ 7*20+ 3] = 86.55840; daa[ 7*20+ 4] = 30.66740; daa[ 7*20+ 5] = 33.00520;
+ daa[ 7*20+ 6] = 56.77170; daa[ 8*20+ 0] = 31.69540; daa[ 8*20+ 1] = 213.71500;
+ daa[ 8*20+ 2] = 395.62900; daa[ 8*20+ 3] = 93.06760; daa[ 8*20+ 4] = 24.89720;
+ daa[ 8*20+ 5] = 429.41100; daa[ 8*20+ 6] = 57.00250; daa[ 8*20+ 7] = 24.94100;
+ daa[ 9*20+ 0] = 19.33350; daa[ 9*20+ 1] = 18.69790; daa[ 9*20+ 2] = 55.42360;
+ daa[ 9*20+ 3] = 3.94370; daa[ 9*20+ 4] = 17.01350; daa[ 9*20+ 5] = 11.39170;
+ daa[ 9*20+ 6] = 12.73950; daa[ 9*20+ 7] = 3.04501; daa[ 9*20+ 8] = 13.81900;
+ daa[10*20+ 0] = 39.79150; daa[10*20+ 1] = 49.76710; daa[10*20+ 2] = 13.15280;
+ daa[10*20+ 3] = 8.48047; daa[10*20+ 4] = 38.42870; daa[10*20+ 5] = 86.94890;
+ daa[10*20+ 6] = 15.42630; daa[10*20+ 7] = 6.13037; daa[10*20+ 8] = 49.94620;
+ daa[10*20+ 9] = 317.09700; daa[11*20+ 0] = 90.62650; daa[11*20+ 1] = 535.14200;
+ daa[11*20+ 2] = 301.20100; daa[11*20+ 3] = 47.98550; daa[11*20+ 4] = 7.40339;
+ daa[11*20+ 5] = 389.49000; daa[11*20+ 6] = 258.44300; daa[11*20+ 7] = 37.35580;
+ daa[11*20+ 8] = 89.04320; daa[11*20+ 9] = 32.38320; daa[11*20+10] = 25.75550;
+ daa[12*20+ 0] = 89.34960; daa[12*20+ 1] = 68.31620; daa[12*20+ 2] = 19.82210;
+ daa[12*20+ 3] = 10.37540; daa[12*20+ 4] = 39.04820; daa[12*20+ 5] = 154.52600;
+ daa[12*20+ 6] = 31.51240; daa[12*20+ 7] = 17.41000; daa[12*20+ 8] = 40.41410;
+ daa[12*20+ 9] = 425.74600; daa[12*20+10] = 485.40200; daa[12*20+11] = 93.42760;
+ daa[13*20+ 0] = 21.04940; daa[13*20+ 1] = 10.27110; daa[13*20+ 2] = 9.61621;
+ daa[13*20+ 3] = 4.67304; daa[13*20+ 4] = 39.80200; daa[13*20+ 5] = 9.99208;
+ daa[13*20+ 6] = 8.11339; daa[13*20+ 7] = 4.99310; daa[13*20+ 8] = 67.93710;
+ daa[13*20+ 9] = 105.94700; daa[13*20+10] = 211.51700; daa[13*20+11] = 8.88360;
+ daa[13*20+12] = 119.06300; daa[14*20+ 0] = 143.85500; daa[14*20+ 1] = 67.94890;
+ daa[14*20+ 2] = 19.50810; daa[14*20+ 3] = 42.39840; daa[14*20+ 4] = 10.94040;
+ daa[14*20+ 5] = 93.33720; daa[14*20+ 6] = 68.23550; daa[14*20+ 7] = 24.35700;
+ daa[14*20+ 8] = 69.61980; daa[14*20+ 9] = 9.99288; daa[14*20+10] = 41.58440;
+ daa[14*20+11] = 55.68960; daa[14*20+12] = 17.13290; daa[14*20+13] = 16.14440;
+ daa[15*20+ 0] = 337.07900; daa[15*20+ 1] = 122.41900; daa[15*20+ 2] = 397.42300;
+ daa[15*20+ 3] = 107.17600; daa[15*20+ 4] = 140.76600; daa[15*20+ 5] = 102.88700;
+ daa[15*20+ 6] = 70.49390; daa[15*20+ 7] = 134.18200; daa[15*20+ 8] = 74.01690;
+ daa[15*20+ 9] = 31.94400; daa[15*20+10] = 34.47390; daa[15*20+11] = 96.71300;
+ daa[15*20+12] = 49.39050; daa[15*20+13] = 54.59310; daa[15*20+14] = 161.32800;
+ daa[16*20+ 0] = 212.11100; daa[16*20+ 1] = 55.44130; daa[16*20+ 2] = 203.00600;
+ daa[16*20+ 3] = 37.48660; daa[16*20+ 4] = 51.29840; daa[16*20+ 5] = 85.79280;
+ daa[16*20+ 6] = 82.27650; daa[16*20+ 7] = 22.58330; daa[16*20+ 8] = 47.33070;
+ daa[16*20+ 9] = 145.81600; daa[16*20+10] = 32.66220; daa[16*20+11] = 138.69800;
+ daa[16*20+12] = 151.61200; daa[16*20+13] = 17.19030; daa[16*20+14] = 79.53840;
+ daa[16*20+15] = 437.80200; daa[17*20+ 0] = 11.31330; daa[17*20+ 1] = 116.39200;
+ daa[17*20+ 2] = 7.19167; daa[17*20+ 3] = 12.97670; daa[17*20+ 4] = 71.70700;
+ daa[17*20+ 5] = 21.57370; daa[17*20+ 6] = 15.65570; daa[17*20+ 7] = 33.69830;
+ daa[17*20+ 8] = 26.25690; daa[17*20+ 9] = 21.24830; daa[17*20+10] = 66.53090;
+ daa[17*20+11] = 13.75050; daa[17*20+12] = 51.57060; daa[17*20+13] = 152.96400;
+ daa[17*20+14] = 13.94050; daa[17*20+15] = 52.37420; daa[17*20+16] = 11.08640;
+ daa[18*20+ 0] = 24.07350; daa[18*20+ 1] = 38.15330; daa[18*20+ 2] = 108.60000;
+ daa[18*20+ 3] = 32.57110; daa[18*20+ 4] = 54.38330; daa[18*20+ 5] = 22.77100;
+ daa[18*20+ 6] = 19.63030; daa[18*20+ 7] = 10.36040; daa[18*20+ 8] = 387.34400;
+ daa[18*20+ 9] = 42.01700; daa[18*20+10] = 39.86180; daa[18*20+11] = 13.32640;
+ daa[18*20+12] = 42.84370; daa[18*20+13] = 645.42800; daa[18*20+14] = 21.60460;
+ daa[18*20+15] = 78.69930; daa[18*20+16] = 29.11480; daa[18*20+17] = 248.53900;
+ daa[19*20+ 0] = 200.60100; daa[19*20+ 1] = 25.18490; daa[19*20+ 2] = 19.62460;
+ daa[19*20+ 3] = 15.23350; daa[19*20+ 4] = 100.21400; daa[19*20+ 5] = 30.12810;
+ daa[19*20+ 6] = 58.87310; daa[19*20+ 7] = 18.72470; daa[19*20+ 8] = 11.83580;
+ daa[19*20+ 9] = 782.13000; daa[19*20+10] = 180.03400; daa[19*20+11] = 30.54340;
+ daa[19*20+12] = 205.84500; daa[19*20+13] = 64.98920; daa[19*20+14] = 31.48870;
+ daa[19*20+15] = 23.27390; daa[19*20+16] = 138.82300; daa[19*20+17] = 36.53690;
+ daa[19*20+18] = 31.47300;
+
+ for(i = 0; i < 20; i++)
+ for(j = 0; j < 20; j++)
+ q[i][j] = 0.0;
+ /* Mirror the lower triangle of daa into the upper triangle. */
+ for (i=0; i<20; i++)
+ for (j=0; j<i; j++)
+ daa[j*20+i] = daa[i*20+j];
+ /* Copy the upper triangle of daa into q. */
+ for(i = 0; i < 19; i++)
+ for(j = i + 1; j < 20; j++)
+ q[i][j] = daa[i * 20 + j];
+
+
+ /*
+ for (i=0; i<20; i++)
+ {
+ for (j=0; j<20; j++)
+ printf("%1.2f ", q[i][j]);
+ printf("\n");
+ }
+ printf("\n");
+
+ printf("%f\n", q[18][19]);
+ */
+ /* Rescale all rates so that the last one, q[18][19], becomes 1.0. */
+ scaler = 1.0 / q[18][19];
+
+
+
+ for(i = 0; i < 19; i++)
+ for(j = i + 1; j < 20; j++)
+ q[i][j] *= scaler;
+ /* Flatten the scaled upper triangle, row by row, into ext_initialRates. */
+ for(i = 0, r = 0; i < 19; i++)
+ for(j = i + 1; j < 20; j++)
+ ext_initialRates[r++] = q[i][j];
+
+ /*
+ for (i=0; i<20; i++)
+ {
+ for (j=0; j<20; j++)
+ printf("%1.2f ", q[i][j]);
+ printf("\n");
+ }
+ printf("\n");
+ */
+
+}
+
+static void makeAASubstMat(double *daa, double *f, double *rates, double *freqs)
+{
+ int
+ i, j, r = 0;
+ /* Fill the lower triangle of daa, row by row, from the 190 rates. */
+ for(i = 1; i < 20; i++)
+ for(j = 0; j < i; j++)
+ {
+ daa[i * 20 + j] = rates[r];
+ r++;
+ }
+
+ assert(r == 190);
+
+ for(i = 0; i < 20; i++)
+ f[i] = freqs[i];
+}
+
+static void initProtMat(double f[20], int proteinMatrix, double *ext_initialRates, int lg4_index)
+{
+ double q[20][20];
+ double daa[400], max, temp;
+ int i, j, r;
+ double *initialRates = ext_initialRates;
+ double scaler;
+
+ {
+ switch(proteinMatrix)
+ {
+ case DAYHOFF:
+ {
+ daa[ 1*20+ 0] = 27.00; daa[ 2*20+ 0] = 98.00; daa[ 2*20+ 1] = 32.00; daa[ 3*20+ 0] = 120.00;
+ daa[ 3*20+ 1] = 0.00; daa[ 3*20+ 2] = 905.00; daa[ 4*20+ 0] = 36.00; daa[ 4*20+ 1] = 23.00;
+ daa[ 4*20+ 2] = 0.00; daa[ 4*20+ 3] = 0.00; daa[ 5*20+ 0] = 89.00; daa[ 5*20+ 1] = 246.00;
+ daa[ 5*20+ 2] = 103.00; daa[ 5*20+ 3] = 134.00; daa[ 5*20+ 4] = 0.00; daa[ 6*20+ 0] = 198.00;
+ daa[ 6*20+ 1] = 1.00; daa[ 6*20+ 2] = 148.00; daa[ 6*20+ 3] = 1153.00; daa[ 6*20+ 4] = 0.00;
+ daa[ 6*20+ 5] = 716.00; daa[ 7*20+ 0] = 240.00; daa[ 7*20+ 1] = 9.00; daa[ 7*20+ 2] = 139.00;
+ daa[ 7*20+ 3] = 125.00; daa[ 7*20+ 4] = 11.00; daa[ 7*20+ 5] = 28.00; daa[ 7*20+ 6] = 81.00;
+ daa[ 8*20+ 0] = 23.00; daa[ 8*20+ 1] = 240.00; daa[ 8*20+ 2] = 535.00; daa[ 8*20+ 3] = 86.00;
+ daa[ 8*20+ 4] = 28.00; daa[ 8*20+ 5] = 606.00; daa[ 8*20+ 6] = 43.00; daa[ 8*20+ 7] = 10.00;
+ daa[ 9*20+ 0] = 65.00; daa[ 9*20+ 1] = 64.00; daa[ 9*20+ 2] = 77.00; daa[ 9*20+ 3] = 24.00;
+ daa[ 9*20+ 4] = 44.00; daa[ 9*20+ 5] = 18.00; daa[ 9*20+ 6] = 61.00; daa[ 9*20+ 7] = 0.00;
+ daa[ 9*20+ 8] = 7.00; daa[10*20+ 0] = 41.00; daa[10*20+ 1] = 15.00; daa[10*20+ 2] = 34.00;
+ daa[10*20+ 3] = 0.00; daa[10*20+ 4] = 0.00; daa[10*20+ 5] = 73.00; daa[10*20+ 6] = 11.00;
+ daa[10*20+ 7] = 7.00; daa[10*20+ 8] = 44.00; daa[10*20+ 9] = 257.00; daa[11*20+ 0] = 26.00;
+ daa[11*20+ 1] = 464.00; daa[11*20+ 2] = 318.00; daa[11*20+ 3] = 71.00; daa[11*20+ 4] = 0.00;
+ daa[11*20+ 5] = 153.00; daa[11*20+ 6] = 83.00; daa[11*20+ 7] = 27.00; daa[11*20+ 8] = 26.00;
+ daa[11*20+ 9] = 46.00; daa[11*20+10] = 18.00; daa[12*20+ 0] = 72.00; daa[12*20+ 1] = 90.00;
+ daa[12*20+ 2] = 1.00; daa[12*20+ 3] = 0.00; daa[12*20+ 4] = 0.00; daa[12*20+ 5] = 114.00;
+ daa[12*20+ 6] = 30.00; daa[12*20+ 7] = 17.00; daa[12*20+ 8] = 0.00; daa[12*20+ 9] = 336.00;
+ daa[12*20+10] = 527.00; daa[12*20+11] = 243.00; daa[13*20+ 0] = 18.00; daa[13*20+ 1] = 14.00;
+ daa[13*20+ 2] = 14.00; daa[13*20+ 3] = 0.00; daa[13*20+ 4] = 0.00; daa[13*20+ 5] = 0.00;
+ daa[13*20+ 6] = 0.00; daa[13*20+ 7] = 15.00; daa[13*20+ 8] = 48.00; daa[13*20+ 9] = 196.00;
+ daa[13*20+10] = 157.00; daa[13*20+11] = 0.00; daa[13*20+12] = 92.00; daa[14*20+ 0] = 250.00;
+ daa[14*20+ 1] = 103.00; daa[14*20+ 2] = 42.00; daa[14*20+ 3] = 13.00; daa[14*20+ 4] = 19.00;
+ daa[14*20+ 5] = 153.00; daa[14*20+ 6] = 51.00; daa[14*20+ 7] = 34.00; daa[14*20+ 8] = 94.00;
+ daa[14*20+ 9] = 12.00; daa[14*20+10] = 32.00; daa[14*20+11] = 33.00; daa[14*20+12] = 17.00;
+ daa[14*20+13] = 11.00; daa[15*20+ 0] = 409.00; daa[15*20+ 1] = 154.00; daa[15*20+ 2] = 495.00;
+ daa[15*20+ 3] = 95.00; daa[15*20+ 4] = 161.00; daa[15*20+ 5] = 56.00; daa[15*20+ 6] = 79.00;
+ daa[15*20+ 7] = 234.00; daa[15*20+ 8] = 35.00; daa[15*20+ 9] = 24.00; daa[15*20+10] = 17.00;
+ daa[15*20+11] = 96.00; daa[15*20+12] = 62.00; daa[15*20+13] = 46.00; daa[15*20+14] = 245.00;
+ daa[16*20+ 0] = 371.00; daa[16*20+ 1] = 26.00; daa[16*20+ 2] = 229.00; daa[16*20+ 3] = 66.00;
+ daa[16*20+ 4] = 16.00; daa[16*20+ 5] = 53.00; daa[16*20+ 6] = 34.00; daa[16*20+ 7] = 30.00;
+ daa[16*20+ 8] = 22.00; daa[16*20+ 9] = 192.00; daa[16*20+10] = 33.00; daa[16*20+11] = 136.00;
+ daa[16*20+12] = 104.00; daa[16*20+13] = 13.00; daa[16*20+14] = 78.00; daa[16*20+15] = 550.00;
+ daa[17*20+ 0] = 0.00; daa[17*20+ 1] = 201.00; daa[17*20+ 2] = 23.00; daa[17*20+ 3] = 0.00;
+ daa[17*20+ 4] = 0.00; daa[17*20+ 5] = 0.00; daa[17*20+ 6] = 0.00; daa[17*20+ 7] = 0.00;
+ daa[17*20+ 8] = 27.00; daa[17*20+ 9] = 0.00; daa[17*20+10] = 46.00; daa[17*20+11] = 0.00;
+ daa[17*20+12] = 0.00; daa[17*20+13] = 76.00; daa[17*20+14] = 0.00; daa[17*20+15] = 75.00;
+ daa[17*20+16] = 0.00; daa[18*20+ 0] = 24.00; daa[18*20+ 1] = 8.00; daa[18*20+ 2] = 95.00;
+ daa[18*20+ 3] = 0.00; daa[18*20+ 4] = 96.00; daa[18*20+ 5] = 0.00; daa[18*20+ 6] = 22.00;
+ daa[18*20+ 7] = 0.00; daa[18*20+ 8] = 127.00; daa[18*20+ 9] = 37.00; daa[18*20+10] = 28.00;
+ daa[18*20+11] = 13.00; daa[18*20+12] = 0.00; daa[18*20+13] = 698.00; daa[18*20+14] = 0.00;
+ daa[18*20+15] = 34.00; daa[18*20+16] = 42.00; daa[18*20+17] = 61.00; daa[19*20+ 0] = 208.00;
+ daa[19*20+ 1] = 24.00; daa[19*20+ 2] = 15.00; daa[19*20+ 3] = 18.00; daa[19*20+ 4] = 49.00;
+ daa[19*20+ 5] = 35.00; daa[19*20+ 6] = 37.00; daa[19*20+ 7] = 54.00; daa[19*20+ 8] = 44.00;
+ daa[19*20+ 9] = 889.00; daa[19*20+10] = 175.00; daa[19*20+11] = 10.00; daa[19*20+12] = 258.00;
+ daa[19*20+13] = 12.00; daa[19*20+14] = 48.00; daa[19*20+15] = 30.00; daa[19*20+16] = 157.00;
+ daa[19*20+17] = 0.00; daa[19*20+18] = 28.00;
+
+
+ /*f[ 0] = 0.087000; f[ 1] = 0.041000; f[ 2] = 0.040000; f[ 3] = 0.047000;
+ f[ 4] = 0.034000; f[ 5] = 0.038000; f[ 6] = 0.050000; f[ 7] = 0.089000;
+ f[ 8] = 0.034000; f[ 9] = 0.037000; f[10] = 0.085000; f[11] = 0.080000;
+ f[12] = 0.014000; f[13] = 0.040000; f[14] = 0.051000; f[15] = 0.070000;
+ f[16] = 0.058000; f[17] = 0.011000; f[18] = 0.030000; f[19] = 0.064000;*/
+
+ f[ 0] = 0.087127; f[ 1] = 0.040904; f[ 2] = 0.040432; f[ 3] = 0.046872;
+ f[ 4] = 0.033474; f[ 5] = 0.038255; f[ 6] = 0.049530; f[ 7] = 0.088612;
+ f[ 8] = 0.033618; f[ 9] = 0.036886; f[10] = 0.085357; f[11] = 0.080482;
+ f[12] = 0.014753; f[13] = 0.039772; f[14] = 0.050680; f[15] = 0.069577;
+ f[16] = 0.058542; f[17] = 0.010494; f[18] = 0.029916; f[19] = 0.064717;
+ }
+ break;
+ case DCMUT:
+ {
+ daa[ 1*20+ 0] = 26.78280; daa[ 2*20+ 0] = 98.44740; daa[ 2*20+ 1] = 32.70590; daa[ 3*20+ 0] = 119.98050;
+ daa[ 3*20+ 1] = 0.00000; daa[ 3*20+ 2] = 893.15150; daa[ 4*20+ 0] = 36.00160; daa[ 4*20+ 1] = 23.23740;
+ daa[ 4*20+ 2] = 0.00000; daa[ 4*20+ 3] = 0.00000; daa[ 5*20+ 0] = 88.77530; daa[ 5*20+ 1] = 243.99390;
+ daa[ 5*20+ 2] = 102.85090; daa[ 5*20+ 3] = 134.85510; daa[ 5*20+ 4] = 0.00000; daa[ 6*20+ 0] = 196.11670;
+ daa[ 6*20+ 1] = 0.00000; daa[ 6*20+ 2] = 149.34090; daa[ 6*20+ 3] = 1138.86590; daa[ 6*20+ 4] = 0.00000;
+ daa[ 6*20+ 5] = 708.60220; daa[ 7*20+ 0] = 238.61110; daa[ 7*20+ 1] = 8.77910; daa[ 7*20+ 2] = 138.53520;
+ daa[ 7*20+ 3] = 124.09810; daa[ 7*20+ 4] = 10.72780; daa[ 7*20+ 5] = 28.15810; daa[ 7*20+ 6] = 81.19070;
+ daa[ 8*20+ 0] = 22.81160; daa[ 8*20+ 1] = 238.31480; daa[ 8*20+ 2] = 529.00240; daa[ 8*20+ 3] = 86.82410;
+ daa[ 8*20+ 4] = 28.27290; daa[ 8*20+ 5] = 601.16130; daa[ 8*20+ 6] = 43.94690; daa[ 8*20+ 7] = 10.68020;
+ daa[ 9*20+ 0] = 65.34160; daa[ 9*20+ 1] = 63.26290; daa[ 9*20+ 2] = 76.80240; daa[ 9*20+ 3] = 23.92480;
+ daa[ 9*20+ 4] = 43.80740; daa[ 9*20+ 5] = 18.03930; daa[ 9*20+ 6] = 60.95260; daa[ 9*20+ 7] = 0.00000;
+ daa[ 9*20+ 8] = 7.69810; daa[10*20+ 0] = 40.64310; daa[10*20+ 1] = 15.49240; daa[10*20+ 2] = 34.11130;
+ daa[10*20+ 3] = 0.00000; daa[10*20+ 4] = 0.00000; daa[10*20+ 5] = 73.07720; daa[10*20+ 6] = 11.28800;
+ daa[10*20+ 7] = 7.15140; daa[10*20+ 8] = 44.35040; daa[10*20+ 9] = 255.66850; daa[11*20+ 0] = 25.86350;
+ daa[11*20+ 1] = 461.01240; daa[11*20+ 2] = 314.83710; daa[11*20+ 3] = 71.69130; daa[11*20+ 4] = 0.00000;
+ daa[11*20+ 5] = 151.90780; daa[11*20+ 6] = 83.00780; daa[11*20+ 7] = 26.76830; daa[11*20+ 8] = 27.04750;
+ daa[11*20+ 9] = 46.08570; daa[11*20+10] = 18.06290; daa[12*20+ 0] = 71.78400; daa[12*20+ 1] = 89.63210;
+ daa[12*20+ 2] = 0.00000; daa[12*20+ 3] = 0.00000; daa[12*20+ 4] = 0.00000; daa[12*20+ 5] = 112.74990;
+ daa[12*20+ 6] = 30.48030; daa[12*20+ 7] = 17.03720; daa[12*20+ 8] = 0.00000; daa[12*20+ 9] = 333.27320;
+ daa[12*20+10] = 523.01150; daa[12*20+11] = 241.17390; daa[13*20+ 0] = 18.36410; daa[13*20+ 1] = 13.69060;
+ daa[13*20+ 2] = 13.85030; daa[13*20+ 3] = 0.00000; daa[13*20+ 4] = 0.00000; daa[13*20+ 5] = 0.00000;
+ daa[13*20+ 6] = 0.00000; daa[13*20+ 7] = 15.34780; daa[13*20+ 8] = 47.59270; daa[13*20+ 9] = 195.19510;
+ daa[13*20+10] = 156.51600; daa[13*20+11] = 0.00000; daa[13*20+12] = 92.18600; daa[14*20+ 0] = 248.59200;
+ daa[14*20+ 1] = 102.83130; daa[14*20+ 2] = 41.92440; daa[14*20+ 3] = 13.39400; daa[14*20+ 4] = 18.75500;
+ daa[14*20+ 5] = 152.61880; daa[14*20+ 6] = 50.70030; daa[14*20+ 7] = 34.71530; daa[14*20+ 8] = 93.37090;
+ daa[14*20+ 9] = 11.91520; daa[14*20+10] = 31.62580; daa[14*20+11] = 33.54190; daa[14*20+12] = 17.02050;
+ daa[14*20+13] = 11.05060; daa[15*20+ 0] = 405.18700; daa[15*20+ 1] = 153.15900; daa[15*20+ 2] = 488.58920;
+ daa[15*20+ 3] = 95.60970; daa[15*20+ 4] = 159.83560; daa[15*20+ 5] = 56.18280; daa[15*20+ 6] = 79.39990;
+ daa[15*20+ 7] = 232.22430; daa[15*20+ 8] = 35.36430; daa[15*20+ 9] = 24.79550; daa[15*20+10] = 17.14320;
+ daa[15*20+11] = 95.45570; daa[15*20+12] = 61.99510; daa[15*20+13] = 45.99010; daa[15*20+14] = 242.72020;
+ daa[16*20+ 0] = 368.03650; daa[16*20+ 1] = 26.57450; daa[16*20+ 2] = 227.16970; daa[16*20+ 3] = 66.09300;
+ daa[16*20+ 4] = 16.23660; daa[16*20+ 5] = 52.56510; daa[16*20+ 6] = 34.01560; daa[16*20+ 7] = 30.66620;
+ daa[16*20+ 8] = 22.63330; daa[16*20+ 9] = 190.07390; daa[16*20+10] = 33.10900; daa[16*20+11] = 135.05990;
+ daa[16*20+12] = 103.15340; daa[16*20+13] = 13.66550; daa[16*20+14] = 78.28570; daa[16*20+15] = 543.66740;
+ daa[17*20+ 0] = 0.00000; daa[17*20+ 1] = 200.13750; daa[17*20+ 2] = 22.49680; daa[17*20+ 3] = 0.00000;
+ daa[17*20+ 4] = 0.00000; daa[17*20+ 5] = 0.00000; daa[17*20+ 6] = 0.00000; daa[17*20+ 7] = 0.00000;
+ daa[17*20+ 8] = 27.05640; daa[17*20+ 9] = 0.00000; daa[17*20+10] = 46.17760; daa[17*20+11] = 0.00000;
+ daa[17*20+12] = 0.00000; daa[17*20+13] = 76.23540; daa[17*20+14] = 0.00000; daa[17*20+15] = 74.08190;
+ daa[17*20+16] = 0.00000; daa[18*20+ 0] = 24.41390; daa[18*20+ 1] = 7.80120; daa[18*20+ 2] = 94.69400;
+ daa[18*20+ 3] = 0.00000; daa[18*20+ 4] = 95.31640; daa[18*20+ 5] = 0.00000; daa[18*20+ 6] = 21.47170;
+ daa[18*20+ 7] = 0.00000; daa[18*20+ 8] = 126.54000; daa[18*20+ 9] = 37.48340; daa[18*20+10] = 28.65720;
+ daa[18*20+11] = 13.21420; daa[18*20+12] = 0.00000; daa[18*20+13] = 695.26290; daa[18*20+14] = 0.00000;
+ daa[18*20+15] = 33.62890; daa[18*20+16] = 41.78390; daa[18*20+17] = 60.80700; daa[19*20+ 0] = 205.95640;
+ daa[19*20+ 1] = 24.03680; daa[19*20+ 2] = 15.80670; daa[19*20+ 3] = 17.83160; daa[19*20+ 4] = 48.46780;
+ daa[19*20+ 5] = 34.69830; daa[19*20+ 6] = 36.72500; daa[19*20+ 7] = 53.81650; daa[19*20+ 8] = 43.87150;
+ daa[19*20+ 9] = 881.00380; daa[19*20+10] = 174.51560; daa[19*20+11] = 10.38500; daa[19*20+12] = 256.59550;
+ daa[19*20+13] = 12.36060; daa[19*20+14] = 48.50260; daa[19*20+15] = 30.38360; daa[19*20+16] = 156.19970;
+ daa[19*20+17] = 0.00000; daa[19*20+18] = 27.93790;
+
+ /* f[ 0] = 0.08700; f[ 1] = 0.04100; f[ 2] = 0.04000; f[ 3] = 0.04700;
+ f[ 4] = 0.03300; f[ 5] = 0.03800; f[ 6] = 0.04900; f[ 7] = 0.08900;
+ f[ 8] = 0.03400; f[ 9] = 0.03700; f[10] = 0.08500; f[11] = 0.08000;
+ f[12] = 0.01500; f[13] = 0.04000; f[14] = 0.05200; f[15] = 0.06900;
+ f[16] = 0.05900; f[17] = 0.01000; f[18] = 0.03000; f[19] = 0.06500;*/
+
+ f[ 0] = 0.087127; f[ 1] = 0.040904; f[ 2] = 0.040432; f[ 3] = 0.046872;
+ f[ 4] = 0.033474; f[ 5] = 0.038255; f[ 6] = 0.049530; f[ 7] = 0.088612;
+ f[ 8] = 0.033619; f[ 9] = 0.036886; f[10] = 0.085357; f[11] = 0.080481;
+ f[12] = 0.014753; f[13] = 0.039772; f[14] = 0.050680; f[15] = 0.069577;
+ f[16] = 0.058542; f[17] = 0.010494; f[18] = 0.029916; f[19] = 0.064717;
+
+ }
+ break;
+ case JTT:
+ {
+ daa[ 1*20+ 0] = 58.00; daa[ 2*20+ 0] = 54.00; daa[ 2*20+ 1] = 45.00; daa[ 3*20+ 0] = 81.00;
+ daa[ 3*20+ 1] = 16.00; daa[ 3*20+ 2] = 528.00; daa[ 4*20+ 0] = 56.00; daa[ 4*20+ 1] = 113.00;
+ daa[ 4*20+ 2] = 34.00; daa[ 4*20+ 3] = 10.00; daa[ 5*20+ 0] = 57.00; daa[ 5*20+ 1] = 310.00;
+ daa[ 5*20+ 2] = 86.00; daa[ 5*20+ 3] = 49.00; daa[ 5*20+ 4] = 9.00; daa[ 6*20+ 0] = 105.00;
+ daa[ 6*20+ 1] = 29.00; daa[ 6*20+ 2] = 58.00; daa[ 6*20+ 3] = 767.00; daa[ 6*20+ 4] = 5.00;
+ daa[ 6*20+ 5] = 323.00; daa[ 7*20+ 0] = 179.00; daa[ 7*20+ 1] = 137.00; daa[ 7*20+ 2] = 81.00;
+ daa[ 7*20+ 3] = 130.00; daa[ 7*20+ 4] = 59.00; daa[ 7*20+ 5] = 26.00; daa[ 7*20+ 6] = 119.00;
+ daa[ 8*20+ 0] = 27.00; daa[ 8*20+ 1] = 328.00; daa[ 8*20+ 2] = 391.00; daa[ 8*20+ 3] = 112.00;
+ daa[ 8*20+ 4] = 69.00; daa[ 8*20+ 5] = 597.00; daa[ 8*20+ 6] = 26.00; daa[ 8*20+ 7] = 23.00;
+ daa[ 9*20+ 0] = 36.00; daa[ 9*20+ 1] = 22.00; daa[ 9*20+ 2] = 47.00; daa[ 9*20+ 3] = 11.00;
+ daa[ 9*20+ 4] = 17.00; daa[ 9*20+ 5] = 9.00; daa[ 9*20+ 6] = 12.00; daa[ 9*20+ 7] = 6.00;
+ daa[ 9*20+ 8] = 16.00; daa[10*20+ 0] = 30.00; daa[10*20+ 1] = 38.00; daa[10*20+ 2] = 12.00;
+ daa[10*20+ 3] = 7.00; daa[10*20+ 4] = 23.00; daa[10*20+ 5] = 72.00; daa[10*20+ 6] = 9.00;
+ daa[10*20+ 7] = 6.00; daa[10*20+ 8] = 56.00; daa[10*20+ 9] = 229.00; daa[11*20+ 0] = 35.00;
+ daa[11*20+ 1] = 646.00; daa[11*20+ 2] = 263.00; daa[11*20+ 3] = 26.00; daa[11*20+ 4] = 7.00;
+ daa[11*20+ 5] = 292.00; daa[11*20+ 6] = 181.00; daa[11*20+ 7] = 27.00; daa[11*20+ 8] = 45.00;
+ daa[11*20+ 9] = 21.00; daa[11*20+10] = 14.00; daa[12*20+ 0] = 54.00; daa[12*20+ 1] = 44.00;
+ daa[12*20+ 2] = 30.00; daa[12*20+ 3] = 15.00; daa[12*20+ 4] = 31.00; daa[12*20+ 5] = 43.00;
+ daa[12*20+ 6] = 18.00; daa[12*20+ 7] = 14.00; daa[12*20+ 8] = 33.00; daa[12*20+ 9] = 479.00;
+ daa[12*20+10] = 388.00; daa[12*20+11] = 65.00; daa[13*20+ 0] = 15.00; daa[13*20+ 1] = 5.00;
+ daa[13*20+ 2] = 10.00; daa[13*20+ 3] = 4.00; daa[13*20+ 4] = 78.00; daa[13*20+ 5] = 4.00;
+ daa[13*20+ 6] = 5.00; daa[13*20+ 7] = 5.00; daa[13*20+ 8] = 40.00; daa[13*20+ 9] = 89.00;
+ daa[13*20+10] = 248.00; daa[13*20+11] = 4.00; daa[13*20+12] = 43.00; daa[14*20+ 0] = 194.00;
+ daa[14*20+ 1] = 74.00; daa[14*20+ 2] = 15.00; daa[14*20+ 3] = 15.00; daa[14*20+ 4] = 14.00;
+ daa[14*20+ 5] = 164.00; daa[14*20+ 6] = 18.00; daa[14*20+ 7] = 24.00; daa[14*20+ 8] = 115.00;
+ daa[14*20+ 9] = 10.00; daa[14*20+10] = 102.00; daa[14*20+11] = 21.00; daa[14*20+12] = 16.00;
+ daa[14*20+13] = 17.00; daa[15*20+ 0] = 378.00; daa[15*20+ 1] = 101.00; daa[15*20+ 2] = 503.00;
+ daa[15*20+ 3] = 59.00; daa[15*20+ 4] = 223.00; daa[15*20+ 5] = 53.00; daa[15*20+ 6] = 30.00;
+ daa[15*20+ 7] = 201.00; daa[15*20+ 8] = 73.00; daa[15*20+ 9] = 40.00; daa[15*20+10] = 59.00;
+ daa[15*20+11] = 47.00; daa[15*20+12] = 29.00; daa[15*20+13] = 92.00; daa[15*20+14] = 285.00;
+ daa[16*20+ 0] = 475.00; daa[16*20+ 1] = 64.00; daa[16*20+ 2] = 232.00; daa[16*20+ 3] = 38.00;
+ daa[16*20+ 4] = 42.00; daa[16*20+ 5] = 51.00; daa[16*20+ 6] = 32.00; daa[16*20+ 7] = 33.00;
+ daa[16*20+ 8] = 46.00; daa[16*20+ 9] = 245.00; daa[16*20+10] = 25.00; daa[16*20+11] = 103.00;
+ daa[16*20+12] = 226.00; daa[16*20+13] = 12.00; daa[16*20+14] = 118.00; daa[16*20+15] = 477.00;
+ daa[17*20+ 0] = 9.00; daa[17*20+ 1] = 126.00; daa[17*20+ 2] = 8.00; daa[17*20+ 3] = 4.00;
+ daa[17*20+ 4] = 115.00; daa[17*20+ 5] = 18.00; daa[17*20+ 6] = 10.00; daa[17*20+ 7] = 55.00;
+ daa[17*20+ 8] = 8.00; daa[17*20+ 9] = 9.00; daa[17*20+10] = 52.00; daa[17*20+11] = 10.00;
+ daa[17*20+12] = 24.00; daa[17*20+13] = 53.00; daa[17*20+14] = 6.00; daa[17*20+15] = 35.00;
+ daa[17*20+16] = 12.00; daa[18*20+ 0] = 11.00; daa[18*20+ 1] = 20.00; daa[18*20+ 2] = 70.00;
+ daa[18*20+ 3] = 46.00; daa[18*20+ 4] = 209.00; daa[18*20+ 5] = 24.00; daa[18*20+ 6] = 7.00;
+ daa[18*20+ 7] = 8.00; daa[18*20+ 8] = 573.00; daa[18*20+ 9] = 32.00; daa[18*20+10] = 24.00;
+ daa[18*20+11] = 8.00; daa[18*20+12] = 18.00; daa[18*20+13] = 536.00; daa[18*20+14] = 10.00;
+ daa[18*20+15] = 63.00; daa[18*20+16] = 21.00; daa[18*20+17] = 71.00; daa[19*20+ 0] = 298.00;
+ daa[19*20+ 1] = 17.00; daa[19*20+ 2] = 16.00; daa[19*20+ 3] = 31.00; daa[19*20+ 4] = 62.00;
+ daa[19*20+ 5] = 20.00; daa[19*20+ 6] = 45.00; daa[19*20+ 7] = 47.00; daa[19*20+ 8] = 11.00;
+ daa[19*20+ 9] = 961.00; daa[19*20+10] = 180.00; daa[19*20+11] = 14.00; daa[19*20+12] = 323.00;
+ daa[19*20+13] = 62.00; daa[19*20+14] = 23.00; daa[19*20+15] = 38.00; daa[19*20+16] = 112.00;
+ daa[19*20+17] = 25.00; daa[19*20+18] = 16.00;
+
+ /*f[ 0] = 0.07700; f[ 1] = 0.05200; f[ 2] = 0.04200; f[ 3] = 0.05100;
+ f[ 4] = 0.02000; f[ 5] = 0.04100; f[ 6] = 0.06200; f[ 7] = 0.07300;
+ f[ 8] = 0.02300; f[ 9] = 0.05400; f[10] = 0.09200; f[11] = 0.05900;
+ f[12] = 0.02400; f[13] = 0.04000; f[14] = 0.05100; f[15] = 0.06900;
+ f[16] = 0.05800; f[17] = 0.01400; f[18] = 0.03200; f[19] = 0.06600;*/
+
+ f[ 0] = 0.076748; f[ 1] = 0.051691; f[ 2] = 0.042645; f[ 3] = 0.051544;
+ f[ 4] = 0.019803; f[ 5] = 0.040752; f[ 6] = 0.061830; f[ 7] = 0.073152;
+ f[ 8] = 0.022944; f[ 9] = 0.053761; f[10] = 0.091904; f[11] = 0.058676;
+ f[12] = 0.023826; f[13] = 0.040126; f[14] = 0.050901; f[15] = 0.068765;
+ f[16] = 0.058565; f[17] = 0.014261; f[18] = 0.032102; f[19] = 0.066004;
+ }
+ break;
+ case MTREV:
+ {
+ daa[ 1*20+ 0] = 23.18; daa[ 2*20+ 0] = 26.95; daa[ 2*20+ 1] = 13.24; daa[ 3*20+ 0] = 17.67;
+ daa[ 3*20+ 1] = 1.90; daa[ 3*20+ 2] = 794.38; daa[ 4*20+ 0] = 59.93; daa[ 4*20+ 1] = 103.33;
+ daa[ 4*20+ 2] = 58.94; daa[ 4*20+ 3] = 1.90; daa[ 5*20+ 0] = 1.90; daa[ 5*20+ 1] = 220.99;
+ daa[ 5*20+ 2] = 173.56; daa[ 5*20+ 3] = 55.28; daa[ 5*20+ 4] = 75.24; daa[ 6*20+ 0] = 9.77;
+ daa[ 6*20+ 1] = 1.90; daa[ 6*20+ 2] = 63.05; daa[ 6*20+ 3] = 583.55; daa[ 6*20+ 4] = 1.90;
+ daa[ 6*20+ 5] = 313.56; daa[ 7*20+ 0] = 120.71; daa[ 7*20+ 1] = 23.03; daa[ 7*20+ 2] = 53.30;
+ daa[ 7*20+ 3] = 56.77; daa[ 7*20+ 4] = 30.71; daa[ 7*20+ 5] = 6.75; daa[ 7*20+ 6] = 28.28;
+ daa[ 8*20+ 0] = 13.90; daa[ 8*20+ 1] = 165.23; daa[ 8*20+ 2] = 496.13; daa[ 8*20+ 3] = 113.99;
+ daa[ 8*20+ 4] = 141.49; daa[ 8*20+ 5] = 582.40; daa[ 8*20+ 6] = 49.12; daa[ 8*20+ 7] = 1.90;
+ daa[ 9*20+ 0] = 96.49; daa[ 9*20+ 1] = 1.90; daa[ 9*20+ 2] = 27.10; daa[ 9*20+ 3] = 4.34;
+ daa[ 9*20+ 4] = 62.73; daa[ 9*20+ 5] = 8.34; daa[ 9*20+ 6] = 3.31; daa[ 9*20+ 7] = 5.98;
+ daa[ 9*20+ 8] = 12.26; daa[10*20+ 0] = 25.46; daa[10*20+ 1] = 15.58; daa[10*20+ 2] = 15.16;
+ daa[10*20+ 3] = 1.90; daa[10*20+ 4] = 25.65; daa[10*20+ 5] = 39.70; daa[10*20+ 6] = 1.90;
+ daa[10*20+ 7] = 2.41; daa[10*20+ 8] = 11.49; daa[10*20+ 9] = 329.09; daa[11*20+ 0] = 8.36;
+ daa[11*20+ 1] = 141.40; daa[11*20+ 2] = 608.70; daa[11*20+ 3] = 2.31; daa[11*20+ 4] = 1.90;
+ daa[11*20+ 5] = 465.58; daa[11*20+ 6] = 313.86; daa[11*20+ 7] = 22.73; daa[11*20+ 8] = 127.67;
+ daa[11*20+ 9] = 19.57; daa[11*20+10] = 14.88; daa[12*20+ 0] = 141.88; daa[12*20+ 1] = 1.90;
+ daa[12*20+ 2] = 65.41; daa[12*20+ 3] = 1.90; daa[12*20+ 4] = 6.18; daa[12*20+ 5] = 47.37;
+ daa[12*20+ 6] = 1.90; daa[12*20+ 7] = 1.90; daa[12*20+ 8] = 11.97; daa[12*20+ 9] = 517.98;
+ daa[12*20+10] = 537.53; daa[12*20+11] = 91.37; daa[13*20+ 0] = 6.37; daa[13*20+ 1] = 4.69;
+ daa[13*20+ 2] = 15.20; daa[13*20+ 3] = 4.98; daa[13*20+ 4] = 70.80; daa[13*20+ 5] = 19.11;
+ daa[13*20+ 6] = 2.67; daa[13*20+ 7] = 1.90; daa[13*20+ 8] = 48.16; daa[13*20+ 9] = 84.67;
+ daa[13*20+10] = 216.06; daa[13*20+11] = 6.44; daa[13*20+12] = 90.82; daa[14*20+ 0] = 54.31;
+ daa[14*20+ 1] = 23.64; daa[14*20+ 2] = 73.31; daa[14*20+ 3] = 13.43; daa[14*20+ 4] = 31.26;
+ daa[14*20+ 5] = 137.29; daa[14*20+ 6] = 12.83; daa[14*20+ 7] = 1.90; daa[14*20+ 8] = 60.97;
+ daa[14*20+ 9] = 20.63; daa[14*20+10] = 40.10; daa[14*20+11] = 50.10; daa[14*20+12] = 18.84;
+ daa[14*20+13] = 17.31; daa[15*20+ 0] = 387.86; daa[15*20+ 1] = 6.04; daa[15*20+ 2] = 494.39;
+ daa[15*20+ 3] = 69.02; daa[15*20+ 4] = 277.05; daa[15*20+ 5] = 54.11; daa[15*20+ 6] = 54.71;
+ daa[15*20+ 7] = 125.93; daa[15*20+ 8] = 77.46; daa[15*20+ 9] = 47.70; daa[15*20+10] = 73.61;
+ daa[15*20+11] = 105.79; daa[15*20+12] = 111.16; daa[15*20+13] = 64.29; daa[15*20+14] = 169.90;
+ daa[16*20+ 0] = 480.72; daa[16*20+ 1] = 2.08; daa[16*20+ 2] = 238.46; daa[16*20+ 3] = 28.01;
+ daa[16*20+ 4] = 179.97; daa[16*20+ 5] = 94.93; daa[16*20+ 6] = 14.82; daa[16*20+ 7] = 11.17;
+ daa[16*20+ 8] = 44.78; daa[16*20+ 9] = 368.43; daa[16*20+10] = 126.40; daa[16*20+11] = 136.33;
+ daa[16*20+12] = 528.17; daa[16*20+13] = 33.85; daa[16*20+14] = 128.22; daa[16*20+15] = 597.21;
+ daa[17*20+ 0] = 1.90; daa[17*20+ 1] = 21.95; daa[17*20+ 2] = 10.68; daa[17*20+ 3] = 19.86;
+ daa[17*20+ 4] = 33.60; daa[17*20+ 5] = 1.90; daa[17*20+ 6] = 1.90; daa[17*20+ 7] = 10.92;
+ daa[17*20+ 8] = 7.08; daa[17*20+ 9] = 1.90; daa[17*20+10] = 32.44; daa[17*20+11] = 24.00;
+ daa[17*20+12] = 21.71; daa[17*20+13] = 7.84; daa[17*20+14] = 4.21; daa[17*20+15] = 38.58;
+ daa[17*20+16] = 9.99; daa[18*20+ 0] = 6.48; daa[18*20+ 1] = 1.90; daa[18*20+ 2] = 191.36;
+ daa[18*20+ 3] = 21.21; daa[18*20+ 4] = 254.77; daa[18*20+ 5] = 38.82; daa[18*20+ 6] = 13.12;
+ daa[18*20+ 7] = 3.21; daa[18*20+ 8] = 670.14; daa[18*20+ 9] = 25.01; daa[18*20+10] = 44.15;
+ daa[18*20+11] = 51.17; daa[18*20+12] = 39.96; daa[18*20+13] = 465.58; daa[18*20+14] = 16.21;
+ daa[18*20+15] = 64.92; daa[18*20+16] = 38.73; daa[18*20+17] = 26.25; daa[19*20+ 0] = 195.06;
+ daa[19*20+ 1] = 7.64; daa[19*20+ 2] = 1.90; daa[19*20+ 3] = 1.90; daa[19*20+ 4] = 1.90;
+ daa[19*20+ 5] = 19.00; daa[19*20+ 6] = 21.14; daa[19*20+ 7] = 2.53; daa[19*20+ 8] = 1.90;
+ daa[19*20+ 9] = 1222.94; daa[19*20+10] = 91.67; daa[19*20+11] = 1.90; daa[19*20+12] = 387.54;
+ daa[19*20+13] = 6.35; daa[19*20+14] = 8.23; daa[19*20+15] = 1.90; daa[19*20+16] = 204.54;
+ daa[19*20+17] = 5.37; daa[19*20+18] = 1.90;
+
+
+ f[ 0] = 0.072000; f[ 1] = 0.019000; f[ 2] = 0.039000; f[ 3] = 0.019000;
+ f[ 4] = 0.006000; f[ 5] = 0.025000; f[ 6] = 0.024000; f[ 7] = 0.056000;
+ f[ 8] = 0.028000; f[ 9] = 0.088000; f[10] = 0.169000; f[11] = 0.023000;
+ f[12] = 0.054000; f[13] = 0.061000; f[14] = 0.054000; f[15] = 0.072000;
+ f[16] = 0.086000; f[17] = 0.029000; f[18] = 0.033000; f[19] = 0.043000;
+ }
+ break;
+ case WAG:
+ {
+ daa[ 1*20+ 0] = 55.15710; daa[ 2*20+ 0] = 50.98480; daa[ 2*20+ 1] = 63.53460;
+ daa[ 3*20+ 0] = 73.89980; daa[ 3*20+ 1] = 14.73040; daa[ 3*20+ 2] = 542.94200;
+ daa[ 4*20+ 0] = 102.70400; daa[ 4*20+ 1] = 52.81910; daa[ 4*20+ 2] = 26.52560;
+ daa[ 4*20+ 3] = 3.02949; daa[ 5*20+ 0] = 90.85980; daa[ 5*20+ 1] = 303.55000;
+ daa[ 5*20+ 2] = 154.36400; daa[ 5*20+ 3] = 61.67830; daa[ 5*20+ 4] = 9.88179;
+ daa[ 6*20+ 0] = 158.28500; daa[ 6*20+ 1] = 43.91570; daa[ 6*20+ 2] = 94.71980;
+ daa[ 6*20+ 3] = 617.41600; daa[ 6*20+ 4] = 2.13520; daa[ 6*20+ 5] = 546.94700;
+ daa[ 7*20+ 0] = 141.67200; daa[ 7*20+ 1] = 58.46650; daa[ 7*20+ 2] = 112.55600;
+ daa[ 7*20+ 3] = 86.55840; daa[ 7*20+ 4] = 30.66740; daa[ 7*20+ 5] = 33.00520;
+ daa[ 7*20+ 6] = 56.77170; daa[ 8*20+ 0] = 31.69540; daa[ 8*20+ 1] = 213.71500;
+ daa[ 8*20+ 2] = 395.62900; daa[ 8*20+ 3] = 93.06760; daa[ 8*20+ 4] = 24.89720;
+ daa[ 8*20+ 5] = 429.41100; daa[ 8*20+ 6] = 57.00250; daa[ 8*20+ 7] = 24.94100;
+ daa[ 9*20+ 0] = 19.33350; daa[ 9*20+ 1] = 18.69790; daa[ 9*20+ 2] = 55.42360;
+ daa[ 9*20+ 3] = 3.94370; daa[ 9*20+ 4] = 17.01350; daa[ 9*20+ 5] = 11.39170;
+ daa[ 9*20+ 6] = 12.73950; daa[ 9*20+ 7] = 3.04501; daa[ 9*20+ 8] = 13.81900;
+ daa[10*20+ 0] = 39.79150; daa[10*20+ 1] = 49.76710; daa[10*20+ 2] = 13.15280;
+ daa[10*20+ 3] = 8.48047; daa[10*20+ 4] = 38.42870; daa[10*20+ 5] = 86.94890;
+ daa[10*20+ 6] = 15.42630; daa[10*20+ 7] = 6.13037; daa[10*20+ 8] = 49.94620;
+ daa[10*20+ 9] = 317.09700; daa[11*20+ 0] = 90.62650; daa[11*20+ 1] = 535.14200;
+ daa[11*20+ 2] = 301.20100; daa[11*20+ 3] = 47.98550; daa[11*20+ 4] = 7.40339;
+ daa[11*20+ 5] = 389.49000; daa[11*20+ 6] = 258.44300; daa[11*20+ 7] = 37.35580;
+ daa[11*20+ 8] = 89.04320; daa[11*20+ 9] = 32.38320; daa[11*20+10] = 25.75550;
+ daa[12*20+ 0] = 89.34960; daa[12*20+ 1] = 68.31620; daa[12*20+ 2] = 19.82210;
+ daa[12*20+ 3] = 10.37540; daa[12*20+ 4] = 39.04820; daa[12*20+ 5] = 154.52600;
+ daa[12*20+ 6] = 31.51240; daa[12*20+ 7] = 17.41000; daa[12*20+ 8] = 40.41410;
+ daa[12*20+ 9] = 425.74600; daa[12*20+10] = 485.40200; daa[12*20+11] = 93.42760;
+ daa[13*20+ 0] = 21.04940; daa[13*20+ 1] = 10.27110; daa[13*20+ 2] = 9.61621;
+ daa[13*20+ 3] = 4.67304; daa[13*20+ 4] = 39.80200; daa[13*20+ 5] = 9.99208;
+ daa[13*20+ 6] = 8.11339; daa[13*20+ 7] = 4.99310; daa[13*20+ 8] = 67.93710;
+ daa[13*20+ 9] = 105.94700; daa[13*20+10] = 211.51700; daa[13*20+11] = 8.88360;
+ daa[13*20+12] = 119.06300; daa[14*20+ 0] = 143.85500; daa[14*20+ 1] = 67.94890;
+ daa[14*20+ 2] = 19.50810; daa[14*20+ 3] = 42.39840; daa[14*20+ 4] = 10.94040;
+ daa[14*20+ 5] = 93.33720; daa[14*20+ 6] = 68.23550; daa[14*20+ 7] = 24.35700;
+ daa[14*20+ 8] = 69.61980; daa[14*20+ 9] = 9.99288; daa[14*20+10] = 41.58440;
+ daa[14*20+11] = 55.68960; daa[14*20+12] = 17.13290; daa[14*20+13] = 16.14440;
+ daa[15*20+ 0] = 337.07900; daa[15*20+ 1] = 122.41900; daa[15*20+ 2] = 397.42300;
+ daa[15*20+ 3] = 107.17600; daa[15*20+ 4] = 140.76600; daa[15*20+ 5] = 102.88700;
+ daa[15*20+ 6] = 70.49390; daa[15*20+ 7] = 134.18200; daa[15*20+ 8] = 74.01690;
+ daa[15*20+ 9] = 31.94400; daa[15*20+10] = 34.47390; daa[15*20+11] = 96.71300;
+ daa[15*20+12] = 49.39050; daa[15*20+13] = 54.59310; daa[15*20+14] = 161.32800;
+ daa[16*20+ 0] = 212.11100; daa[16*20+ 1] = 55.44130; daa[16*20+ 2] = 203.00600;
+ daa[16*20+ 3] = 37.48660; daa[16*20+ 4] = 51.29840; daa[16*20+ 5] = 85.79280;
+ daa[16*20+ 6] = 82.27650; daa[16*20+ 7] = 22.58330; daa[16*20+ 8] = 47.33070;
+ daa[16*20+ 9] = 145.81600; daa[16*20+10] = 32.66220; daa[16*20+11] = 138.69800;
+ daa[16*20+12] = 151.61200; daa[16*20+13] = 17.19030; daa[16*20+14] = 79.53840;
+ daa[16*20+15] = 437.80200; daa[17*20+ 0] = 11.31330; daa[17*20+ 1] = 116.39200;
+ daa[17*20+ 2] = 7.19167; daa[17*20+ 3] = 12.97670; daa[17*20+ 4] = 71.70700;
+ daa[17*20+ 5] = 21.57370; daa[17*20+ 6] = 15.65570; daa[17*20+ 7] = 33.69830;
+ daa[17*20+ 8] = 26.25690; daa[17*20+ 9] = 21.24830; daa[17*20+10] = 66.53090;
+ daa[17*20+11] = 13.75050; daa[17*20+12] = 51.57060; daa[17*20+13] = 152.96400;
+ daa[17*20+14] = 13.94050; daa[17*20+15] = 52.37420; daa[17*20+16] = 11.08640;
+ daa[18*20+ 0] = 24.07350; daa[18*20+ 1] = 38.15330; daa[18*20+ 2] = 108.60000;
+ daa[18*20+ 3] = 32.57110; daa[18*20+ 4] = 54.38330; daa[18*20+ 5] = 22.77100;
+ daa[18*20+ 6] = 19.63030; daa[18*20+ 7] = 10.36040; daa[18*20+ 8] = 387.34400;
+ daa[18*20+ 9] = 42.01700; daa[18*20+10] = 39.86180; daa[18*20+11] = 13.32640;
+ daa[18*20+12] = 42.84370; daa[18*20+13] = 645.42800; daa[18*20+14] = 21.60460;
+ daa[18*20+15] = 78.69930; daa[18*20+16] = 29.11480; daa[18*20+17] = 248.53900;
+ daa[19*20+ 0] = 200.60100; daa[19*20+ 1] = 25.18490; daa[19*20+ 2] = 19.62460;
+ daa[19*20+ 3] = 15.23350; daa[19*20+ 4] = 100.21400; daa[19*20+ 5] = 30.12810;
+ daa[19*20+ 6] = 58.87310; daa[19*20+ 7] = 18.72470; daa[19*20+ 8] = 11.83580;
+ daa[19*20+ 9] = 782.13000; daa[19*20+10] = 180.03400; daa[19*20+11] = 30.54340;
+ daa[19*20+12] = 205.84500; daa[19*20+13] = 64.98920; daa[19*20+14] = 31.48870;
+ daa[19*20+15] = 23.27390; daa[19*20+16] = 138.82300; daa[19*20+17] = 36.53690;
+ daa[19*20+18] = 31.47300;
+
+ /*f[0] = 0.08700; f[1] = 0.04400; f[2] = 0.03900; f[3] = 0.05700;
+ f[4] = 0.01900; f[5] = 0.03700; f[6] = 0.05800; f[7] = 0.08300;
+ f[8] = 0.02400; f[9] = 0.04900; f[10] = 0.08600; f[11] = 0.06200;
+ f[12] = 0.02000; f[13] = 0.03800; f[14] = 0.04600; f[15] = 0.07000;
+ f[16] = 0.06100; f[17] = 0.01400; f[18] = 0.03500; f[19] = 0.07100;
+ */
+
+ f[0] = 0.0866279; f[1] = 0.043972; f[2] = 0.0390894; f[3] = 0.0570451;
+ f[4] = 0.0193078; f[5] = 0.0367281; f[6] = 0.0580589; f[7] = 0.0832518;
+ f[8] = 0.0244313; f[9] = 0.048466; f[10] = 0.086209; f[11] = 0.0620286;
+ f[12] = 0.0195027; f[13] = 0.0384319; f[14] = 0.0457631; f[15] = 0.0695179;
+ f[16] = 0.0610127; f[17] = 0.0143859; f[18] = 0.0352742; f[19] = 0.0708957;
+ }
+ break;
+ case RTREV:
+ {
+ daa[1*20+0]= 34; daa[2*20+0]= 51; daa[2*20+1]= 35; daa[3*20+0]= 10;
+ daa[3*20+1]= 30; daa[3*20+2]= 384; daa[4*20+0]= 439; daa[4*20+1]= 92;
+ daa[4*20+2]= 128; daa[4*20+3]= 1; daa[5*20+0]= 32; daa[5*20+1]= 221;
+ daa[5*20+2]= 236; daa[5*20+3]= 78; daa[5*20+4]= 70; daa[6*20+0]= 81;
+ daa[6*20+1]= 10; daa[6*20+2]= 79; daa[6*20+3]= 542; daa[6*20+4]= 1;
+ daa[6*20+5]= 372; daa[7*20+0]= 135; daa[7*20+1]= 41; daa[7*20+2]= 94;
+ daa[7*20+3]= 61; daa[7*20+4]= 48; daa[7*20+5]= 18; daa[7*20+6]= 70;
+ daa[8*20+0]= 30; daa[8*20+1]= 90; daa[8*20+2]= 320; daa[8*20+3]= 91;
+ daa[8*20+4]= 124; daa[8*20+5]= 387; daa[8*20+6]= 34; daa[8*20+7]= 68;
+ daa[9*20+0]= 1; daa[9*20+1]= 24; daa[9*20+2]= 35; daa[9*20+3]= 1;
+ daa[9*20+4]= 104; daa[9*20+5]= 33; daa[9*20+6]= 1; daa[9*20+7]= 1;
+ daa[9*20+8]= 34; daa[10*20+0]= 45; daa[10*20+1]= 18; daa[10*20+2]= 15;
+ daa[10*20+3]= 5; daa[10*20+4]= 110; daa[10*20+5]= 54; daa[10*20+6]= 21;
+ daa[10*20+7]= 3; daa[10*20+8]= 51; daa[10*20+9]= 385; daa[11*20+0]= 38;
+ daa[11*20+1]= 593; daa[11*20+2]= 123; daa[11*20+3]= 20; daa[11*20+4]= 16;
+ daa[11*20+5]= 309; daa[11*20+6]= 141; daa[11*20+7]= 30; daa[11*20+8]= 76;
+ daa[11*20+9]= 34; daa[11*20+10]= 23; daa[12*20+0]= 235; daa[12*20+1]= 57;
+ daa[12*20+2]= 1; daa[12*20+3]= 1; daa[12*20+4]= 156; daa[12*20+5]= 158;
+ daa[12*20+6]= 1; daa[12*20+7]= 37; daa[12*20+8]= 116; daa[12*20+9]= 375;
+ daa[12*20+10]= 581; daa[12*20+11]= 134; daa[13*20+0]= 1; daa[13*20+1]= 7;
+ daa[13*20+2]= 49; daa[13*20+3]= 1; daa[13*20+4]= 70; daa[13*20+5]= 1;
+ daa[13*20+6]= 1; daa[13*20+7]= 7; daa[13*20+8]= 141; daa[13*20+9]= 64;
+ daa[13*20+10]= 179; daa[13*20+11]= 14; daa[13*20+12]= 247; daa[14*20+0]= 97;
+ daa[14*20+1]= 24; daa[14*20+2]= 33; daa[14*20+3]= 55; daa[14*20+4]= 1;
+ daa[14*20+5]= 68; daa[14*20+6]= 52; daa[14*20+7]= 17; daa[14*20+8]= 44;
+ daa[14*20+9]= 10; daa[14*20+10]= 22; daa[14*20+11]= 43; daa[14*20+12]= 1;
+ daa[14*20+13]= 11; daa[15*20+0]= 460; daa[15*20+1]= 102; daa[15*20+2]= 294;
+ daa[15*20+3]= 136; daa[15*20+4]= 75; daa[15*20+5]= 225; daa[15*20+6]= 95;
+ daa[15*20+7]= 152; daa[15*20+8]= 183; daa[15*20+9]= 4; daa[15*20+10]= 24;
+ daa[15*20+11]= 77; daa[15*20+12]= 1; daa[15*20+13]= 20; daa[15*20+14]= 134;
+ daa[16*20+0]= 258; daa[16*20+1]= 64; daa[16*20+2]= 148; daa[16*20+3]= 55;
+ daa[16*20+4]= 117; daa[16*20+5]= 146; daa[16*20+6]= 82; daa[16*20+7]= 7;
+ daa[16*20+8]= 49; daa[16*20+9]= 72; daa[16*20+10]= 25; daa[16*20+11]= 110;
+ daa[16*20+12]= 131; daa[16*20+13]= 69; daa[16*20+14]= 62; daa[16*20+15]= 671;
+ daa[17*20+0]= 5; daa[17*20+1]= 13; daa[17*20+2]= 16; daa[17*20+3]= 1;
+ daa[17*20+4]= 55; daa[17*20+5]= 10; daa[17*20+6]= 17; daa[17*20+7]= 23;
+ daa[17*20+8]= 48; daa[17*20+9]= 39; daa[17*20+10]= 47; daa[17*20+11]= 6;
+ daa[17*20+12]= 111; daa[17*20+13]= 182; daa[17*20+14]= 9; daa[17*20+15]= 14;
+ daa[17*20+16]= 1; daa[18*20+0]= 55; daa[18*20+1]= 47; daa[18*20+2]= 28;
+ daa[18*20+3]= 1; daa[18*20+4]= 131; daa[18*20+5]= 45; daa[18*20+6]= 1;
+ daa[18*20+7]= 21; daa[18*20+8]= 307; daa[18*20+9]= 26; daa[18*20+10]= 64;
+ daa[18*20+11]= 1; daa[18*20+12]= 74; daa[18*20+13]= 1017; daa[18*20+14]= 14;
+ daa[18*20+15]= 31; daa[18*20+16]= 34; daa[18*20+17]= 176; daa[19*20+0]= 197;
+ daa[19*20+1]= 29; daa[19*20+2]= 21; daa[19*20+3]= 6; daa[19*20+4]= 295;
+ daa[19*20+5]= 36; daa[19*20+6]= 35; daa[19*20+7]= 3; daa[19*20+8]= 1;
+ daa[19*20+9]= 1048; daa[19*20+10]= 112; daa[19*20+11]= 19; daa[19*20+12]= 236;
+ daa[19*20+13]= 92; daa[19*20+14]= 25; daa[19*20+15]= 39; daa[19*20+16]= 196;
+ daa[19*20+17]= 26; daa[19*20+18]= 59;
+
+ f[0]= 0.0646; f[1]= 0.0453; f[2]= 0.0376; f[3]= 0.0422;
+ f[4]= 0.0114; f[5]= 0.0606; f[6]= 0.0607; f[7]= 0.0639;
+ f[8]= 0.0273; f[9]= 0.0679; f[10]= 0.1018; f[11]= 0.0751;
+ f[12]= 0.015; f[13]= 0.0287; f[14]= 0.0681; f[15]= 0.0488;
+ f[16]= 0.0622; f[17]= 0.0251; f[18]= 0.0318; f[19]= 0.0619;
+ }
+ break;
+ case CPREV:
+ {
+ daa[1*20+0]= 105; daa[2*20+0]= 227; daa[2*20+1]= 357; daa[3*20+0]= 175;
+ daa[3*20+1]= 43; daa[3*20+2]= 4435; daa[4*20+0]= 669; daa[4*20+1]= 823;
+ daa[4*20+2]= 538; daa[4*20+3]= 10; daa[5*20+0]= 157; daa[5*20+1]= 1745;
+ daa[5*20+2]= 768; daa[5*20+3]= 400; daa[5*20+4]= 10; daa[6*20+0]= 499;
+ daa[6*20+1]= 152; daa[6*20+2]= 1055; daa[6*20+3]= 3691; daa[6*20+4]= 10;
+ daa[6*20+5]= 3122; daa[7*20+0]= 665; daa[7*20+1]= 243; daa[7*20+2]= 653;
+ daa[7*20+3]= 431; daa[7*20+4]= 303; daa[7*20+5]= 133; daa[7*20+6]= 379;
+ daa[8*20+0]= 66; daa[8*20+1]= 715; daa[8*20+2]= 1405; daa[8*20+3]= 331;
+ daa[8*20+4]= 441; daa[8*20+5]= 1269; daa[8*20+6]= 162; daa[8*20+7]= 19;
+ daa[9*20+0]= 145; daa[9*20+1]= 136; daa[9*20+2]= 168; daa[9*20+3]= 10;
+ daa[9*20+4]= 280; daa[9*20+5]= 92; daa[9*20+6]= 148; daa[9*20+7]= 40;
+ daa[9*20+8]= 29; daa[10*20+0]= 197; daa[10*20+1]= 203; daa[10*20+2]= 113;
+ daa[10*20+3]= 10; daa[10*20+4]= 396; daa[10*20+5]= 286; daa[10*20+6]= 82;
+ daa[10*20+7]= 20; daa[10*20+8]= 66; daa[10*20+9]= 1745; daa[11*20+0]= 236;
+ daa[11*20+1]= 4482; daa[11*20+2]= 2430; daa[11*20+3]= 412; daa[11*20+4]= 48;
+ daa[11*20+5]= 3313; daa[11*20+6]= 2629; daa[11*20+7]= 263; daa[11*20+8]= 305;
+ daa[11*20+9]= 345; daa[11*20+10]= 218; daa[12*20+0]= 185; daa[12*20+1]= 125;
+ daa[12*20+2]= 61; daa[12*20+3]= 47; daa[12*20+4]= 159; daa[12*20+5]= 202;
+ daa[12*20+6]= 113; daa[12*20+7]= 21; daa[12*20+8]= 10; daa[12*20+9]= 1772;
+ daa[12*20+10]= 1351; daa[12*20+11]= 193; daa[13*20+0]= 68; daa[13*20+1]= 53;
+ daa[13*20+2]= 97; daa[13*20+3]= 22; daa[13*20+4]= 726; daa[13*20+5]= 10;
+ daa[13*20+6]= 145; daa[13*20+7]= 25; daa[13*20+8]= 127; daa[13*20+9]= 454;
+ daa[13*20+10]= 1268; daa[13*20+11]= 72; daa[13*20+12]= 327; daa[14*20+0]= 490;
+ daa[14*20+1]= 87; daa[14*20+2]= 173; daa[14*20+3]= 170; daa[14*20+4]= 285;
+ daa[14*20+5]= 323; daa[14*20+6]= 185; daa[14*20+7]= 28; daa[14*20+8]= 152;
+ daa[14*20+9]= 117; daa[14*20+10]= 219; daa[14*20+11]= 302; daa[14*20+12]= 100;
+ daa[14*20+13]= 43; daa[15*20+0]= 2440; daa[15*20+1]= 385; daa[15*20+2]= 2085;
+ daa[15*20+3]= 590; daa[15*20+4]= 2331; daa[15*20+5]= 396; daa[15*20+6]= 568;
+ daa[15*20+7]= 691; daa[15*20+8]= 303; daa[15*20+9]= 216; daa[15*20+10]= 516;
+ daa[15*20+11]= 868; daa[15*20+12]= 93; daa[15*20+13]= 487; daa[15*20+14]= 1202;
+ daa[16*20+0]= 1340; daa[16*20+1]= 314; daa[16*20+2]= 1393; daa[16*20+3]= 266;
+ daa[16*20+4]= 576; daa[16*20+5]= 241; daa[16*20+6]= 369; daa[16*20+7]= 92;
+ daa[16*20+8]= 32; daa[16*20+9]= 1040; daa[16*20+10]= 156; daa[16*20+11]= 918;
+ daa[16*20+12]= 645; daa[16*20+13]= 148; daa[16*20+14]= 260; daa[16*20+15]= 2151;
+ daa[17*20+0]= 14; daa[17*20+1]= 230; daa[17*20+2]= 40; daa[17*20+3]= 18;
+ daa[17*20+4]= 435; daa[17*20+5]= 53; daa[17*20+6]= 63; daa[17*20+7]= 82;
+ daa[17*20+8]= 69; daa[17*20+9]= 42; daa[17*20+10]= 159; daa[17*20+11]= 10;
+ daa[17*20+12]= 86; daa[17*20+13]= 468; daa[17*20+14]= 49; daa[17*20+15]= 73;
+ daa[17*20+16]= 29; daa[18*20+0]= 56; daa[18*20+1]= 323; daa[18*20+2]= 754;
+ daa[18*20+3]= 281; daa[18*20+4]= 1466; daa[18*20+5]= 391; daa[18*20+6]= 142;
+ daa[18*20+7]= 10; daa[18*20+8]= 1971; daa[18*20+9]= 89; daa[18*20+10]= 189;
+ daa[18*20+11]= 247; daa[18*20+12]= 215; daa[18*20+13]= 2370; daa[18*20+14]= 97;
+ daa[18*20+15]= 522; daa[18*20+16]= 71; daa[18*20+17]= 346; daa[19*20+0]= 968;
+ daa[19*20+1]= 92; daa[19*20+2]= 83; daa[19*20+3]= 75; daa[19*20+4]= 592;
+ daa[19*20+5]= 54; daa[19*20+6]= 200; daa[19*20+7]= 91; daa[19*20+8]= 25;
+ daa[19*20+9]= 4797; daa[19*20+10]= 865; daa[19*20+11]= 249; daa[19*20+12]= 475;
+ daa[19*20+13]= 317; daa[19*20+14]= 122; daa[19*20+15]= 167; daa[19*20+16]= 760;
+ daa[19*20+17]= 10; daa[19*20+18]= 119;
+
+ f[0]= 0.076; f[1]= 0.062; f[2]= 0.041; f[3]= 0.037;
+ f[4]= 0.009; f[5]= 0.038; f[6]= 0.049; f[7]= 0.084;
+ f[8]= 0.025; f[9]= 0.081; f[10]= 0.101; f[11]= 0.05;
+ f[12]= 0.022; f[13]= 0.051; f[14]= 0.043; f[15]= 0.062;
+ f[16]= 0.054; f[17]= 0.018; f[18]= 0.031; f[19]= 0.066;
+ }
+ break;
+ case VT:
+ {
+ /*
+ daa[1*20+0]= 0.233108; daa[2*20+0]= 0.199097; daa[2*20+1]= 0.210797; daa[3*20+0]= 0.265145;
+ daa[3*20+1]= 0.105191; daa[3*20+2]= 0.883422; daa[4*20+0]= 0.227333; daa[4*20+1]= 0.031726;
+ daa[4*20+2]= 0.027495; daa[4*20+3]= 0.010313; daa[5*20+0]= 0.310084; daa[5*20+1]= 0.493763;
+ daa[5*20+2]= 0.2757; daa[5*20+3]= 0.205842; daa[5*20+4]= 0.004315; daa[6*20+0]= 0.567957;
+ daa[6*20+1]= 0.25524; daa[6*20+2]= 0.270417; daa[6*20+3]= 1.599461; daa[6*20+4]= 0.005321;
+ daa[6*20+5]= 0.960976; daa[7*20+0]= 0.876213; daa[7*20+1]= 0.156945; daa[7*20+2]= 0.362028;
+ daa[7*20+3]= 0.311718; daa[7*20+4]= 0.050876; daa[7*20+5]= 0.12866; daa[7*20+6]= 0.250447;
+ daa[8*20+0]= 0.078692; daa[8*20+1]= 0.213164; daa[8*20+2]= 0.290006; daa[8*20+3]= 0.134252;
+ daa[8*20+4]= 0.016695; daa[8*20+5]= 0.315521; daa[8*20+6]= 0.104458; daa[8*20+7]= 0.058131;
+ daa[9*20+0]= 0.222972; daa[9*20+1]= 0.08151; daa[9*20+2]= 0.087225; daa[9*20+3]= 0.01172;
+ daa[9*20+4]= 0.046398; daa[9*20+5]= 0.054602; daa[9*20+6]= 0.046589; daa[9*20+7]= 0.051089;
+ daa[9*20+8]= 0.020039; daa[10*20+0]= 0.42463; daa[10*20+1]= 0.192364; daa[10*20+2]= 0.069245;
+ daa[10*20+3]= 0.060863; daa[10*20+4]= 0.091709; daa[10*20+5]= 0.24353; daa[10*20+6]= 0.151924;
+ daa[10*20+7]= 0.087056; daa[10*20+8]= 0.103552; daa[10*20+9]= 2.08989; daa[11*20+0]= 0.393245;
+ daa[11*20+1]= 1.755838; daa[11*20+2]= 0.50306; daa[11*20+3]= 0.261101; daa[11*20+4]= 0.004067;
+ daa[11*20+5]= 0.738208; daa[11*20+6]= 0.88863; daa[11*20+7]= 0.193243; daa[11*20+8]= 0.153323;
+ daa[11*20+9]= 0.093181; daa[11*20+10]= 0.201204; daa[12*20+0]= 0.21155; daa[12*20+1]= 0.08793;
+ daa[12*20+2]= 0.05742; daa[12*20+3]= 0.012182; daa[12*20+4]= 0.02369; daa[12*20+5]= 0.120801;
+ daa[12*20+6]= 0.058643; daa[12*20+7]= 0.04656; daa[12*20+8]= 0.021157; daa[12*20+9]= 0.493845;
+ daa[12*20+10]= 1.105667; daa[12*20+11]= 0.096474; daa[13*20+0]= 0.116646; daa[13*20+1]= 0.042569;
+ daa[13*20+2]= 0.039769; daa[13*20+3]= 0.016577; daa[13*20+4]= 0.051127; daa[13*20+5]= 0.026235;
+ daa[13*20+6]= 0.028168; daa[13*20+7]= 0.050143; daa[13*20+8]= 0.079807; daa[13*20+9]= 0.32102;
+ daa[13*20+10]= 0.946499; daa[13*20+11]= 0.038261; daa[13*20+12]= 0.173052; daa[14*20+0]= 0.399143;
+ daa[14*20+1]= 0.12848; daa[14*20+2]= 0.083956; daa[14*20+3]= 0.160063; daa[14*20+4]= 0.011137;
+ daa[14*20+5]= 0.15657; daa[14*20+6]= 0.205134; daa[14*20+7]= 0.124492; daa[14*20+8]= 0.078892;
+ daa[14*20+9]= 0.054797; daa[14*20+10]= 0.169784; daa[14*20+11]= 0.212302; daa[14*20+12]= 0.010363;
+ daa[14*20+13]= 0.042564; daa[15*20+0]= 1.817198; daa[15*20+1]= 0.292327; daa[15*20+2]= 0.847049;
+ daa[15*20+3]= 0.461519; daa[15*20+4]= 0.17527; daa[15*20+5]= 0.358017; daa[15*20+6]= 0.406035;
+ daa[15*20+7]= 0.612843; daa[15*20+8]= 0.167406; daa[15*20+9]= 0.081567; daa[15*20+10]= 0.214977;
+ daa[15*20+11]= 0.400072; daa[15*20+12]= 0.090515; daa[15*20+13]= 0.138119; daa[15*20+14]= 0.430431;
+ daa[16*20+0]= 0.877877; daa[16*20+1]= 0.204109; daa[16*20+2]= 0.471268; daa[16*20+3]= 0.178197;
+ daa[16*20+4]= 0.079511; daa[16*20+5]= 0.248992; daa[16*20+6]= 0.321028; daa[16*20+7]= 0.136266;
+ daa[16*20+8]= 0.101117; daa[16*20+9]= 0.376588; daa[16*20+10]= 0.243227; daa[16*20+11]= 0.446646;
+ daa[16*20+12]= 0.184609; daa[16*20+13]= 0.08587; daa[16*20+14]= 0.207143; daa[16*20+15]= 1.767766;
+ daa[17*20+0]= 0.030309; daa[17*20+1]= 0.046417; daa[17*20+2]= 0.010459; daa[17*20+3]= 0.011393;
+ daa[17*20+4]= 0.007732; daa[17*20+5]= 0.021248; daa[17*20+6]= 0.018844; daa[17*20+7]= 0.02399;
+ daa[17*20+8]= 0.020009; daa[17*20+9]= 0.034954; daa[17*20+10]= 0.083439; daa[17*20+11]= 0.023321;
+ daa[17*20+12]= 0.022019; daa[17*20+13]= 0.12805; daa[17*20+14]= 0.014584; daa[17*20+15]= 0.035933;
+ daa[17*20+16]= 0.020437; daa[18*20+0]= 0.087061; daa[18*20+1]= 0.09701; daa[18*20+2]= 0.093268;
+ daa[18*20+3]= 0.051664; daa[18*20+4]= 0.042823; daa[18*20+5]= 0.062544; daa[18*20+6]= 0.0552;
+ daa[18*20+7]= 0.037568; daa[18*20+8]= 0.286027; daa[18*20+9]= 0.086237; daa[18*20+10]= 0.189842;
+ daa[18*20+11]= 0.068689; daa[18*20+12]= 0.073223; daa[18*20+13]= 0.898663; daa[18*20+14]= 0.032043;
+ daa[18*20+15]= 0.121979; daa[18*20+16]= 0.094617; daa[18*20+17]= 0.124746; daa[19*20+0]= 1.230985;
+ daa[19*20+1]= 0.113146; daa[19*20+2]= 0.049824; daa[19*20+3]= 0.048769; daa[19*20+4]= 0.163831;
+ daa[19*20+5]= 0.112027; daa[19*20+6]= 0.205868; daa[19*20+7]= 0.082579; daa[19*20+8]= 0.068575;
+ daa[19*20+9]= 3.65443; daa[19*20+10]= 1.337571; daa[19*20+11]= 0.144587; daa[19*20+12]= 0.307309;
+ daa[19*20+13]= 0.247329; daa[19*20+14]= 0.129315; daa[19*20+15]= 0.1277; daa[19*20+16]= 0.740372;
+ daa[19*20+17]= 0.022134; daa[19*20+18]= 0.125733;
+
+ f[0] = 0.07900; f[1]= 0.05100; f[2] = 0.04200; f[3]= 0.05300;
+ f[4] = 0.01500; f[5]= 0.03700; f[6] = 0.06200; f[7]= 0.07100;
+ f[8] = 0.02300; f[9]= 0.06200; f[10] = 0.09600; f[11]= 0.05700;
+ f[12] = 0.02400; f[13]= 0.04300; f[14] = 0.04400; f[15]= 0.06400;
+ f[16] = 0.05600; f[17]= 0.01300; f[18] = 0.03500; f[19]= 0.07300;
+ */
+
+ daa[1*20+0]= 1.2412691067876198;
+ daa[2*20+0]= 1.2184237953498958;
+ daa[2*20+1]= 1.5720770753326880;
+ daa[3*20+0]= 1.3759368509441177;
+ daa[3*20+1]= 0.7550654439001206;
+ daa[3*20+2]= 7.8584219153689405;
+ daa[4*20+0]= 2.4731223087544874;
+ daa[4*20+1]= 1.4414262567428417;
+ daa[4*20+2]= 0.9784679122774127;
+ daa[4*20+3]= 0.2272488448121475;
+ daa[5*20+0]= 2.2155167805137470;
+ daa[5*20+1]= 5.5120819705248678;
+ daa[5*20+2]= 3.0143201670924822;
+ daa[5*20+3]= 1.6562495638176040;
+ daa[5*20+4]= 0.4587469126746136;
+ daa[6*20+0]= 2.3379911207495061;
+ daa[6*20+1]= 1.3542404860613146;
+ daa[6*20+2]= 2.0093434778398112;
+ daa[6*20+3]= 9.6883451875685065;
+ daa[6*20+4]= 0.4519167943192672;
+ daa[6*20+5]= 6.8124601839937675;
+ daa[7*20+0]= 3.3386555146457697;
+ daa[7*20+1]= 1.3121700301622004;
+ daa[7*20+2]= 2.4117632898861809;
+ daa[7*20+3]= 1.9142079025990228;
+ daa[7*20+4]= 1.1034605684472507;
+ daa[7*20+5]= 0.8776110594765502;
+ daa[7*20+6]= 1.3860121390169038;
+ daa[8*20+0]= 0.9615841926910841;
+ daa[8*20+1]= 4.9238668283945266;
+ daa[8*20+2]= 6.1974384977884114;
+ daa[8*20+3]= 2.1459640610133781;
+ daa[8*20+4]= 1.5196756759380692;
+ daa[8*20+5]= 7.9943228564946525;
+ daa[8*20+6]= 1.6360079688522375;
+ daa[8*20+7]= 0.8561248973045037;
+ daa[9*20+0]= 0.8908203061925510;
+ daa[9*20+1]= 0.4323005487925516;
+ daa[9*20+2]= 0.9179291175331520;
+ daa[9*20+3]= 0.2161660372725585;
+ daa[9*20+4]= 0.9126668032539315;
+ daa[9*20+5]= 0.4882733432879921;
+ daa[9*20+6]= 0.4035497929633328;
+ daa[9*20+7]= 0.2888075033037488;
+ daa[9*20+8]= 0.5787937115407940;
+ daa[10*20+0]= 1.0778497408764076;
+ daa[10*20+1]= 0.8386701149158265;
+ daa[10*20+2]= 0.4098311270816011;
+ daa[10*20+3]= 0.3574207468998517;
+ daa[10*20+4]= 1.4081315998413697;
+ daa[10*20+5]= 1.3318097154194044;
+ daa[10*20+6]= 0.5610717242294755;
+ daa[10*20+7]= 0.3578662395745526;
+ daa[10*20+8]= 1.0765007949562073;
+ daa[10*20+9]= 6.0019110258426362;
+ daa[11*20+0]= 1.4932055816372476;
+ daa[11*20+1]= 10.017330817366002;
+ daa[11*20+2]= 4.4034547578962568;
+ daa[11*20+3]= 1.4521790561663968;
+ daa[11*20+4]= 0.3371091785647479;
+ daa[11*20+5]= 6.0519085243118811;
+ daa[11*20+6]= 4.3290086529582830;
+ daa[11*20+7]= 0.8945563662345198;
+ daa[11*20+8]= 1.8085136096039203;
+ daa[11*20+9]= 0.6244297525127139;
+ daa[11*20+10]= 0.5642322882556321;
+ daa[12*20+0]= 1.9006455961717605;
+ daa[12*20+1]= 1.2488638689609959;
+ daa[12*20+2]= 0.9378803706165143;
+ daa[12*20+3]= 0.4075239926000898;
+ daa[12*20+4]= 1.2213054800811556;
+ daa[12*20+5]= 1.9106190827629084;
+ daa[12*20+6]= 0.7471936218068498;
+ daa[12*20+7]= 0.5954812791740037;
+ daa[12*20+8]= 1.3808291710019667;
+ daa[12*20+9]= 6.7597899772045418;
+ daa[12*20+10]= 8.0327792947421148;
+ daa[12*20+11]= 1.7129670976916258;
+ daa[13*20+0]= 0.6883439026872615;
+ daa[13*20+1]= 0.4224945197276290;
+ daa[13*20+2]= 0.5044944273324311;
+ daa[13*20+3]= 0.1675129724559251;
+ daa[13*20+4]= 1.6953951980808002;
+ daa[13*20+5]= 0.3573432522499545;
+ daa[13*20+6]= 0.2317194387691585;
+ daa[13*20+7]= 0.3693722640980460;
+ daa[13*20+8]= 1.3629765501081097;
+ daa[13*20+9]= 2.2864286949316077;
+ daa[13*20+10]= 4.3611548063555778;
+ daa[13*20+11]= 0.3910559903834828;
+ daa[13*20+12]= 2.3201373546296349;
+ daa[14*20+0]= 2.7355620089953550;
+ daa[14*20+1]= 1.3091837782420783;
+ daa[14*20+2]= 0.7103720531974738;
+ daa[14*20+3]= 1.0714605979577547;
+ daa[14*20+4]= 0.4326227078645523;
+ daa[14*20+5]= 2.3019177728300728;
+ daa[14*20+6]= 1.5132807416252063;
+ daa[14*20+7]= 0.7744933618134962;
+ daa[14*20+8]= 1.8370555852070649;
+ daa[14*20+9]= 0.4811402387911145;
+ daa[14*20+10]= 1.0084320519837335;
+ daa[14*20+11]= 1.3918935593582853;
+ daa[14*20+12]= 0.4953193808676289;
+ daa[14*20+13]= 0.3746821107962129;
+ daa[15*20+0]= 6.4208961859142883;
+ daa[15*20+1]= 1.9202994262316166;
+ daa[15*20+2]= 6.1234512396801764;
+ daa[15*20+3]= 2.2161944596741829;
+ daa[15*20+4]= 3.6366815408744255;
+ daa[15*20+5]= 2.3193703643237220;
+ daa[15*20+6]= 1.8273535587773553;
+ daa[15*20+7]= 3.0637776193717610;
+ daa[15*20+8]= 1.9699895187387506;
+ daa[15*20+9]= 0.6047491507504744;
+ daa[15*20+10]= 0.8953754669269811;
+ daa[15*20+11]= 1.9776630140912268;
+ daa[15*20+12]= 1.0657482318076852;
+ daa[15*20+13]= 1.1079144700606407;
+ daa[15*20+14]= 3.5465914843628927;
+ daa[16*20+0]= 5.2892514169776437;
+ daa[16*20+1]= 1.3363401740560601;
+ daa[16*20+2]= 3.8852506105922231;
+ daa[16*20+3]= 1.5066839872944762;
+ daa[16*20+4]= 1.7557065205837685;
+ daa[16*20+5]= 2.1576510103471440;
+ daa[16*20+6]= 1.5839981708584689;
+ daa[16*20+7]= 0.7147489676267383;
+ daa[16*20+8]= 1.6136654573285647;
+ daa[16*20+9]= 2.6344778384442731;
+ daa[16*20+10]= 1.0192004372506540;
+ daa[16*20+11]= 2.5513781312660280;
+ daa[16*20+12]= 3.3628488360462363;
+ daa[16*20+13]= 0.6882725908872254;
+ daa[16*20+14]= 1.9485376673137556;
+ daa[16*20+15]= 8.8479984061248178;
+ daa[17*20+0]= 0.5488578478106930;
+ daa[17*20+1]= 1.5170142153962840;
+ daa[17*20+2]= 0.1808525752605976;
+ daa[17*20+3]= 0.2496584188151770;
+ daa[17*20+4]= 1.6275179891253113;
+ daa[17*20+5]= 0.8959082681546182;
+ daa[17*20+6]= 0.4198391148111098;
+ daa[17*20+7]= 0.9349753595598769;
+ daa[17*20+8]= 0.6301954684360302;
+ daa[17*20+9]= 0.5604648274060783;
+ daa[17*20+10]= 1.5183114434679339;
+ daa[17*20+11]= 0.5851920879490173;
+ daa[17*20+12]= 1.4680478689711018;
+ daa[17*20+13]= 3.3448437239772266;
+ daa[17*20+14]= 0.4326058001438786;
+ daa[17*20+15]= 0.6791126595939816;
+ daa[17*20+16]= 0.4514203099376473;
+ daa[18*20+0]= 0.5411769916657778;
+ daa[18*20+1]= 0.8912614404565405;
+ daa[18*20+2]= 1.0894926581511342;
+ daa[18*20+3]= 0.7447620891784513;
+ daa[18*20+4]= 2.1579775140421025;
+ daa[18*20+5]= 0.9183596801412757;
+ daa[18*20+6]= 0.5818111331782764;
+ daa[18*20+7]= 0.3374467649724478;
+ daa[18*20+8]= 7.7587442309146040;
+ daa[18*20+9]= 0.8626796044156272;
+ daa[18*20+10]= 1.2452243224541324;
+ daa[18*20+11]= 0.7835447533710449;
+ daa[18*20+12]= 1.0899165770956820;
+ daa[18*20+13]= 10.384852333133459;
+ daa[18*20+14]= 0.4819109019647465;
+ daa[18*20+15]= 0.9547229305958682;
+ daa[18*20+16]= 0.8564314184691215;
+ daa[18*20+17]= 4.5377235790405388;
+ daa[19*20+0]= 4.6501894691803214;
+ daa[19*20+1]= 0.7807017855806767;
+ daa[19*20+2]= 0.4586061981719967;
+ daa[19*20+3]= 0.4594535241660911;
+ daa[19*20+4]= 2.2627456996290891;
+ daa[19*20+5]= 0.6366932501396869;
+ daa[19*20+6]= 0.8940572875547330;
+ daa[19*20+7]= 0.6193321034173915;
+ daa[19*20+8]= 0.5333220944030346;
+ daa[19*20+9]= 14.872933461519061;
+ daa[19*20+10]= 3.5458093276667237;
+ daa[19*20+11]= 0.7801080335991272;
+ daa[19*20+12]= 4.0584577156753401;
+ daa[19*20+13]= 1.7039730522675411;
+ daa[19*20+14]= 0.5985498912985666;
+ daa[19*20+15]= 0.9305232113028208;
+ daa[19*20+16]= 3.4242218450865543;
+ daa[19*20+17]= 0.5658969249032649;
+ daa[19*20+18]= 1.0000000000000000;
+
+ f[0]= 0.0770764620135024;
+ f[1]= 0.0500819370772208;
+ f[2]= 0.0462377395993731;
+ f[3]= 0.0537929860758246;
+ f[4]= 0.0144533387583345;
+ f[5]= 0.0408923608974345;
+ f[6]= 0.0633579339160905;
+ f[7]= 0.0655672355884439;
+ f[8]= 0.0218802687005936;
+ f[9]= 0.0591969699027449;
+ f[10]= 0.0976461276528445;
+ f[11]= 0.0592079410822730;
+ f[12]= 0.0220695876653368;
+ f[13]= 0.0413508521834260;
+ f[14]= 0.0476871596856874;
+ f[15]= 0.0707295165111524;
+ f[16]= 0.0567759161524817;
+ f[17]= 0.0127019797647213;
+ f[18]= 0.0323746050281867;
+ f[19]= 0.0669190817443274;
+ }
+ break;
+ case BLOSUM62:
+ {
+ daa[1*20+0]= 0.735790389698; daa[2*20+0]= 0.485391055466; daa[2*20+1]= 1.297446705134;
+ daa[3*20+0]= 0.543161820899;
+ daa[3*20+1]= 0.500964408555; daa[3*20+2]= 3.180100048216; daa[4*20+0]= 1.45999531047;
+ daa[4*20+1]= 0.227826574209;
+ daa[4*20+2]= 0.397358949897; daa[4*20+3]= 0.240836614802; daa[5*20+0]= 1.199705704602;
+ daa[5*20+1]= 3.020833610064;
+ daa[5*20+2]= 1.839216146992; daa[5*20+3]= 1.190945703396; daa[5*20+4]= 0.32980150463;
+ daa[6*20+0]= 1.1709490428;
+ daa[6*20+1]= 1.36057419042; daa[6*20+2]= 1.24048850864; daa[6*20+3]= 3.761625208368;
+ daa[6*20+4]= 0.140748891814;
+ daa[6*20+5]= 5.528919177928; daa[7*20+0]= 1.95588357496; daa[7*20+1]= 0.418763308518;
+ daa[7*20+2]= 1.355872344485;
+ daa[7*20+3]= 0.798473248968; daa[7*20+4]= 0.418203192284; daa[7*20+5]= 0.609846305383;
+ daa[7*20+6]= 0.423579992176;
+ daa[8*20+0]= 0.716241444998; daa[8*20+1]= 1.456141166336; daa[8*20+2]= 2.414501434208;
+ daa[8*20+3]= 0.778142664022;
+ daa[8*20+4]= 0.354058109831; daa[8*20+5]= 2.43534113114; daa[8*20+6]= 1.626891056982;
+ daa[8*20+7]= 0.539859124954;
+ daa[9*20+0]= 0.605899003687; daa[9*20+1]= 0.232036445142; daa[9*20+2]= 0.283017326278;
+ daa[9*20+3]= 0.418555732462;
+ daa[9*20+4]= 0.774894022794; daa[9*20+5]= 0.236202451204; daa[9*20+6]= 0.186848046932;
+ daa[9*20+7]= 0.189296292376;
+ daa[9*20+8]= 0.252718447885; daa[10*20+0]= 0.800016530518; daa[10*20+1]= 0.622711669692;
+ daa[10*20+2]= 0.211888159615;
+ daa[10*20+3]= 0.218131577594; daa[10*20+4]= 0.831842640142; daa[10*20+5]= 0.580737093181;
+ daa[10*20+6]= 0.372625175087;
+ daa[10*20+7]= 0.217721159236; daa[10*20+8]= 0.348072209797; daa[10*20+9]= 3.890963773304;
+ daa[11*20+0]= 1.295201266783;
+ daa[11*20+1]= 5.411115141489; daa[11*20+2]= 1.593137043457; daa[11*20+3]= 1.032447924952;
+ daa[11*20+4]= 0.285078800906;
+ daa[11*20+5]= 3.945277674515; daa[11*20+6]= 2.802427151679; daa[11*20+7]= 0.752042440303;
+ daa[11*20+8]= 1.022507035889;
+ daa[11*20+9]= 0.406193586642; daa[11*20+10]= 0.445570274261;daa[12*20+0]= 1.253758266664;
+ daa[12*20+1]= 0.983692987457;
+ daa[12*20+2]= 0.648441278787; daa[12*20+3]= 0.222621897958; daa[12*20+4]= 0.76768882348;
+ daa[12*20+5]= 2.494896077113;
+ daa[12*20+6]= 0.55541539747; daa[12*20+7]= 0.459436173579; daa[12*20+8]= 0.984311525359;
+ daa[12*20+9]= 3.364797763104;
+ daa[12*20+10]= 6.030559379572;daa[12*20+11]= 1.073061184332;daa[13*20+0]= 0.492964679748;
+ daa[13*20+1]= 0.371644693209;
+ daa[13*20+2]= 0.354861249223; daa[13*20+3]= 0.281730694207; daa[13*20+4]= 0.441337471187;
+ daa[13*20+5]= 0.14435695975;
+ daa[13*20+6]= 0.291409084165; daa[13*20+7]= 0.368166464453; daa[13*20+8]= 0.714533703928;
+ daa[13*20+9]= 1.517359325954;
+ daa[13*20+10]= 2.064839703237;daa[13*20+11]= 0.266924750511;daa[13*20+12]= 1.77385516883;
+ daa[14*20+0]= 1.173275900924;
+ daa[14*20+1]= 0.448133661718; daa[14*20+2]= 0.494887043702; daa[14*20+3]= 0.730628272998;
+ daa[14*20+4]= 0.356008498769;
+ daa[14*20+5]= 0.858570575674; daa[14*20+6]= 0.926563934846; daa[14*20+7]= 0.504086599527; daa[14*20+8]= 0.527007339151;
+ daa[14*20+9]= 0.388355409206; daa[14*20+10]= 0.374555687471;daa[14*20+11]= 1.047383450722;daa[14*20+12]= 0.454123625103;
+ daa[14*20+13]= 0.233597909629;daa[15*20+0]= 4.325092687057; daa[15*20+1]= 1.12278310421; daa[15*20+2]= 2.904101656456;
+ daa[15*20+3]= 1.582754142065; daa[15*20+4]= 1.197188415094; daa[15*20+5]= 1.934870924596; daa[15*20+6]= 1.769893238937;
+ daa[15*20+7]= 1.509326253224; daa[15*20+8]= 1.11702976291; daa[15*20+9]= 0.35754441246; daa[15*20+10]= 0.352969184527;
+ daa[15*20+11]= 1.752165917819;daa[15*20+12]= 0.918723415746;daa[15*20+13]= 0.540027644824;daa[15*20+14]= 1.169129577716;
+ daa[16*20+0]= 1.729178019485; daa[16*20+1]= 0.914665954563; daa[16*20+2]= 1.898173634533; daa[16*20+3]= 0.934187509431;
+ daa[16*20+4]= 1.119831358516; daa[16*20+5]= 1.277480294596; daa[16*20+6]= 1.071097236007; daa[16*20+7]= 0.641436011405;
+ daa[16*20+8]= 0.585407090225; daa[16*20+9]= 1.17909119726; daa[16*20+10]= 0.915259857694;daa[16*20+11]= 1.303875200799;
+ daa[16*20+12]= 1.488548053722;daa[16*20+13]= 0.488206118793;daa[16*20+14]= 1.005451683149;daa[16*20+15]= 5.15155629227;
+ daa[17*20+0]= 0.465839367725; daa[17*20+1]= 0.426382310122; daa[17*20+2]= 0.191482046247; daa[17*20+3]= 0.145345046279;
+ daa[17*20+4]= 0.527664418872; daa[17*20+5]= 0.758653808642; daa[17*20+6]= 0.407635648938; daa[17*20+7]= 0.508358924638;
+ daa[17*20+8]= 0.30124860078; daa[17*20+9]= 0.34198578754; daa[17*20+10]= 0.6914746346; daa[17*20+11]= 0.332243040634;
+ daa[17*20+12]= 0.888101098152;daa[17*20+13]= 2.074324893497;daa[17*20+14]= 0.252214830027;daa[17*20+15]= 0.387925622098;
+ daa[17*20+16]= 0.513128126891;daa[18*20+0]= 0.718206697586; daa[18*20+1]= 0.720517441216; daa[18*20+2]= 0.538222519037;
+ daa[18*20+3]= 0.261422208965; daa[18*20+4]= 0.470237733696; daa[18*20+5]= 0.95898974285; daa[18*20+6]= 0.596719300346;
+ daa[18*20+7]= 0.308055737035; daa[18*20+8]= 4.218953969389; daa[18*20+9]= 0.674617093228; daa[18*20+10]= 0.811245856323;
+ daa[18*20+11]= 0.7179934869; daa[18*20+12]= 0.951682162246;daa[18*20+13]= 6.747260430801;daa[18*20+14]= 0.369405319355;
+ daa[18*20+15]= 0.796751520761;daa[18*20+16]= 0.801010243199;daa[18*20+17]= 4.054419006558;daa[19*20+0]= 2.187774522005;
+ daa[19*20+1]= 0.438388343772; daa[19*20+2]= 0.312858797993; daa[19*20+3]= 0.258129289418; daa[19*20+4]= 1.116352478606;
+ daa[19*20+5]= 0.530785790125; daa[19*20+6]= 0.524253846338; daa[19*20+7]= 0.25334079019; daa[19*20+8]= 0.20155597175;
+ daa[19*20+9]= 8.311839405458; daa[19*20+10]= 2.231405688913;daa[19*20+11]= 0.498138475304;daa[19*20+12]= 2.575850755315;
+ daa[19*20+13]= 0.838119610178;daa[19*20+14]= 0.496908410676;daa[19*20+15]= 0.561925457442;daa[19*20+16]= 2.253074051176;
+ daa[19*20+17]= 0.266508731426;daa[19*20+18]= 1;
+
+ f[0]= 0.074; f[1]= 0.052; f[2]= 0.045; f[3]= 0.054;
+ f[4]= 0.025; f[5]= 0.034; f[6]= 0.054; f[7]= 0.074;
+ f[8]= 0.026; f[9]= 0.068; f[10]= 0.099; f[11]= 0.058;
+ f[12]= 0.025; f[13]= 0.047; f[14]= 0.039; f[15]= 0.057;
+ f[16]= 0.051; f[17]= 0.013; f[18]= 0.032; f[19]= 0.073;
+ }
+ break;
+ case MTMAM:
+ {
+ daa[1*20+0]= 32; daa[2*20+0]= 2; daa[2*20+1]= 4; daa[3*20+0]= 11;
+ daa[3*20+1]= 0; daa[3*20+2]= 864; daa[4*20+0]= 0; daa[4*20+1]= 186;
+ daa[4*20+2]= 0; daa[4*20+3]= 0; daa[5*20+0]= 0; daa[5*20+1]= 246;
+ daa[5*20+2]= 8; daa[5*20+3]= 49; daa[5*20+4]= 0; daa[6*20+0]= 0;
+ daa[6*20+1]= 0; daa[6*20+2]= 0; daa[6*20+3]= 569; daa[6*20+4]= 0;
+ daa[6*20+5]= 274; daa[7*20+0]= 78; daa[7*20+1]= 18; daa[7*20+2]= 47;
+ daa[7*20+3]= 79; daa[7*20+4]= 0; daa[7*20+5]= 0; daa[7*20+6]= 22;
+ daa[8*20+0]= 8; daa[8*20+1]= 232; daa[8*20+2]= 458; daa[8*20+3]= 11;
+ daa[8*20+4]= 305; daa[8*20+5]= 550; daa[8*20+6]= 22; daa[8*20+7]= 0;
+ daa[9*20+0]= 75; daa[9*20+1]= 0; daa[9*20+2]= 19; daa[9*20+3]= 0;
+ daa[9*20+4]= 41; daa[9*20+5]= 0; daa[9*20+6]= 0; daa[9*20+7]= 0;
+ daa[9*20+8]= 0; daa[10*20+0]= 21; daa[10*20+1]= 6; daa[10*20+2]= 0;
+ daa[10*20+3]= 0; daa[10*20+4]= 27; daa[10*20+5]= 20; daa[10*20+6]= 0;
+ daa[10*20+7]= 0; daa[10*20+8]= 26; daa[10*20+9]= 232; daa[11*20+0]= 0;
+ daa[11*20+1]= 50; daa[11*20+2]= 408; daa[11*20+3]= 0; daa[11*20+4]= 0;
+ daa[11*20+5]= 242; daa[11*20+6]= 215; daa[11*20+7]= 0; daa[11*20+8]= 0;
+ daa[11*20+9]= 6; daa[11*20+10]= 4; daa[12*20+0]= 76; daa[12*20+1]= 0;
+ daa[12*20+2]= 21; daa[12*20+3]= 0; daa[12*20+4]= 0; daa[12*20+5]= 22;
+ daa[12*20+6]= 0; daa[12*20+7]= 0; daa[12*20+8]= 0; daa[12*20+9]= 378;
+ daa[12*20+10]= 609; daa[12*20+11]= 59; daa[13*20+0]= 0; daa[13*20+1]= 0;
+ daa[13*20+2]= 6; daa[13*20+3]= 5; daa[13*20+4]= 7; daa[13*20+5]= 0;
+ daa[13*20+6]= 0; daa[13*20+7]= 0; daa[13*20+8]= 0; daa[13*20+9]= 57;
+ daa[13*20+10]= 246; daa[13*20+11]= 0; daa[13*20+12]= 11; daa[14*20+0]= 53;
+ daa[14*20+1]= 9; daa[14*20+2]= 33; daa[14*20+3]= 2; daa[14*20+4]= 0;
+ daa[14*20+5]= 51; daa[14*20+6]= 0; daa[14*20+7]= 0; daa[14*20+8]= 53;
+ daa[14*20+9]= 5; daa[14*20+10]= 43; daa[14*20+11]= 18; daa[14*20+12]= 0;
+ daa[14*20+13]= 17; daa[15*20+0]= 342; daa[15*20+1]= 3; daa[15*20+2]= 446;
+ daa[15*20+3]= 16; daa[15*20+4]= 347; daa[15*20+5]= 30; daa[15*20+6]= 21;
+ daa[15*20+7]= 112; daa[15*20+8]= 20; daa[15*20+9]= 0; daa[15*20+10]= 74;
+ daa[15*20+11]= 65; daa[15*20+12]= 47; daa[15*20+13]= 90; daa[15*20+14]= 202;
+ daa[16*20+0]= 681; daa[16*20+1]= 0; daa[16*20+2]= 110; daa[16*20+3]= 0;
+ daa[16*20+4]= 114; daa[16*20+5]= 0; daa[16*20+6]= 4; daa[16*20+7]= 0;
+ daa[16*20+8]= 1; daa[16*20+9]= 360; daa[16*20+10]= 34; daa[16*20+11]= 50;
+ daa[16*20+12]= 691; daa[16*20+13]= 8; daa[16*20+14]= 78; daa[16*20+15]= 614;
+ daa[17*20+0]= 5; daa[17*20+1]= 16; daa[17*20+2]= 6; daa[17*20+3]= 0;
+ daa[17*20+4]= 65; daa[17*20+5]= 0; daa[17*20+6]= 0; daa[17*20+7]= 0;
+ daa[17*20+8]= 0; daa[17*20+9]= 0; daa[17*20+10]= 12; daa[17*20+11]= 0;
+ daa[17*20+12]= 13; daa[17*20+13]= 0; daa[17*20+14]= 7; daa[17*20+15]= 17;
+ daa[17*20+16]= 0; daa[18*20+0]= 0; daa[18*20+1]= 0; daa[18*20+2]= 156;
+ daa[18*20+3]= 0; daa[18*20+4]= 530; daa[18*20+5]= 54; daa[18*20+6]= 0;
+ daa[18*20+7]= 1; daa[18*20+8]= 1525;daa[18*20+9]= 16; daa[18*20+10]= 25;
+ daa[18*20+11]= 67; daa[18*20+12]= 0; daa[18*20+13]= 682; daa[18*20+14]= 8;
+ daa[18*20+15]= 107; daa[18*20+16]= 0; daa[18*20+17]= 14; daa[19*20+0]= 398;
+ daa[19*20+1]= 0; daa[19*20+2]= 0; daa[19*20+3]= 10; daa[19*20+4]= 0;
+ daa[19*20+5]= 33; daa[19*20+6]= 20; daa[19*20+7]= 5; daa[19*20+8]= 0;
+ daa[19*20+9]= 2220; daa[19*20+10]= 100;daa[19*20+11]= 0; daa[19*20+12]= 832;
+ daa[19*20+13]= 6; daa[19*20+14]= 0; daa[19*20+15]= 0; daa[19*20+16]= 237;
+ daa[19*20+17]= 0; daa[19*20+18]= 0;
+
+ f[0]= 0.06920; f[1]= 0.01840; f[2]= 0.04000; f[3]= 0.018600;
+ f[4]= 0.00650; f[5]= 0.02380; f[6]= 0.02360; f[7]= 0.055700;
+ f[8]= 0.02770; f[9]= 0.09050; f[10]=0.16750; f[11]= 0.02210;
+ f[12]=0.05610; f[13]= 0.06110; f[14]=0.05360; f[15]= 0.07250;
+ f[16]=0.08700; f[17]= 0.02930; f[18]=0.03400; f[19]= 0.04280;
+ }
+ break;
+ case LG:
+ {
+ daa[1*20+0] = 0.425093;
+
+ daa[2*20+0] = 0.276818; daa[2*20+1] = 0.751878;
+
+ daa[3*20+0] = 0.395144; daa[3*20+1] = 0.123954; daa[3*20+2] = 5.076149;
+
+ daa[4*20+0] = 2.489084; daa[4*20+1] = 0.534551; daa[4*20+2] = 0.528768; daa[4*20+3] = 0.062556;
+
+ daa[5*20+0] = 0.969894; daa[5*20+1] = 2.807908; daa[5*20+2] = 1.695752; daa[5*20+3] = 0.523386; daa[5*20+4] = 0.084808;
+
+ daa[6*20+0] = 1.038545; daa[6*20+1] = 0.363970; daa[6*20+2] = 0.541712; daa[6*20+3] = 5.243870; daa[6*20+4] = 0.003499; daa[6*20+5] = 4.128591;
+
+ daa[7*20+0] = 2.066040; daa[7*20+1] = 0.390192; daa[7*20+2] = 1.437645; daa[7*20+3] = 0.844926; daa[7*20+4] = 0.569265; daa[7*20+5] = 0.267959; daa[7*20+6] = 0.348847;
+
+ daa[8*20+0] = 0.358858; daa[8*20+1] = 2.426601; daa[8*20+2] = 4.509238; daa[8*20+3] = 0.927114; daa[8*20+4] = 0.640543; daa[8*20+5] = 4.813505; daa[8*20+6] = 0.423881;
+ daa[8*20+7] = 0.311484;
+
+ daa[9*20+0] = 0.149830; daa[9*20+1] = 0.126991; daa[9*20+2] = 0.191503; daa[9*20+3] = 0.010690; daa[9*20+4] = 0.320627; daa[9*20+5] = 0.072854; daa[9*20+6] = 0.044265;
+ daa[9*20+7] = 0.008705; daa[9*20+8] = 0.108882;
+
+ daa[10*20+0] = 0.395337; daa[10*20+1] = 0.301848; daa[10*20+2] = 0.068427; daa[10*20+3] = 0.015076; daa[10*20+4] = 0.594007; daa[10*20+5] = 0.582457; daa[10*20+6] = 0.069673;
+ daa[10*20+7] = 0.044261; daa[10*20+8] = 0.366317; daa[10*20+9] = 4.145067 ;
+
+ daa[11*20+0] = 0.536518; daa[11*20+1] = 6.326067; daa[11*20+2] = 2.145078; daa[11*20+3] = 0.282959; daa[11*20+4] = 0.013266; daa[11*20+5] = 3.234294; daa[11*20+6] = 1.807177;
+ daa[11*20+7] = 0.296636; daa[11*20+8] = 0.697264; daa[11*20+9] = 0.159069; daa[11*20+10] = 0.137500;
+
+
+ daa[12*20+0] = 1.124035; daa[12*20+1] = 0.484133; daa[12*20+2] = 0.371004; daa[12*20+3] = 0.025548; daa[12*20+4] = 0.893680; daa[12*20+5] = 1.672569; daa[12*20+6] = 0.173735;
+ daa[12*20+7] = 0.139538; daa[12*20+8] = 0.442472; daa[12*20+9] = 4.273607; daa[12*20+10] = 6.312358; daa[12*20+11] = 0.656604;
+
+ daa[13*20+0] = 0.253701; daa[13*20+1] = 0.052722;daa[13*20+2] = 0.089525; daa[13*20+3] = 0.017416; daa[13*20+4] = 1.105251; daa[13*20+5] = 0.035855; daa[13*20+6] = 0.018811;
+ daa[13*20+7] = 0.089586; daa[13*20+8] = 0.682139; daa[13*20+9] = 1.112727; daa[13*20+10] = 2.592692; daa[13*20+11] = 0.023918; daa[13*20+12] = 1.798853;
+
+ daa[14*20+0] = 1.177651; daa[14*20+1] = 0.332533;daa[14*20+2] = 0.161787; daa[14*20+3] = 0.394456; daa[14*20+4] = 0.075382; daa[14*20+5] = 0.624294; daa[14*20+6] = 0.419409;
+ daa[14*20+7] = 0.196961; daa[14*20+8] = 0.508851; daa[14*20+9] = 0.078281; daa[14*20+10] = 0.249060; daa[14*20+11] = 0.390322; daa[14*20+12] = 0.099849;
+ daa[14*20+13] = 0.094464;
+
+ daa[15*20+0] = 4.727182; daa[15*20+1] = 0.858151;daa[15*20+2] = 4.008358; daa[15*20+3] = 1.240275; daa[15*20+4] = 2.784478; daa[15*20+5] = 1.223828; daa[15*20+6] = 0.611973;
+ daa[15*20+7] = 1.739990; daa[15*20+8] = 0.990012; daa[15*20+9] = 0.064105; daa[15*20+10] = 0.182287; daa[15*20+11] = 0.748683; daa[15*20+12] = 0.346960;
+ daa[15*20+13] = 0.361819; daa[15*20+14] = 1.338132;
+
+ daa[16*20+0] = 2.139501; daa[16*20+1] = 0.578987;daa[16*20+2] = 2.000679; daa[16*20+3] = 0.425860; daa[16*20+4] = 1.143480; daa[16*20+5] = 1.080136; daa[16*20+6] = 0.604545;
+ daa[16*20+7] = 0.129836; daa[16*20+8] = 0.584262; daa[16*20+9] = 1.033739; daa[16*20+10] = 0.302936; daa[16*20+11] = 1.136863; daa[16*20+12] = 2.020366;
+ daa[16*20+13] = 0.165001; daa[16*20+14] = 0.571468; daa[16*20+15] = 6.472279;
+
+ daa[17*20+0] = 0.180717; daa[17*20+1] = 0.593607; daa[17*20+2] = 0.045376; daa[17*20+3] = 0.029890; daa[17*20+4] = 0.670128; daa[17*20+5] = 0.236199; daa[17*20+6] = 0.077852;
+ daa[17*20+7] = 0.268491; daa[17*20+8] = 0.597054; daa[17*20+9] = 0.111660; daa[17*20+10] = 0.619632; daa[17*20+11] = 0.049906; daa[17*20+12] = 0.696175;
+ daa[17*20+13] = 2.457121; daa[17*20+14] = 0.095131; daa[17*20+15] = 0.248862; daa[17*20+16] = 0.140825;
+
+ daa[18*20+0] = 0.218959; daa[18*20+1] = 0.314440; daa[18*20+2] = 0.612025; daa[18*20+3] = 0.135107; daa[18*20+4] = 1.165532; daa[18*20+5] = 0.257336; daa[18*20+6] = 0.120037;
+ daa[18*20+7] = 0.054679; daa[18*20+8] = 5.306834; daa[18*20+9] = 0.232523; daa[18*20+10] = 0.299648; daa[18*20+11] = 0.131932; daa[18*20+12] = 0.481306;
+ daa[18*20+13] = 7.803902; daa[18*20+14] = 0.089613; daa[18*20+15] = 0.400547; daa[18*20+16] = 0.245841; daa[18*20+17] = 3.151815;
+
+ daa[19*20+0] = 2.547870; daa[19*20+1] = 0.170887; daa[19*20+2] = 0.083688; daa[19*20+3] = 0.037967; daa[19*20+4] = 1.959291; daa[19*20+5] = 0.210332; daa[19*20+6] = 0.245034;
+ daa[19*20+7] = 0.076701; daa[19*20+8] = 0.119013; daa[19*20+9] = 10.649107; daa[19*20+10] = 1.702745; daa[19*20+11] = 0.185202; daa[19*20+12] = 1.898718;
+ daa[19*20+13] = 0.654683; daa[19*20+14] = 0.296501; daa[19*20+15] = 0.098369; daa[19*20+16] = 2.188158; daa[19*20+17] = 0.189510; daa[19*20+18] = 0.249313;
+
+ /* Lower-precision frequencies, superseded by the values below:
+ f[0] = 0.07906;
+ f[1] = 0.05594;
+ f[2] = 0.04198;
+ f[3] = 0.05305;
+ f[4] = 0.01294;
+ f[5] = 0.04077;
+ f[6] = 0.07158;
+ f[7] = 0.05734;
+ f[8] = 0.02235;
+ f[9] = 0.06216;
+ f[10] = 0.09908;
+ f[11] = 0.06460;
+ f[12] = 0.02295;
+ f[13] = 0.04230;
+ f[14] = 0.04404;
+ f[15] = 0.06120;
+ f[16] = 0.05329;
+ f[17] = 0.01207;
+ f[18] = 0.03415;
+ f[19] = 0.06915; */
+
+ f[0] = 0.079066; f[1] = 0.055941; f[2] = 0.041977; f[3] = 0.053052;
+ f[4] = 0.012937; f[5] = 0.040767; f[6] = 0.071586; f[7] = 0.057337;
+ f[8] = 0.022355; f[9] = 0.062157; f[10] = 0.099081; f[11] = 0.064600;
+ f[12] = 0.022951; f[13] = 0.042302; f[14] = 0.044040; f[15] = 0.061197;
+ f[16] = 0.053287; f[17] = 0.012066; f[18] = 0.034155; f[19] = 0.069146;
+ }
+ break;
+ case LG4M:
+ {
+ double
+ rates[4][190] =
+ {
+ {
+ 0.269343
+ , 0.254612, 0.150988
+ , 0.236821, 0.031863, 0.659648
+ , 2.506547, 0.938594, 0.975736, 0.175533
+ , 0.359080, 0.348288, 0.697708, 0.086573, 0.095967
+ , 0.304674, 0.156000, 0.377704, 0.449140, 0.064706, 4.342595
+ , 1.692015, 0.286638, 0.565095, 0.380358, 0.617945, 0.202058, 0.264342
+ , 0.251974, 0.921633, 1.267609, 0.309692, 0.390429, 2.344059, 0.217750, 0.104842
+ , 1.085220, 0.325624, 0.818658, 0.037814, 1.144150, 0.534567, 0.222793, 0.062682, 0.567431
+ , 0.676353, 0.602366, 0.217027, 0.007533, 1.595775, 0.671143, 0.158424, 0.070463, 0.764255, 8.226528
+ , 0.179155, 0.971338, 1.343718, 0.133744, 0.122468, 0.983857, 0.994128, 0.220916, 0.410581, 0.387487, 0.181110
+ , 1.636817, 0.515217, 0.670461, 0.071252, 1.534848, 5.288642, 0.255628, 0.094198, 0.257229, 25.667158, 6.819689, 1.591212
+ , 0.235498, 0.123932, 0.099793, 0.030425, 0.897279, 0.112229, 0.022529, 0.047488, 0.762914, 1.344259, 0.865691, 0.038921, 2.030833
+ , 1.265605, 0.040163, 0.173354, 0.027579, 0.259961, 0.580374, 0.088041, 0.145595, 0.143676, 0.298859, 1.020117, 0.000714, 0.190019, 0.093964
+ , 5.368405, 0.470952, 5.267140, 0.780505, 4.986071, 0.890554, 0.377949, 1.755515, 0.786352, 0.527246, 0.667783, 0.659948, 0.731921, 0.837669, 1.355630
+ , 1.539394, 0.326789, 1.688169, 0.283738, 1.389282, 0.329821, 0.231770, 0.117017, 0.449977, 3.531600, 0.721586, 0.497588, 2.691697, 0.152088, 0.698040, 16.321298
+ , 0.140944, 0.375611, 0.025163, 0.002757, 0.801456, 0.257253, 0.103678, 0.132995, 0.345834, 0.377156, 0.839647, 0.176970, 0.505682, 1.670170, 0.091298, 0.210096, 0.013165
+ , 0.199836, 0.146857, 0.806275, 0.234246, 1.436970, 0.319669, 0.010076, 0.036859, 3.503317, 0.598632, 0.738969, 0.154436, 0.579000, 4.245524, 0.074524, 0.454195, 0.232913, 1.178490
+ , 9.435529, 0.285934, 0.395670, 0.130890, 6.097263, 0.516259, 0.503665, 0.222960, 0.149143, 13.666175, 2.988174, 0.162725, 5.973826, 0.843416, 0.597394, 0.701149, 4.680002, 0.300085, 0.416262
+ },
+ {
+ 0.133720
+ , 0.337212, 0.749052
+ , 0.110918, 0.105087, 4.773487
+ , 3.993460, 0.188305, 1.590332, 0.304942
+ , 0.412075, 2.585774, 1.906884, 0.438367, 0.242076
+ , 0.435295, 0.198278, 0.296366, 7.470333, 0.008443, 3.295515
+ , 7.837540, 0.164607, 0.431724, 0.153850, 1.799716, 0.269744, 0.242866
+ , 0.203872, 2.130334, 9.374479, 1.080878, 0.152458, 12.299133, 0.279589, 0.089714
+ , 0.039718, 0.024553, 0.135254, 0.014979, 0.147498, 0.033964, 0.005585, 0.007248, 0.022746
+ , 0.075784, 0.080091, 0.084971, 0.014128, 0.308347, 0.500836, 0.022833, 0.022999, 0.161270, 1.511682
+ , 0.177662, 10.373708, 1.036721, 0.038303, 0.043030, 2.181033, 0.321165, 0.103050, 0.459502, 0.021215, 0.078395
+ , 0.420784, 0.192765, 0.329545, 0.008331, 0.883142, 1.403324, 0.168673, 0.160728, 0.612573, 1.520889, 7.763266, 0.307903
+ , 0.071268, 0.019652, 0.088753, 0.013547, 0.566609, 0.071878, 0.020050, 0.041022, 0.625361, 0.382806, 1.763059, 0.044644, 1.551911
+ , 0.959127, 1.496585, 0.377794, 0.332010, 0.318192, 1.386970, 0.915904, 0.224255, 2.611479, 0.029351, 0.068250, 1.542356, 0.047525, 0.182715
+ , 11.721512, 0.359408, 2.399158, 0.219464, 9.104192, 0.767563, 0.235229, 3.621219, 0.971955, 0.033780, 0.043035, 0.236929, 0.319964, 0.124977, 0.840651
+ , 2.847068, 0.218463, 1.855386, 0.109808, 4.347048, 0.765848, 0.164569, 0.312024, 0.231569, 0.356327, 0.159597, 0.403210, 1.135162, 0.106903, 0.269190, 9.816481
+ , 0.030203, 0.387292, 0.118878, 0.067287, 0.190240, 0.122113, 0.007023, 0.137411, 0.585141, 0.020634, 0.228824, 0.000122, 0.474862, 3.135128, 0.030313, 0.093830, 0.119152
+ , 0.067183, 0.130101, 0.348730, 0.061798, 0.301198, 0.095382, 0.095764, 0.044628, 2.107384, 0.046105, 0.100117, 0.017073, 0.192383, 8.367641, 0.000937, 0.137416, 0.044722, 4.179782
+ , 0.679398, 0.041567, 0.092408, 0.023701, 1.271187, 0.115566, 0.055277, 0.086988, 0.060779, 8.235167, 0.609420, 0.061764, 0.581962, 0.184187, 0.080246, 0.098033, 1.438350, 0.023439, 0.039124
+ },
+ {
+ 0.421017
+ , 0.316236, 0.693340
+ , 0.285984, 0.059926, 6.158219
+ , 4.034031, 1.357707, 0.708088, 0.063669
+ , 0.886972, 2.791622, 1.701830, 0.484347, 0.414286
+ , 0.760525, 0.233051, 0.378723, 4.032667, 0.081977, 4.940411
+ , 0.754103, 0.402894, 2.227443, 1.102689, 0.416576, 0.459376, 0.508409
+ , 0.571422, 2.319453, 5.579973, 0.885376, 1.439275, 4.101979, 0.576745, 0.428799
+ , 0.162152, 0.085229, 0.095692, 0.006129, 0.490937, 0.104843, 0.045514, 0.004705, 0.098934
+ , 0.308006, 0.287051, 0.056994, 0.007102, 0.958988, 0.578990, 0.067119, 0.024403, 0.342983, 3.805528
+ , 0.390161, 7.663209, 1.663641, 0.105129, 0.135029, 3.364474, 0.652618, 0.457702, 0.823674, 0.129858, 0.145630
+ , 1.042298, 0.364551, 0.293222, 0.037983, 1.486520, 1.681752, 0.192414, 0.070498, 0.222626, 4.529623, 4.781730, 0.665308
+ , 0.362476, 0.073439, 0.129245, 0.020078, 1.992483, 0.114549, 0.023272, 0.064490, 1.491794, 1.113437, 2.132006, 0.041677, 1.928654
+ , 1.755491, 0.087050, 0.099325, 0.163817, 0.242851, 0.322939, 0.062943, 0.198698, 0.192904, 0.062948, 0.180283, 0.059655, 0.129323, 0.065778
+ , 3.975060, 0.893398, 5.496314, 1.397313, 3.575120, 1.385297, 0.576191, 1.733288, 1.021255, 0.065131, 0.129115, 0.600308, 0.387276, 0.446001, 1.298493
+ , 2.565079, 0.534056, 2.143993, 0.411388, 2.279084, 0.893006, 0.528209, 0.135731, 0.518741, 0.972662, 0.280700, 0.890086, 1.828755, 0.189028, 0.563778, 7.788147
+ , 0.283631, 0.497926, 0.075454, 0.043794, 1.335322, 0.308605, 0.140137, 0.150797, 1.409726, 0.119868, 0.818331, 0.080591, 1.066017, 3.754687, 0.073415, 0.435046, 0.197272
+ , 0.242513, 0.199157, 0.472207, 0.085937, 2.039787, 0.262751, 0.084578, 0.032247, 7.762326, 0.153966, 0.299828, 0.117255, 0.438215, 14.506235, 0.089180, 0.352766, 0.215417, 5.054245
+ , 2.795818, 0.107130, 0.060909, 0.029724, 2.986426, 0.197267, 0.196977, 0.044327, 0.116751, 7.144311, 1.848622, 0.118020, 1.999696, 0.705747, 0.272763, 0.096935, 1.820982, 0.217007, 0.172975
+ },
+ {
+ 0.576160
+ , 0.567606, 0.498643
+ , 0.824359, 0.050698, 3.301401
+ , 0.822724, 4.529235, 1.291808, 0.101930
+ , 1.254238, 2.169809, 1.427980, 0.449474, 0.868679
+ , 1.218615, 0.154502, 0.411471, 3.172277, 0.050239, 2.138661
+ , 1.803443, 0.604673, 2.125496, 1.276384, 1.598679, 0.502653, 0.479490
+ , 0.516862, 2.874265, 4.845769, 0.719673, 3.825677, 4.040275, 0.292773, 0.596643
+ , 0.180898, 0.444586, 0.550969, 0.023542, 2.349573, 0.370160, 0.142187, 0.016618, 0.500788
+ , 0.452099, 0.866322, 0.201033, 0.026731, 2.813990, 1.645178, 0.135556, 0.072152, 1.168817, 5.696116
+ , 0.664186, 2.902886, 2.101971, 0.127988, 0.200218, 2.505933, 0.759509, 0.333569, 0.623100, 0.547454, 0.363656
+ , 0.864415, 0.835049, 0.632649, 0.079201, 2.105931, 1.633544, 0.216462, 0.252419, 0.665406, 7.994105, 11.751178, 1.096842
+ , 0.324478, 0.208947, 0.280339, 0.041683, 4.788477, 0.107022, 0.067711, 0.171320, 3.324779, 2.965328, 5.133843, 0.084856, 4.042591
+ , 1.073043, 0.173826, 0.041985, 0.270336, 0.121299, 0.351384, 0.228565, 0.225318, 0.376089, 0.058027, 0.390354, 0.214230, 0.058954, 0.126299
+ , 3.837562, 0.884342, 4.571911, 0.942751, 6.592827, 1.080063, 0.465397, 3.137614, 1.119667, 0.362516, 0.602355, 0.716940, 0.506796, 1.444484, 1.432558
+ , 2.106026, 0.750016, 2.323325, 0.335915, 1.654673, 1.194017, 0.617231, 0.318671, 0.801030, 4.455842, 0.580191, 1.384210, 3.522468, 0.473128, 0.432718, 5.716300
+ , 0.163720, 0.818102, 0.072322, 0.068275, 3.305436, 0.373790, 0.054323, 0.476587, 1.100360, 0.392946, 1.703323, 0.085720, 1.725516, 5.436253, 0.053108, 0.498594, 0.231832
+ , 0.241167, 0.302440, 1.055095, 0.246940, 9.741942, 0.249895, 0.129973, 0.052363, 11.542498, 1.047449, 1.319667, 0.139770, 1.330225, 26.562270, 0.046986, 0.737653, 0.313460, 5.165098
+ , 1.824586, 0.435795, 0.179086, 0.091739, 3.609570, 0.649507, 0.656681, 0.225234, 0.473437, 19.897252, 3.001995, 0.452926, 3.929598, 1.692159, 0.370204, 0.373501, 3.329822, 0.326593, 0.860743
+ }
+ };
+
+ double
+ freqs[4][20] =
+ {{0.082276,0.055172,0.043853,0.053484,0.018957,0.028152,0.046679,0.157817,0.033297,0.028284,0.054284,0.025275,0.023665,0.041874,0.063071,0.066501,0.065424,0.023837,0.038633,0.049465},
+ {0.120900,0.036460,0.026510,0.040410,0.015980,0.021132,0.025191,0.036369,0.015884,0.111029,0.162852,0.024820,0.028023,0.074058,0.012065,0.041963,0.039072,0.012666,0.040478,0.114137},
+ {0.072639,0.051691,0.038642,0.055580,0.009829,0.031374,0.048731,0.065283,0.023791,0.086640,0.120847,0.052177,0.026728,0.032589,0.039238,0.046748,0.053361,0.008024,0.037426,0.098662},
+ {0.104843,0.078835,0.043513,0.090498,0.002924,0.066163,0.151640,0.038843,0.022556,0.018383,0.038687,0.104462,0.010166,0.009089,0.066950,0.053667,0.049486,0.004409,0.012924,0.031963}};
+
+
+ makeAASubstMat(daa, f, rates[lg4_index], freqs[lg4_index]);
+
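+ /* Earlier inline fill of daa and f, kept for reference; the makeAASubstMat()
+ call above appears to supersede it.
+ int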
+ i,
+ j,
+ r = 0;
+
+ for(i = 1; i < 20; i++)
+ for(j = 0; j < i; j++)
+ {
+ daa[i * 20 + j] = rates[lg4_index][r];
+ r++;
+ }
+
+ assert(r == 190);
+
+ for(i = 0; i < 20; i++)
+ f[i] = freqs[lg4_index][i]; */
+
+ }
+ break;
+ case LG4X:
+ {
+ double
+ rates[4][190] =
+ {
+ {
+ 0.295719,
+ 0.067388, 0.448317,
+ 0.253712, 0.457483, 2.358429,
+ 1.029289, 0.576016, 0.251987, 0.189008,
+ 0.107964, 1.741924, 0.216561, 0.599450, 0.029955,
+ 0.514644, 0.736017, 0.503084, 109.901504, 0.084794, 4.117654,
+ 10.868848, 0.704334, 0.435271, 1.070052, 1.862626, 0.246260, 1.202023,
+ 0.380498, 5.658311, 4.873453, 5.229858, 0.553477, 6.508329, 1.634845, 0.404968,
+ 0.084223, 0.123387, 0.090748, 0.052764, 0.151733, 0.054187, 0.060194, 0.048984, 0.204296,
+ 0.086976, 0.221777, 0.033310, 0.021407, 0.230320, 0.195703, 0.069359, 0.069963, 0.504221, 1.495537,
+ 0.188789, 93.433377, 0.746537, 0.621146, 0.096955, 1.669092, 2.448827, 0.256662, 1.991533, 0.091940, 0.122332,
+ 0.286389, 0.382175, 0.128905, 0.081091, 0.352526, 0.810168, 0.232297, 0.228519, 0.655465, 1.994320, 3.256485, 0.457430,
+ 0.155567, 0.235965, 0.127321, 0.205164, 0.590018, 0.066081, 0.064822, 0.241077, 6.799829, 0.754940, 2.261319, 0.163849, 1.559944,
+ 1.671061, 6.535048, 0.904011, 5.164456, 0.386853, 2.437439, 3.537387, 4.320442, 11.291065, 0.170343, 0.848067, 5.260446, 0.426508, 0.438856,
+ 2.132922, 0.525521, 0.939733, 0.747330, 1.559564, 0.165666, 0.435384, 3.656545, 0.961142, 0.050315, 0.064441, 0.360946, 0.132547, 0.306683, 4.586081,
+ 0.529591, 0.303537, 0.435450, 0.308078, 0.606648, 0.106333, 0.290413, 0.290216, 0.448965, 0.372166, 0.102493, 0.389413, 0.498634, 0.109129, 2.099355, 3.634276,
+ 0.115551, 0.641259, 0.046646, 0.260889, 0.587531, 0.093417, 0.280695, 0.307466, 6.227274, 0.206332, 0.459041, 0.033291, 0.559069, 18.392863, 0.411347, 0.101797, 0.034710,
+ 0.102453, 0.289466, 0.262076, 0.185083, 0.592318, 0.035149, 0.105999, 0.096556, 20.304886, 0.097050, 0.133091, 0.115301, 0.264728, 66.647302, 0.476350, 0.148995, 0.063603, 20.561407,
+ 0.916683, 0.102065, 0.043986, 0.080708, 0.885230, 0.072549, 0.206603, 0.306067, 0.205944, 5.381403, 0.561215, 0.112593, 0.693307, 0.400021, 0.584622, 0.089177, 0.755865, 0.133790, 0.154902
+ },
+ {
+ 0.066142,
+ 0.590377, 0.468325,
+ 0.069930, 0.013688, 2.851667,
+ 9.850951, 0.302287, 3.932151, 0.146882,
+ 1.101363, 1.353957, 8.159169, 0.249672, 0.582670,
+ 0.150375, 0.028386, 0.219934, 0.560142, 0.005035, 3.054085,
+ 0.568586, 0.037750, 0.421974, 0.046719, 0.275844, 0.129551, 0.037250,
+ 0.051668, 0.262130, 2.468752, 0.106259, 0.098208, 4.210126, 0.029788, 0.013513,
+ 0.127170, 0.016923, 0.344765, 0.003656, 0.445038, 0.165753, 0.008541, 0.002533, 0.031779,
+ 0.292429, 0.064289, 0.210724, 0.004200, 1.217010, 1.088704, 0.014768, 0.005848, 0.064558, 7.278994,
+ 0.071458, 0.855973, 1.172204, 0.014189, 0.033969, 1.889645, 0.125869, 0.031390, 0.065585, 0.029917, 0.042762,
+ 1.218562, 0.079621, 0.763553, 0.009876, 1.988516, 3.344809, 0.056702, 0.021612, 0.079927, 7.918203, 14.799537, 0.259400,
+ 0.075144, 0.011169, 0.082464, 0.002656, 0.681161, 0.111063, 0.004186, 0.004854, 0.095591, 0.450964, 1.506485, 0.009457, 1.375871,
+ 7.169085, 0.161937, 0.726566, 0.040244, 0.825960, 2.067758, 0.110993, 0.129497, 0.196886, 0.169797, 0.637893, 0.090576, 0.457399, 0.143327,
+ 30.139501, 0.276530, 11.149790, 0.267322, 18.762977, 3.547017, 0.201148, 0.976631, 0.408834, 0.104288, 0.123793, 0.292108, 0.598048, 0.328689, 3.478333,
+ 13.461692, 0.161053, 4.782635, 0.053740, 11.949233, 2.466507, 0.139705, 0.053397, 0.126088, 1.578530, 0.641351, 0.297913, 4.418398, 0.125011, 2.984862, 13.974326,
+ 0.021372, 0.081472, 0.058046, 0.006597, 0.286794, 0.188236, 0.009201, 0.019475, 0.037226, 0.015909, 0.154810, 0.017172, 0.239749, 0.562720, 0.061299, 0.154326, 0.060703,
+ 0.045779, 0.036742, 0.498072, 0.027639, 0.534219, 0.203493, 0.012095, 0.004964, 0.452302, 0.094365, 0.140750, 0.021976, 0.168432, 1.414883, 0.077470, 0.224675, 0.123480, 0.447011,
+ 4.270235, 0.030342, 0.258487, 0.012745, 4.336817, 0.281953, 0.043812, 0.015539, 0.016212, 16.179952, 3.416059, 0.032578, 2.950318, 0.227807, 1.050562, 0.112000, 5.294490, 0.033381, 0.045528
+ },
+ {
+ 0.733336,
+ 0.558955, 0.597671,
+ 0.503360, 0.058964, 5.581680,
+ 4.149599, 2.863355, 1.279881, 0.225860,
+ 1.415369, 2.872594, 1.335650, 0.434096, 1.043232,
+ 1.367574, 0.258365, 0.397108, 2.292917, 0.209978, 4.534772,
+ 1.263002, 0.366868, 1.840061, 1.024707, 0.823594, 0.377181, 0.496780,
+ 0.994098, 2.578946, 5.739035, 0.821921, 3.039380, 4.877840, 0.532488, 0.398817,
+ 0.517204, 0.358350, 0.284730, 0.027824, 1.463390, 0.370939, 0.232460, 0.008940, 0.349195,
+ 0.775054, 0.672023, 0.109781, 0.021443, 1.983693, 1.298542, 0.169219, 0.043707, 0.838324, 5.102837,
+ 0.763094, 5.349861, 1.612642, 0.088850, 0.397640, 3.509873, 0.755219, 0.436013, 0.888693, 0.561690, 0.401070,
+ 1.890137, 0.691594, 0.466979, 0.060820, 2.831098, 2.646440, 0.379926, 0.087640, 0.488389, 7.010411, 8.929538, 1.357738,
+ 0.540460, 0.063347, 0.141582, 0.018288, 4.102068, 0.087872, 0.020447, 0.064863, 1.385133, 3.054968, 5.525874, 0.043394, 3.135353,
+ 0.200122, 0.032875, 0.019509, 0.042687, 0.059723, 0.072299, 0.023282, 0.036426, 0.050226, 0.039318, 0.067505, 0.023126, 0.012695, 0.015631,
+ 4.972745, 0.821562, 4.670980, 1.199607, 5.901348, 1.139018, 0.503875, 1.673207, 0.962470, 0.204155, 0.273372, 0.567639, 0.570771, 0.458799, 0.233109,
+ 1.825593, 0.580847, 1.967383, 0.420710, 2.034980, 0.864479, 0.577513, 0.124068, 0.502294, 2.653232, 0.437116, 1.048288, 2.319555, 0.151684, 0.077004, 8.113282,
+ 0.450842, 0.661866, 0.088064, 0.037642, 2.600668, 0.390688, 0.109318, 0.218118, 1.065585, 0.564368, 1.927515, 0.120994, 1.856122, 4.154750, 0.011074, 0.377578, 0.222293,
+ 0.526135, 0.265730, 0.581928, 0.141233, 5.413080, 0.322761, 0.153776, 0.039217, 8.351808, 0.854294, 0.940458, 0.180650, 0.975427, 11.429924, 0.026268, 0.429221, 0.273138, 4.731579,
+ 3.839269, 0.395134, 0.145401, 0.090101, 4.193725, 0.625409, 0.696533, 0.104335, 0.377304, 15.559906, 2.508169, 0.449074, 3.404087, 1.457957, 0.052132, 0.260296, 2.903836, 0.564762, 0.681215
+ },
+ {
+ 0.658412,
+ 0.566269, 0.540749,
+ 0.854111, 0.058015, 3.060574,
+ 0.884454, 5.851132, 1.279257, 0.160296,
+ 1.309554, 2.294145, 1.438430, 0.482619, 0.992259,
+ 1.272639, 0.182966, 0.431464, 2.992763, 0.086318, 2.130054,
+ 1.874713, 0.684164, 2.075952, 1.296206, 2.149634, 0.571406, 0.507160,
+ 0.552007, 3.192521, 4.840271, 0.841829, 5.103188, 4.137385, 0.351381, 0.679853,
+ 0.227683, 0.528161, 0.644656, 0.031467, 3.775817, 0.437589, 0.189152, 0.025780, 0.665865,
+ 0.581512, 1.128882, 0.266076, 0.048542, 3.954021, 2.071689, 0.217780, 0.082005, 1.266791, 8.904999,
+ 0.695190, 3.010922, 2.084975, 0.132774, 0.190734, 2.498630, 0.767361, 0.326441, 0.680174, 0.652629, 0.440178,
+ 0.967985, 1.012866, 0.720060, 0.133055, 1.776095, 1.763546, 0.278392, 0.343977, 0.717301, 10.091413, 14.013035, 1.082703,
+ 0.344015, 0.227296, 0.291854, 0.056045, 4.495841, 0.116381, 0.092075, 0.195877, 4.001286, 2.671718, 5.069337, 0.091278, 4.643214,
+ 0.978992, 0.156635, 0.028961, 0.209188, 0.264277, 0.296578, 0.177263, 0.217424, 0.362942, 0.086367, 0.539010, 0.172734, 0.121821, 0.161015,
+ 3.427163, 0.878405, 4.071574, 0.925172, 7.063879, 1.033710, 0.451893, 3.057583, 1.189259, 0.359932, 0.742569, 0.693405, 0.584083, 1.531223, 1.287474,
+ 2.333253, 0.802754, 2.258357, 0.360522, 2.221150, 1.283423, 0.653836, 0.377558, 0.964545, 4.797423, 0.780580, 1.422571, 4.216178, 0.599244, 0.444362, 5.231362,
+ 0.154701, 0.830884, 0.073037, 0.094591, 3.017954, 0.312579, 0.074620, 0.401252, 1.350568, 0.336801, 1.331875, 0.068958, 1.677263, 5.832025, 0.076328, 0.548763, 0.208791,
+ 0.221089, 0.431617, 1.238426, 0.313945, 8.558815, 0.305772, 0.181992, 0.072258, 12.869737, 1.021885, 1.531589, 0.163829, 1.575754, 33.873091, 0.079916, 0.831890, 0.307846, 5.910440,
+ 2.088785, 0.456530, 0.199728, 0.118104, 4.310199, 0.681277, 0.752277, 0.241015, 0.531100, 23.029406, 4.414850, 0.481711, 5.046403, 1.914768, 0.466823, 0.382271, 3.717971, 0.282540, 0.964421
+ }
+ };
+
+ double
+ freqs[4][20] =
+ {{0.147383 , 0.017579 , 0.058208 , 0.017707 , 0.026331 , 0.041582 , 0.017494 , 0.027859 , 0.011849 , 0.076971 ,
+ 0.147823 , 0.019535 , 0.037132 , 0.029940 , 0.008059 , 0.088179 , 0.089653 , 0.006477 , 0.032308 , 0.097931},
+ {0.063139 , 0.066357 , 0.011586 , 0.066571 , 0.010800 , 0.009276 , 0.053984 , 0.146986 , 0.034214 , 0.088822 ,
+ 0.098196 , 0.032390 , 0.021263 , 0.072697 , 0.016761 , 0.020711 , 0.020797 , 0.025463 , 0.045615 , 0.094372},
+ {0.062457 , 0.066826 , 0.049332 , 0.065270 , 0.006513 , 0.041231 , 0.058965 , 0.080852 , 0.028024 , 0.037024 ,
+ 0.075925 , 0.064131 , 0.019620 , 0.028710 , 0.104579 , 0.056388 , 0.062027 , 0.008241 , 0.033124 , 0.050760},
+ {0.106471 , 0.074171 , 0.044513 , 0.096390 , 0.002148 , 0.066733 , 0.158908 , 0.037625 , 0.020691 , 0.014608 ,
+ 0.028797 , 0.105352 , 0.007864 , 0.007477 , 0.083595 , 0.055726 , 0.047711 , 0.003975 , 0.010088 , 0.027159}};
+
+
+ makeAASubstMat(daa, f, rates[lg4_index], freqs[lg4_index]);
+
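+ /* Earlier inline fill of daa and f, kept for reference; the makeAASubstMat()
+ call above appears to supersede it.
+ int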
+ i,
+ j,
+ r = 0;
+
+ for(i = 1; i < 20; i++)
+ for(j = 0; j < i; j++)
+ {
+ daa[i * 20 + j] = rates[lg4_index][r];
+ r++;
+ }
+
+ assert(r == 190);
+
+ for(i = 0; i < 20; i++)
+ f[i] = freqs[lg4_index][i]; */
+
+ }
+ break;
+ case STMTREV:
+ {
+ double rates[190] =
+ {
+ 0.1159435373,
+ 0.2458816714, 0.1355713516,
+ 0.9578712472, 0.0775041665, 8.4408676914,
+ 0.2327281954, 9.1379470330, 0.1137687264, 0.0582110367,
+ 0.3309250853, 5.2854173238, 0.1727184754, 0.8191776581, 0.0009722083,
+ 0.6946680829, 0.0966719296, 0.2990806606, 7.3729791633, 0.0005604799, 3.5773486727,
+ 2.8076062202, 3.0815651393, 0.5575702616, 2.2627839242, 1.1721237455, 0.0482085663, 3.3184632572,
+ 0.2275494971, 2.8251848421, 9.5228608030, 2.3191131858, 0.0483235836, 4.4138715270, 0.0343694246, 0.0948383460,
+ 0.0627691644, 0.5712158076, 0.2238609194, 0.0205779319, 0.1527276944, 0.0206129952, 0.0328079744, 0.1239000315, 0.0802374651,
+ 0.0305818840, 0.1930408758, 0.0540967250, 0.0018843293, 0.2406073246, 0.3299454620, 0.0373753435, 0.0005918940, 0.1192904610, 1.3184058362,
+ 0.2231434272, 6.0541970908, 4.3977466558, 0.1347413792, 0.0001480536, 5.2864094506, 6.8883522181, 0.5345755286, 0.3991624551, 0.2107928508, 0.1055933141,
+ 0.1874527991, 0.2427875732, 0.0433577842, 0.0000022173, 0.0927357503, 0.0109238300, 0.0663619185, 0.0128777966, 0.0722334577, 4.3016010974, 1.1493262595, 0.4773694701,
+ 0.0458112245, 0.0310030750, 0.0233493970, 0.0000080023, 0.8419347601, 0.0027817812, 0.0361207581, 0.0490593583, 0.0197089530, 0.3634155844, 2.1032860162, 0.0861057517, 0.1735660361,
+ 1.5133910481, 0.7858555362, 0.3000131148, 0.3337627573, 0.0036260499, 1.5386413234, 0.5196922389, 0.0221252552, 1.0171151697, 0.0534088166, 6.0377879080, 0.4350064365, 0.1634497017, 0.3545179411,
+ 2.3008246523, 0.7625702322, 1.9431704326, 0.6961369276, 2.3726544756, 0.1837198343, 0.9087013201, 2.5477016916, 0.3081949928, 0.1713464632, 2.7297706102, 0.3416923226, 0.0730798705, 4.0107845583, 8.4630191575,
+ 4.3546170435, 1.0655012755, 1.6534489471, 0.0985354973, 0.1940108923, 0.3415280861, 0.2794040892, 0.1657005971, 0.2704552047, 2.3418182855, 0.0426297282, 1.2152488582, 4.6553742047, 0.0068797851, 1.1613183519, 2.2213527952,
+ 0.0565037747, 6.7852754661, 0.0000010442, 0.0000002842, 0.9529353202, 0.0009844045, 0.0002705734, 0.5068170211, 0.0000932799, 0.0050518699, 0.3163744815, 0.0000023280, 0.1010587493, 0.2890102379, 0.0041564377, 0.0495269526, 0.0002026765,
+ 0.0358664532, 0.0714121777, 0.3036789915, 1.3220740967, 1.7972997876, 0.0066458178, 0.3052655031, 0.0174305437, 21.9842817264, 0.1070890246, 0.0770894218, 0.1929529483, 0.0561599188, 1.6748429971, 0.0021338646, 1.8890678523, 0.2834320440, 0.3134203648,
+ 3.2116908598, 0.0108028571, 0.0860833645, 0.0426724431, 0.3652373073, 0.0287789552, 0.1484349765, 0.5158740953, 0.0059791370, 3.3648305163, 0.8763855707, 0.0776875418, 0.9145670668, 0.3963331926, 0.1080226203, 0.0640951379, 0.2278998021, 0.0388755869, 0.1836950254
+ };
+
+ double
+ freqs[20] = {0.0461811000, 0.0534080000, 0.0361971000, 0.0233326000, 0.0234170000, 0.0390397000, 0.0341284001, 0.0389164000, 0.0164640000, 0.0891534000,
+ 0.1617310001, 0.0551341000, 0.0233262000, 0.0911252000, 0.0344713001, 0.0771077000, 0.0418603001, 0.0200784000, 0.0305429000, 0.0643851996};
+
+ makeAASubstMat(daa, f, rates, freqs);
+ }
+ break;
+ case MTART:
+ {
+ daa[1*20+0]= 0.2;
+ daa[2*20+0]= 0.2;
+ daa[2*20+1]= 0.2;
+ daa[3*20+0]= 1;
+ daa[3*20+1]= 4;
+ daa[3*20+2]= 500;
+ daa[4*20+0]= 254;
+ daa[4*20+1]= 36;
+ daa[4*20+2]= 98;
+ daa[4*20+3]= 11;
+ daa[5*20+0]= 0.2;
+ daa[5*20+1]= 154;
+ daa[5*20+2]= 262;
+ daa[5*20+3]= 0.2;
+ daa[5*20+4]= 0.2;
+ daa[6*20+0]= 0.2;
+ daa[6*20+1]= 0.2;
+ daa[6*20+2]= 183;
+ daa[6*20+3]= 862;
+ daa[6*20+4]= 0.2;
+ daa[6*20+5]= 262;
+ daa[7*20+0]= 200;
+ daa[7*20+1]= 0.2;
+ daa[7*20+2]= 121;
+ daa[7*20+3]= 12;
+ daa[7*20+4]= 81;
+ daa[7*20+5]= 3;
+ daa[7*20+6]= 44;
+ daa[8*20+0]= 0.2;
+ daa[8*20+1]= 41;
+ daa[8*20+2]= 180;
+ daa[8*20+3]= 0.2;
+ daa[8*20+4]= 12;
+ daa[8*20+5]= 314;
+ daa[8*20+6]= 15;
+ daa[8*20+7]= 0.2;
+ daa[9*20+0]= 26;
+ daa[9*20+1]= 2;
+ daa[9*20+2]= 21;
+ daa[9*20+3]= 7;
+ daa[9*20+4]= 63;
+ daa[9*20+5]= 11;
+ daa[9*20+6]= 7;
+ daa[9*20+7]= 3;
+ daa[9*20+8]= 0.2;
+ daa[10*20+0]= 4;
+ daa[10*20+1]= 2;
+ daa[10*20+2]= 13;
+ daa[10*20+3]= 1;
+ daa[10*20+4]= 79;
+ daa[10*20+5]= 16;
+ daa[10*20+6]= 2;
+ daa[10*20+7]= 1;
+ daa[10*20+8]= 6;
+ daa[10*20+9]= 515;
+ daa[11*20+0]= 0.2;
+ daa[11*20+1]= 209;
+ daa[11*20+2]= 467;
+ daa[11*20+3]= 2;
+ daa[11*20+4]= 0.2;
+ daa[11*20+5]= 349;
+ daa[11*20+6]= 106;
+ daa[11*20+7]= 0.2;
+ daa[11*20+8]= 0.2;
+ daa[11*20+9]= 3;
+ daa[11*20+10]= 4;
+ daa[12*20+0]= 121;
+ daa[12*20+1]= 5;
+ daa[12*20+2]= 79;
+ daa[12*20+3]= 0.2;
+ daa[12*20+4]= 312;
+ daa[12*20+5]= 67;
+ daa[12*20+6]= 0.2;
+ daa[12*20+7]= 56;
+ daa[12*20+8]= 0.2;
+ daa[12*20+9]= 515;
+ daa[12*20+10]= 885;
+ daa[12*20+11]= 106;
+ daa[13*20+0]= 13;
+ daa[13*20+1]= 5;
+ daa[13*20+2]= 20;
+ daa[13*20+3]= 0.2;
+ daa[13*20+4]= 184;
+ daa[13*20+5]= 0.2;
+ daa[13*20+6]= 0.2;
+ daa[13*20+7]= 1;
+ daa[13*20+8]= 14;
+ daa[13*20+9]= 118;
+ daa[13*20+10]= 263;
+ daa[13*20+11]= 11;
+ daa[13*20+12]= 322;
+ daa[14*20+0]= 49;
+ daa[14*20+1]= 0.2;
+ daa[14*20+2]= 17;
+ daa[14*20+3]= 0.2;
+ daa[14*20+4]= 0.2;
+ daa[14*20+5]= 39;
+ daa[14*20+6]= 8;
+ daa[14*20+7]= 0.2;
+ daa[14*20+8]= 1;
+ daa[14*20+9]= 0.2;
+ daa[14*20+10]= 12;
+ daa[14*20+11]= 17;
+ daa[14*20+12]= 5;
+ daa[14*20+13]= 15;
+ daa[15*20+0]= 673;
+ daa[15*20+1]= 3;
+ daa[15*20+2]= 398;
+ daa[15*20+3]= 44;
+ daa[15*20+4]= 664;
+ daa[15*20+5]= 52;
+ daa[15*20+6]= 31;
+ daa[15*20+7]= 226;
+ daa[15*20+8]= 11;
+ daa[15*20+9]= 7;
+ daa[15*20+10]= 8;
+ daa[15*20+11]= 144;
+ daa[15*20+12]= 112;
+ daa[15*20+13]= 36;
+ daa[15*20+14]= 87;
+ daa[16*20+0]= 244;
+ daa[16*20+1]= 0.2;
+ daa[16*20+2]= 166;
+ daa[16*20+3]= 0.2;
+ daa[16*20+4]= 183;
+ daa[16*20+5]= 44;
+ daa[16*20+6]= 43;
+ daa[16*20+7]= 0.2;
+ daa[16*20+8]= 19;
+ daa[16*20+9]= 204;
+ daa[16*20+10]= 48;
+ daa[16*20+11]= 70;
+ daa[16*20+12]= 289;
+ daa[16*20+13]= 14;
+ daa[16*20+14]= 47;
+ daa[16*20+15]= 660;
+ daa[17*20+0]= 0.2;
+ daa[17*20+1]= 0.2;
+ daa[17*20+2]= 8;
+ daa[17*20+3]= 0.2;
+ daa[17*20+4]= 22;
+ daa[17*20+5]= 7;
+ daa[17*20+6]= 11;
+ daa[17*20+7]= 2;
+ daa[17*20+8]= 0.2;
+ daa[17*20+9]= 0.2;
+ daa[17*20+10]= 21;
+ daa[17*20+11]= 16;
+ daa[17*20+12]= 71;
+ daa[17*20+13]= 54;
+ daa[17*20+14]= 0.2;
+ daa[17*20+15]= 2;
+ daa[17*20+16]= 0.2;
+ daa[18*20+0]= 1;
+ daa[18*20+1]= 4;
+ daa[18*20+2]= 251;
+ daa[18*20+3]= 0.2;
+ daa[18*20+4]= 72;
+ daa[18*20+5]= 87;
+ daa[18*20+6]= 8;
+ daa[18*20+7]= 9;
+ daa[18*20+8]= 191;
+ daa[18*20+9]= 12;
+ daa[18*20+10]= 20;
+ daa[18*20+11]= 117;
+ daa[18*20+12]= 71;
+ daa[18*20+13]= 792;
+ daa[18*20+14]= 18;
+ daa[18*20+15]= 30;
+ daa[18*20+16]= 46;
+ daa[18*20+17]= 38;
+ daa[19*20+0]= 340;
+ daa[19*20+1]= 0.2;
+ daa[19*20+2]= 23;
+ daa[19*20+3]= 0.2;
+ daa[19*20+4]= 350;
+ daa[19*20+5]= 0.2;
+ daa[19*20+6]= 14;
+ daa[19*20+7]= 3;
+ daa[19*20+8]= 0.2;
+ daa[19*20+9]= 1855;
+ daa[19*20+10]= 85;
+ daa[19*20+11]= 26;
+ daa[19*20+12]= 281;
+ daa[19*20+13]= 52;
+ daa[19*20+14]= 32;
+ daa[19*20+15]= 61;
+ daa[19*20+16]= 544;
+ daa[19*20+17]= 0.2;
+ daa[19*20+18]= 2;
+
+ f[0]= 0.054116;
+ f[1]= 0.018227;
+ f[2]= 0.039903;
+ f[3]= 0.020160;
+ f[4]= 0.009709;
+ f[5]= 0.018781;
+ f[6]= 0.024289;
+ f[7]= 0.068183;
+ f[8]= 0.024518;
+ f[9]= 0.092638;
+ f[10]= 0.148658;
+ f[11]= 0.021718;
+ f[12]= 0.061453;
+ f[13]= 0.088668;
+ f[14]= 0.041826;
+ f[15]= 0.091030;
+ f[16]= 0.049194;
+ f[17]= 0.029786;
+ f[18]= 0.039443;
+ f[19]= 0.057700;
+ }
+ break;
+ case MTZOA:
+ {
+ daa[1*20+0]= 3.3;
+ daa[2*20+0]= 1.7;
+ daa[2*20+1]= 33.6;
+ daa[3*20+0]= 16.1;
+ daa[3*20+1]= 3.2;
+ daa[3*20+2]= 617.0;
+ daa[4*20+0]= 272.5;
+ daa[4*20+1]= 61.1;
+ daa[4*20+2]= 94.6;
+ daa[4*20+3]= 9.5;
+ daa[5*20+0]= 7.3;
+ daa[5*20+1]= 231.0;
+ daa[5*20+2]= 190.3;
+ daa[5*20+3]= 19.3;
+ daa[5*20+4]= 49.1;
+ daa[6*20+0]= 17.1;
+ daa[6*20+1]= 6.4;
+ daa[6*20+2]= 174.0;
+ daa[6*20+3]= 883.6;
+ daa[6*20+4]= 3.4;
+ daa[6*20+5]= 349.4;
+ daa[7*20+0]= 289.3;
+ daa[7*20+1]= 7.2;
+ daa[7*20+2]= 99.3;
+ daa[7*20+3]= 26.0;
+ daa[7*20+4]= 82.4;
+ daa[7*20+5]= 8.9;
+ daa[7*20+6]= 43.1;
+ daa[8*20+0]= 2.3;
+ daa[8*20+1]= 61.7;
+ daa[8*20+2]= 228.9;
+ daa[8*20+3]= 55.6;
+ daa[8*20+4]= 37.5;
+ daa[8*20+5]= 421.8;
+ daa[8*20+6]= 14.9;
+ daa[8*20+7]= 7.4;
+ daa[9*20+0]= 33.2;
+ daa[9*20+1]= 0.2;
+ daa[9*20+2]= 24.3;
+ daa[9*20+3]= 1.5;
+ daa[9*20+4]= 48.8;
+ daa[9*20+5]= 0.2;
+ daa[9*20+6]= 7.3;
+ daa[9*20+7]= 3.4;
+ daa[9*20+8]= 1.6;
+ daa[10*20+0]= 15.6;
+ daa[10*20+1]= 4.1;
+ daa[10*20+2]= 7.9;
+ daa[10*20+3]= 0.5;
+ daa[10*20+4]= 59.7;
+ daa[10*20+5]= 23.0;
+ daa[10*20+6]= 1.0;
+ daa[10*20+7]= 3.5;
+ daa[10*20+8]= 6.6;
+ daa[10*20+9]= 425.2;
+ daa[11*20+0]= 0.2;
+ daa[11*20+1]= 292.3;
+ daa[11*20+2]= 413.4;
+ daa[11*20+3]= 0.2;
+ daa[11*20+4]= 0.2;
+ daa[11*20+5]= 334.0;
+ daa[11*20+6]= 163.2;
+ daa[11*20+7]= 10.1;
+ daa[11*20+8]= 23.9;
+ daa[11*20+9]= 8.4;
+ daa[11*20+10]= 6.7;
+ daa[12*20+0]= 136.5;
+ daa[12*20+1]= 3.8;
+ daa[12*20+2]= 73.7;
+ daa[12*20+3]= 0.2;
+ daa[12*20+4]= 264.8;
+ daa[12*20+5]= 83.9;
+ daa[12*20+6]= 0.2;
+ daa[12*20+7]= 52.2;
+ daa[12*20+8]= 7.1;
+ daa[12*20+9]= 449.7;
+ daa[12*20+10]= 636.3;
+ daa[12*20+11]= 83.0;
+ daa[13*20+0]= 26.5;
+ daa[13*20+1]= 0.2;
+ daa[13*20+2]= 12.9;
+ daa[13*20+3]= 2.0;
+ daa[13*20+4]= 167.8;
+ daa[13*20+5]= 9.5;
+ daa[13*20+6]= 0.2;
+ daa[13*20+7]= 5.8;
+ daa[13*20+8]= 13.1;
+ daa[13*20+9]= 90.3;
+ daa[13*20+10]= 234.2;
+ daa[13*20+11]= 16.3;
+ daa[13*20+12]= 215.6;
+ daa[14*20+0]= 61.8;
+ daa[14*20+1]= 7.5;
+ daa[14*20+2]= 22.6;
+ daa[14*20+3]= 0.2;
+ daa[14*20+4]= 8.1;
+ daa[14*20+5]= 52.2;
+ daa[14*20+6]= 20.6;
+ daa[14*20+7]= 1.3;
+ daa[14*20+8]= 15.6;
+ daa[14*20+9]= 2.6;
+ daa[14*20+10]= 11.4;
+ daa[14*20+11]= 24.3;
+ daa[14*20+12]= 5.4;
+ daa[14*20+13]= 10.5;
+ daa[15*20+0]= 644.9;
+ daa[15*20+1]= 11.8;
+ daa[15*20+2]= 420.2;
+ daa[15*20+3]= 51.4;
+ daa[15*20+4]= 656.3;
+ daa[15*20+5]= 96.4;
+ daa[15*20+6]= 38.4;
+ daa[15*20+7]= 257.1;
+ daa[15*20+8]= 23.1;
+ daa[15*20+9]= 7.2;
+ daa[15*20+10]= 15.2;
+ daa[15*20+11]= 144.9;
+ daa[15*20+12]= 95.3;
+ daa[15*20+13]= 32.2;
+ daa[15*20+14]= 79.7;
+ daa[16*20+0]= 378.1;
+ daa[16*20+1]= 3.2;
+ daa[16*20+2]= 184.6;
+ daa[16*20+3]= 2.3;
+ daa[16*20+4]= 199.0;
+ daa[16*20+5]= 39.4;
+ daa[16*20+6]= 34.5;
+ daa[16*20+7]= 5.2;
+ daa[16*20+8]= 19.4;
+ daa[16*20+9]= 222.3;
+ daa[16*20+10]= 50.0;
+ daa[16*20+11]= 75.5;
+ daa[16*20+12]= 305.1;
+ daa[16*20+13]= 19.3;
+ daa[16*20+14]= 56.9;
+ daa[16*20+15]= 666.3;
+ daa[17*20+0]= 3.1;
+ daa[17*20+1]= 16.9;
+ daa[17*20+2]= 6.4;
+ daa[17*20+3]= 0.2;
+ daa[17*20+4]= 36.1;
+ daa[17*20+5]= 6.1;
+ daa[17*20+6]= 3.5;
+ daa[17*20+7]= 12.3;
+ daa[17*20+8]= 4.5;
+ daa[17*20+9]= 9.7;
+ daa[17*20+10]= 27.2;
+ daa[17*20+11]= 6.6;
+ daa[17*20+12]= 48.7;
+ daa[17*20+13]= 58.2;
+ daa[17*20+14]= 1.3;
+ daa[17*20+15]= 10.3;
+ daa[17*20+16]= 3.6;
+ daa[18*20+0]= 2.1;
+ daa[18*20+1]= 13.8;
+ daa[18*20+2]= 141.6;
+ daa[18*20+3]= 13.9;
+ daa[18*20+4]= 76.7;
+ daa[18*20+5]= 52.3;
+ daa[18*20+6]= 10.0;
+ daa[18*20+7]= 4.3;
+ daa[18*20+8]= 266.5;
+ daa[18*20+9]= 13.1;
+ daa[18*20+10]= 5.7;
+ daa[18*20+11]= 45.0;
+ daa[18*20+12]= 41.4;
+ daa[18*20+13]= 590.5;
+ daa[18*20+14]= 4.2;
+ daa[18*20+15]= 29.7;
+ daa[18*20+16]= 29.0;
+ daa[18*20+17]= 79.8;
+ daa[19*20+0]= 321.9;
+ daa[19*20+1]= 5.1;
+ daa[19*20+2]= 7.1;
+ daa[19*20+3]= 3.7;
+ daa[19*20+4]= 243.8;
+ daa[19*20+5]= 9.0;
+ daa[19*20+6]= 16.3;
+ daa[19*20+7]= 23.7;
+ daa[19*20+8]= 0.3;
+ daa[19*20+9]= 1710.6;
+ daa[19*20+10]= 126.1;
+ daa[19*20+11]= 11.1;
+ daa[19*20+12]= 279.6;
+ daa[19*20+13]= 59.6;
+ daa[19*20+14]= 17.9;
+ daa[19*20+15]= 49.5;
+ daa[19*20+16]= 396.4;
+ daa[19*20+17]= 13.7;
+ daa[19*20+18]= 15.6;
+
+ f[0]= 0.069;
+ f[1]= 0.021;
+ f[2]= 0.030;
+ f[3]= 0.020;
+ f[4]= 0.010;
+ f[5]= 0.019;
+ f[6]= 0.025;
+ f[7]= 0.072;
+ f[8]= 0.027;
+ f[9]= 0.085;
+ f[10]= 0.157;
+ f[11]= 0.019;
+ f[12]= 0.051;
+ f[13]= 0.082;
+ f[14]= 0.045;
+ f[15]= 0.081;
+ f[16]= 0.056;
+ f[17]= 0.028;
+ f[18]= 0.037;
+ f[19]= 0.066;
+ }
+ break;
+ case PMB:
+ {
+ daa[1*20+0]= 0.674995699;
+ daa[2*20+0]= 0.589645178;
+ daa[2*20+1]= 1.189067034;
+ daa[3*20+0]= 0.462499504;
+ daa[3*20+1]= 0.605460903;
+ daa[3*20+2]= 3.573373315;
+ daa[4*20+0]= 1.065445546;
+ daa[4*20+1]= 0.31444833;
+ daa[4*20+2]= 0.589852457;
+ daa[4*20+3]= 0.246951424;
+ daa[5*20+0]= 1.111766964;
+ daa[5*20+1]= 2.967840934;
+ daa[5*20+2]= 2.299755865;
+ daa[5*20+3]= 1.686058219;
+ daa[5*20+4]= 0.245163782;
+ daa[6*20+0]= 1.046334652;
+ daa[6*20+1]= 1.201770702;
+ daa[6*20+2]= 1.277836748;
+ daa[6*20+3]= 4.399995525;
+ daa[6*20+4]= 0.091071867;
+ daa[6*20+5]= 4.15967899;
+ daa[7*20+0]= 1.587964372;
+ daa[7*20+1]= 0.523770553;
+ daa[7*20+2]= 1.374854049;
+ daa[7*20+3]= 0.734992057;
+ daa[7*20+4]= 0.31706632;
+ daa[7*20+5]= 0.596789898;
+ daa[7*20+6]= 0.463812837;
+ daa[8*20+0]= 0.580830874;
+ daa[8*20+1]= 1.457127446;
+ daa[8*20+2]= 2.283037894;
+ daa[8*20+3]= 0.839348444;
+ daa[8*20+4]= 0.411543728;
+ daa[8*20+5]= 1.812173605;
+ daa[8*20+6]= 0.877842609;
+ daa[8*20+7]= 0.476331437;
+ daa[9*20+0]= 0.464590585;
+ daa[9*20+1]= 0.35964586;
+ daa[9*20+2]= 0.426069419;
+ daa[9*20+3]= 0.266775558;
+ daa[9*20+4]= 0.417547309;
+ daa[9*20+5]= 0.315256838;
+ daa[9*20+6]= 0.30421529;
+ daa[9*20+7]= 0.180198883;
+ daa[9*20+8]= 0.285186418;
+ daa[10*20+0]= 0.804404505;
+ daa[10*20+1]= 0.520701585;
+ daa[10*20+2]= 0.41009447;
+ daa[10*20+3]= 0.269124919;
+ daa[10*20+4]= 0.450795211;
+ daa[10*20+5]= 0.625792937;
+ daa[10*20+6]= 0.32078471;
+ daa[10*20+7]= 0.259854426;
+ daa[10*20+8]= 0.363981358;
+ daa[10*20+9]= 4.162454693;
+ daa[11*20+0]= 0.831998835;
+ daa[11*20+1]= 4.956476453;
+ daa[11*20+2]= 2.037575629;
+ daa[11*20+3]= 1.114178954;
+ daa[11*20+4]= 0.274163536;
+ daa[11*20+5]= 3.521346591;
+ daa[11*20+6]= 2.415974716;
+ daa[11*20+7]= 0.581001076;
+ daa[11*20+8]= 0.985885486;
+ daa[11*20+9]= 0.374784947;
+ daa[11*20+10]= 0.498011337;
+ daa[12*20+0]= 1.546725076;
+ daa[12*20+1]= 0.81346254;
+ daa[12*20+2]= 0.737846301;
+ daa[12*20+3]= 0.341932741;
+ daa[12*20+4]= 0.618614612;
+ daa[12*20+5]= 2.067388546;
+ daa[12*20+6]= 0.531773639;
+ daa[12*20+7]= 0.465349326;
+ daa[12*20+8]= 0.380925433;
+ daa[12*20+9]= 3.65807012;
+ daa[12*20+10]= 5.002338375;
+ daa[12*20+11]= 0.661095832;
+ daa[13*20+0]= 0.546169219;
+ daa[13*20+1]= 0.303437244;
+ daa[13*20+2]= 0.425193716;
+ daa[13*20+3]= 0.219005213;
+ daa[13*20+4]= 0.669206193;
+ daa[13*20+5]= 0.406042546;
+ daa[13*20+6]= 0.224154698;
+ daa[13*20+7]= 0.35402891;
+ daa[13*20+8]= 0.576231691;
+ daa[13*20+9]= 1.495264661;
+ daa[13*20+10]= 2.392638293;
+ daa[13*20+11]= 0.269496317;
+ daa[13*20+12]= 2.306919847;
+ daa[14*20+0]= 1.241586045;
+ daa[14*20+1]= 0.65577338;
+ daa[14*20+2]= 0.711495595;
+ daa[14*20+3]= 0.775624818;
+ daa[14*20+4]= 0.198679914;
+ daa[14*20+5]= 0.850116543;
+ daa[14*20+6]= 0.794584081;
+ daa[14*20+7]= 0.588254139;
+ daa[14*20+8]= 0.456058589;
+ daa[14*20+9]= 0.366232942;
+ daa[14*20+10]= 0.430073179;
+ daa[14*20+11]= 1.036079005;
+ daa[14*20+12]= 0.337502282;
+ daa[14*20+13]= 0.481144863;
+ daa[15*20+0]= 3.452308792;
+ daa[15*20+1]= 0.910144334;
+ daa[15*20+2]= 2.572577221;
+ daa[15*20+3]= 1.440896785;
+ daa[15*20+4]= 0.99870098;
+ daa[15*20+5]= 1.348272505;
+ daa[15*20+6]= 1.205509425;
+ daa[15*20+7]= 1.402122097;
+ daa[15*20+8]= 0.799966711;
+ daa[15*20+9]= 0.530641901;
+ daa[15*20+10]= 0.402471997;
+ daa[15*20+11]= 1.234648153;
+ daa[15*20+12]= 0.945453716;
+ daa[15*20+13]= 0.613230817;
+ daa[15*20+14]= 1.217683028;
+ daa[16*20+0]= 1.751412803;
+ daa[16*20+1]= 0.89517149;
+ daa[16*20+2]= 1.823161023;
+ daa[16*20+3]= 0.994227284;
+ daa[16*20+4]= 0.847312432;
+ daa[16*20+5]= 1.320626678;
+ daa[16*20+6]= 0.949599791;
+ daa[16*20+7]= 0.542185658;
+ daa[16*20+8]= 0.83039281;
+ daa[16*20+9]= 1.114132523;
+ daa[16*20+10]= 0.779827336;
+ daa[16*20+11]= 1.290709079;
+ daa[16*20+12]= 1.551488041;
+ daa[16*20+13]= 0.718895136;
+ daa[16*20+14]= 0.780913179;
+ daa[16*20+15]= 4.448982584;
+ daa[17*20+0]= 0.35011051;
+ daa[17*20+1]= 0.618778365;
+ daa[17*20+2]= 0.422407388;
+ daa[17*20+3]= 0.362495245;
+ daa[17*20+4]= 0.445669347;
+ daa[17*20+5]= 0.72038474;
+ daa[17*20+6]= 0.261258229;
+ daa[17*20+7]= 0.37874827;
+ daa[17*20+8]= 0.72436751;
+ daa[17*20+9]= 0.516260502;
+ daa[17*20+10]= 0.794797115;
+ daa[17*20+11]= 0.43340962;
+ daa[17*20+12]= 0.768395107;
+ daa[17*20+13]= 3.29519344;
+ daa[17*20+14]= 0.499869138;
+ daa[17*20+15]= 0.496334956;
+ daa[17*20+16]= 0.38372361;
+ daa[18*20+0]= 0.573154753;
+ daa[18*20+1]= 0.628599063;
+ daa[18*20+2]= 0.720013799;
+ daa[18*20+3]= 0.436220437;
+ daa[18*20+4]= 0.55626163;
+ daa[18*20+5]= 0.728970584;
+ daa[18*20+6]= 0.50720003;
+ daa[18*20+7]= 0.284727562;
+ daa[18*20+8]= 2.210952064;
+ daa[18*20+9]= 0.570562395;
+ daa[18*20+10]= 0.811019594;
+ daa[18*20+11]= 0.664884513;
+ daa[18*20+12]= 0.93253606;
+ daa[18*20+13]= 5.894735673;
+ daa[18*20+14]= 0.433748126;
+ daa[18*20+15]= 0.593795813;
+ daa[18*20+16]= 0.523549536;
+ daa[18*20+17]= 2.996248013;
+ daa[19*20+0]= 2.063050067;
+ daa[19*20+1]= 0.388680158;
+ daa[19*20+2]= 0.474418852;
+ daa[19*20+3]= 0.275658381;
+ daa[19*20+4]= 0.998911631;
+ daa[19*20+5]= 0.634408285;
+ daa[19*20+6]= 0.527640634;
+ daa[19*20+7]= 0.314700907;
+ daa[19*20+8]= 0.305792277;
+ daa[19*20+9]= 8.002789424;
+ daa[19*20+10]= 2.113077156;
+ daa[19*20+11]= 0.526184203;
+ daa[19*20+12]= 1.737356217;
+ daa[19*20+13]= 0.983844803;
+ daa[19*20+14]= 0.551333603;
+ daa[19*20+15]= 0.507506011;
+ daa[19*20+16]= 1.89965079;
+ daa[19*20+17]= 0.429570747;
+ daa[19*20+18]= 0.716795463;
+
+ f[0]= 0.076;
+ f[1]= 0.054;
+ f[2]= 0.038;
+ f[3]= 0.045;
+ f[4]= 0.028;
+ f[5]= 0.034;
+ f[6]= 0.053;
+ f[7]= 0.078;
+ f[8]= 0.030;
+ f[9]= 0.060;
+ f[10]= 0.096;
+ f[11]= 0.052;
+ f[12]= 0.022;
+ f[13]= 0.045;
+ f[14]= 0.042;
+ f[15]= 0.068;
+ f[16]= 0.056;
+ f[17]= 0.016;
+ f[18]= 0.036;
+ f[19]= 0.071;
+ }
+ break;
+ case HIVB:
+ {
+ daa[1*20+0]= 0.30750700;
+ daa[2*20+0]= 0.00500000;
+ daa[2*20+1]= 0.29554300;
+ daa[3*20+0]= 1.45504000;
+ daa[3*20+1]= 0.00500000;
+ daa[3*20+2]= 17.66120000;
+ daa[4*20+0]= 0.12375800;
+ daa[4*20+1]= 0.35172100;
+ daa[4*20+2]= 0.08606420;
+ daa[4*20+3]= 0.00500000;
+ daa[5*20+0]= 0.05511280;
+ daa[5*20+1]= 3.42150000;
+ daa[5*20+2]= 0.67205200;
+ daa[5*20+3]= 0.00500000;
+ daa[5*20+4]= 0.00500000;
+ daa[6*20+0]= 1.48135000;
+ daa[6*20+1]= 0.07492180;
+ daa[6*20+2]= 0.07926330;
+ daa[6*20+3]= 10.58720000;
+ daa[6*20+4]= 0.00500000;
+ daa[6*20+5]= 2.56020000;
+ daa[7*20+0]= 2.13536000;
+ daa[7*20+1]= 3.65345000;
+ daa[7*20+2]= 0.32340100;
+ daa[7*20+3]= 2.83806000;
+ daa[7*20+4]= 0.89787100;
+ daa[7*20+5]= 0.06191370;
+ daa[7*20+6]= 3.92775000;
+ daa[8*20+0]= 0.08476130;
+ daa[8*20+1]= 9.04044000;
+ daa[8*20+2]= 7.64585000;
+ daa[8*20+3]= 1.91690000;
+ daa[8*20+4]= 0.24007300;
+ daa[8*20+5]= 7.05545000;
+ daa[8*20+6]= 0.11974000;
+ daa[8*20+7]= 0.00500000;
+ daa[9*20+0]= 0.00500000;
+ daa[9*20+1]= 0.67728900;
+ daa[9*20+2]= 0.68056500;
+ daa[9*20+3]= 0.01767920;
+ daa[9*20+4]= 0.00500000;
+ daa[9*20+5]= 0.00500000;
+ daa[9*20+6]= 0.00609079;
+ daa[9*20+7]= 0.00500000;
+ daa[9*20+8]= 0.10311100;
+ daa[10*20+0]= 0.21525600;
+ daa[10*20+1]= 0.70142700;
+ daa[10*20+2]= 0.00500000;
+ daa[10*20+3]= 0.00876048;
+ daa[10*20+4]= 0.12977700;
+ daa[10*20+5]= 1.49456000;
+ daa[10*20+6]= 0.00500000;
+ daa[10*20+7]= 0.00500000;
+ daa[10*20+8]= 1.74171000;
+ daa[10*20+9]= 5.95879000;
+ daa[11*20+0]= 0.00500000;
+ daa[11*20+1]= 20.45000000;
+ daa[11*20+2]= 7.90443000;
+ daa[11*20+3]= 0.00500000;
+ daa[11*20+4]= 0.00500000;
+ daa[11*20+5]= 6.54737000;
+ daa[11*20+6]= 4.61482000;
+ daa[11*20+7]= 0.52170500;
+ daa[11*20+8]= 0.00500000;
+ daa[11*20+9]= 0.32231900;
+ daa[11*20+10]= 0.08149950;
+ daa[12*20+0]= 0.01866430;
+ daa[12*20+1]= 2.51394000;
+ daa[12*20+2]= 0.00500000;
+ daa[12*20+3]= 0.00500000;
+ daa[12*20+4]= 0.00500000;
+ daa[12*20+5]= 0.30367600;
+ daa[12*20+6]= 0.17578900;
+ daa[12*20+7]= 0.00500000;
+ daa[12*20+8]= 0.00500000;
+ daa[12*20+9]= 11.20650000;
+ daa[12*20+10]= 5.31961000;
+ daa[12*20+11]= 1.28246000;
+ daa[13*20+0]= 0.01412690;
+ daa[13*20+1]= 0.00500000;
+ daa[13*20+2]= 0.00500000;
+ daa[13*20+3]= 0.00500000;
+ daa[13*20+4]= 9.29815000;
+ daa[13*20+5]= 0.00500000;
+ daa[13*20+6]= 0.00500000;
+ daa[13*20+7]= 0.29156100;
+ daa[13*20+8]= 0.14555800;
+ daa[13*20+9]= 3.39836000;
+ daa[13*20+10]= 8.52484000;
+ daa[13*20+11]= 0.03426580;
+ daa[13*20+12]= 0.18802500;
+ daa[14*20+0]= 2.12217000;
+ daa[14*20+1]= 1.28355000;
+ daa[14*20+2]= 0.00739578;
+ daa[14*20+3]= 0.03426580;
+ daa[14*20+4]= 0.00500000;
+ daa[14*20+5]= 4.47211000;
+ daa[14*20+6]= 0.01202260;
+ daa[14*20+7]= 0.00500000;
+ daa[14*20+8]= 2.45318000;
+ daa[14*20+9]= 0.04105930;
+ daa[14*20+10]= 2.07757000;
+ daa[14*20+11]= 0.03138620;
+ daa[14*20+12]= 0.00500000;
+ daa[14*20+13]= 0.00500000;
+ daa[15*20+0]= 2.46633000;
+ daa[15*20+1]= 3.47910000;
+ daa[15*20+2]= 13.14470000;
+ daa[15*20+3]= 0.52823000;
+ daa[15*20+4]= 4.69314000;
+ daa[15*20+5]= 0.11631100;
+ daa[15*20+6]= 0.00500000;
+ daa[15*20+7]= 4.38041000;
+ daa[15*20+8]= 0.38274700;
+ daa[15*20+9]= 1.21803000;
+ daa[15*20+10]= 0.92765600;
+ daa[15*20+11]= 0.50411100;
+ daa[15*20+12]= 0.00500000;
+ daa[15*20+13]= 0.95647200;
+ daa[15*20+14]= 5.37762000;
+ daa[16*20+0]= 15.91830000;
+ daa[16*20+1]= 2.86868000;
+ daa[16*20+2]= 6.88667000;
+ daa[16*20+3]= 0.27472400;
+ daa[16*20+4]= 0.73996900;
+ daa[16*20+5]= 0.24358900;
+ daa[16*20+6]= 0.28977400;
+ daa[16*20+7]= 0.36961500;
+ daa[16*20+8]= 0.71159400;
+ daa[16*20+9]= 8.61217000;
+ daa[16*20+10]= 0.04376730;
+ daa[16*20+11]= 4.67142000;
+ daa[16*20+12]= 4.94026000;
+ daa[16*20+13]= 0.01412690;
+ daa[16*20+14]= 2.01417000;
+ daa[16*20+15]= 8.93107000;
+ daa[17*20+0]= 0.00500000;
+ daa[17*20+1]= 0.99133800;
+ daa[17*20+2]= 0.00500000;
+ daa[17*20+3]= 0.00500000;
+ daa[17*20+4]= 2.63277000;
+ daa[17*20+5]= 0.02665600;
+ daa[17*20+6]= 0.00500000;
+ daa[17*20+7]= 1.21674000;
+ daa[17*20+8]= 0.06951790;
+ daa[17*20+9]= 0.00500000;
+ daa[17*20+10]= 0.74884300;
+ daa[17*20+11]= 0.00500000;
+ daa[17*20+12]= 0.08907800;
+ daa[17*20+13]= 0.82934300;
+ daa[17*20+14]= 0.04445060;
+ daa[17*20+15]= 0.02487280;
+ daa[17*20+16]= 0.00500000;
+ daa[18*20+0]= 0.00500000;
+ daa[18*20+1]= 0.00991826;
+ daa[18*20+2]= 1.76417000;
+ daa[18*20+3]= 0.67465300;
+ daa[18*20+4]= 7.57932000;
+ daa[18*20+5]= 0.11303300;
+ daa[18*20+6]= 0.07926330;
+ daa[18*20+7]= 0.00500000;
+ daa[18*20+8]= 18.69430000;
+ daa[18*20+9]= 0.14816800;
+ daa[18*20+10]= 0.11198600;
+ daa[18*20+11]= 0.00500000;
+ daa[18*20+12]= 0.00500000;
+ daa[18*20+13]= 15.34000000;
+ daa[18*20+14]= 0.03043810;
+ daa[18*20+15]= 0.64802400;
+ daa[18*20+16]= 0.10565200;
+ daa[18*20+17]= 1.28022000;
+ daa[19*20+0]= 7.61428000;
+ daa[19*20+1]= 0.08124540;
+ daa[19*20+2]= 0.02665600;
+ daa[19*20+3]= 1.04793000;
+ daa[19*20+4]= 0.42002700;
+ daa[19*20+5]= 0.02091530;
+ daa[19*20+6]= 1.02847000;
+ daa[19*20+7]= 0.95315500;
+ daa[19*20+8]= 0.00500000;
+ daa[19*20+9]= 17.73890000;
+ daa[19*20+10]= 1.41036000;
+ daa[19*20+11]= 0.26582900;
+ daa[19*20+12]= 6.85320000;
+ daa[19*20+13]= 0.72327400;
+ daa[19*20+14]= 0.00500000;
+ daa[19*20+15]= 0.07492180;
+ daa[19*20+16]= 0.70922600;
+ daa[19*20+17]= 0.00500000;
+ daa[19*20+18]= 0.04105930;
+
+ /*f[0]= 0.060;
+ f[1]= 0.066;
+ f[2]= 0.044;
+ f[3]= 0.042;
+ f[4]= 0.020;
+ f[5]= 0.054;
+ f[6]= 0.071;
+ f[7]= 0.072;
+ f[8]= 0.022;
+ f[9]= 0.070;
+ f[10]= 0.099;
+ f[11]= 0.057;
+ f[12]= 0.020;
+ f[13]= 0.029;
+ f[14]= 0.046;
+ f[15]= 0.051;
+ f[16]= 0.054;
+ f[17]= 0.033;
+ f[18]= 0.028;
+ f[19]= 0.062;*/
+
+ f[0]= 0.060490222; f[1]= 0.066039665; f[2]= 0.044127815; f[3]= 0.042109048;
+ f[4]= 0.020075899; f[5]= 0.053606488; f[6]= 0.071567447; f[7]= 0.072308239;
+ f[8]= 0.022293943; f[9]= 0.069730629; f[10]= 0.098851122; f[11]= 0.056968211;
+ f[12]= 0.019768318; f[13]= 0.028809447; f[14]= 0.046025282; f[15]= 0.05060433;
+ f[16]= 0.053636813; f[17]= 0.033011601; f[18]= 0.028350243; f[19]= 0.061625237;
+ }
+ break;
+ case HIVW:
+ {
+ daa[1*20+0]= 0.0744808;
+ daa[2*20+0]= 0.6175090;
+ daa[2*20+1]= 0.1602400;
+ daa[3*20+0]= 4.4352100;
+ daa[3*20+1]= 0.0674539;
+ daa[3*20+2]= 29.4087000;
+ daa[4*20+0]= 0.1676530;
+ daa[4*20+1]= 2.8636400;
+ daa[4*20+2]= 0.0604932;
+ daa[4*20+3]= 0.0050000;
+ daa[5*20+0]= 0.0050000;
+ daa[5*20+1]= 10.6746000;
+ daa[5*20+2]= 0.3420680;
+ daa[5*20+3]= 0.0050000;
+ daa[5*20+4]= 0.0050000;
+ daa[6*20+0]= 5.5632500;
+ daa[6*20+1]= 0.0251632;
+ daa[6*20+2]= 0.2015260;
+ daa[6*20+3]= 12.1233000;
+ daa[6*20+4]= 0.0050000;
+ daa[6*20+5]= 3.2065600;
+ daa[7*20+0]= 1.8685000;
+ daa[7*20+1]= 13.4379000;
+ daa[7*20+2]= 0.0604932;
+ daa[7*20+3]= 10.3969000;
+ daa[7*20+4]= 0.0489798;
+ daa[7*20+5]= 0.0604932;
+ daa[7*20+6]= 14.7801000;
+ daa[8*20+0]= 0.0050000;
+ daa[8*20+1]= 6.8440500;
+ daa[8*20+2]= 8.5987600;
+ daa[8*20+3]= 2.3177900;
+ daa[8*20+4]= 0.0050000;
+ daa[8*20+5]= 18.5465000;
+ daa[8*20+6]= 0.0050000;
+ daa[8*20+7]= 0.0050000;
+ daa[9*20+0]= 0.0050000;
+ daa[9*20+1]= 1.3406900;
+ daa[9*20+2]= 0.9870280;
+ daa[9*20+3]= 0.1451240;
+ daa[9*20+4]= 0.0050000;
+ daa[9*20+5]= 0.0342252;
+ daa[9*20+6]= 0.0390512;
+ daa[9*20+7]= 0.0050000;
+ daa[9*20+8]= 0.0050000;
+ daa[10*20+0]= 0.1602400;
+ daa[10*20+1]= 0.5867570;
+ daa[10*20+2]= 0.0050000;
+ daa[10*20+3]= 0.0050000;
+ daa[10*20+4]= 0.0050000;
+ daa[10*20+5]= 2.8904800;
+ daa[10*20+6]= 0.1298390;
+ daa[10*20+7]= 0.0489798;
+ daa[10*20+8]= 1.7638200;
+ daa[10*20+9]= 9.1024600;
+ daa[11*20+0]= 0.5927840;
+ daa[11*20+1]= 39.8897000;
+ daa[11*20+2]= 10.6655000;
+ daa[11*20+3]= 0.8943130;
+ daa[11*20+4]= 0.0050000;
+ daa[11*20+5]= 13.0705000;
+ daa[11*20+6]= 23.9626000;
+ daa[11*20+7]= 0.2794250;
+ daa[11*20+8]= 0.2240600;
+ daa[11*20+9]= 0.8174810;
+ daa[11*20+10]= 0.0050000;
+ daa[12*20+0]= 0.0050000;
+ daa[12*20+1]= 3.2865200;
+ daa[12*20+2]= 0.2015260;
+ daa[12*20+3]= 0.0050000;
+ daa[12*20+4]= 0.0050000;
+ daa[12*20+5]= 0.0050000;
+ daa[12*20+6]= 0.0050000;
+ daa[12*20+7]= 0.0489798;
+ daa[12*20+8]= 0.0050000;
+ daa[12*20+9]= 17.3064000;
+ daa[12*20+10]= 11.3839000;
+ daa[12*20+11]= 4.0956400;
+ daa[13*20+0]= 0.5979230;
+ daa[13*20+1]= 0.0050000;
+ daa[13*20+2]= 0.0050000;
+ daa[13*20+3]= 0.0050000;
+ daa[13*20+4]= 0.3629590;
+ daa[13*20+5]= 0.0050000;
+ daa[13*20+6]= 0.0050000;
+ daa[13*20+7]= 0.0050000;
+ daa[13*20+8]= 0.0050000;
+ daa[13*20+9]= 1.4828800;
+ daa[13*20+10]= 7.4878100;
+ daa[13*20+11]= 0.0050000;
+ daa[13*20+12]= 0.0050000;
+ daa[14*20+0]= 1.0098100;
+ daa[14*20+1]= 0.4047230;
+ daa[14*20+2]= 0.3448480;
+ daa[14*20+3]= 0.0050000;
+ daa[14*20+4]= 0.0050000;
+ daa[14*20+5]= 3.0450200;
+ daa[14*20+6]= 0.0050000;
+ daa[14*20+7]= 0.0050000;
+ daa[14*20+8]= 13.9444000;
+ daa[14*20+9]= 0.0050000;
+ daa[14*20+10]= 9.8309500;
+ daa[14*20+11]= 0.1119280;
+ daa[14*20+12]= 0.0050000;
+ daa[14*20+13]= 0.0342252;
+ daa[15*20+0]= 8.5942000;
+ daa[15*20+1]= 8.3502400;
+ daa[15*20+2]= 14.5699000;
+ daa[15*20+3]= 0.4278810;
+ daa[15*20+4]= 1.1219500;
+ daa[15*20+5]= 0.1602400;
+ daa[15*20+6]= 0.0050000;
+ daa[15*20+7]= 6.2796600;
+ daa[15*20+8]= 0.7251570;
+ daa[15*20+9]= 0.7400910;
+ daa[15*20+10]= 6.1439600;
+ daa[15*20+11]= 0.0050000;
+ daa[15*20+12]= 0.3925750;
+ daa[15*20+13]= 4.2793900;
+ daa[15*20+14]= 14.2490000;
+ daa[16*20+0]= 24.1422000;
+ daa[16*20+1]= 0.9282030;
+ daa[16*20+2]= 4.5420600;
+ daa[16*20+3]= 0.6303950;
+ daa[16*20+4]= 0.0050000;
+ daa[16*20+5]= 0.2030910;
+ daa[16*20+6]= 0.4587430;
+ daa[16*20+7]= 0.0489798;
+ daa[16*20+8]= 0.9595600;
+ daa[16*20+9]= 9.3634500;
+ daa[16*20+10]= 0.0050000;
+ daa[16*20+11]= 4.0480200;
+ daa[16*20+12]= 7.4131300;
+ daa[16*20+13]= 0.1145120;
+ daa[16*20+14]= 4.3370100;
+ daa[16*20+15]= 6.3407900;
+ daa[17*20+0]= 0.0050000;
+ daa[17*20+1]= 5.9656400;
+ daa[17*20+2]= 0.0050000;
+ daa[17*20+3]= 0.0050000;
+ daa[17*20+4]= 5.4989400;
+ daa[17*20+5]= 0.0443298;
+ daa[17*20+6]= 0.0050000;
+ daa[17*20+7]= 2.8258000;
+ daa[17*20+8]= 0.0050000;
+ daa[17*20+9]= 0.0050000;
+ daa[17*20+10]= 1.3703100;
+ daa[17*20+11]= 0.0050000;
+ daa[17*20+12]= 0.0050000;
+ daa[17*20+13]= 0.0050000;
+ daa[17*20+14]= 0.0050000;
+ daa[17*20+15]= 1.1015600;
+ daa[17*20+16]= 0.0050000;
+ daa[18*20+0]= 0.0050000;
+ daa[18*20+1]= 0.0050000;
+ daa[18*20+2]= 5.0647500;
+ daa[18*20+3]= 2.2815400;
+ daa[18*20+4]= 8.3483500;
+ daa[18*20+5]= 0.0050000;
+ daa[18*20+6]= 0.0050000;
+ daa[18*20+7]= 0.0050000;
+ daa[18*20+8]= 47.4889000;
+ daa[18*20+9]= 0.1145120;
+ daa[18*20+10]= 0.0050000;
+ daa[18*20+11]= 0.0050000;
+ daa[18*20+12]= 0.5791980;
+ daa[18*20+13]= 4.1272800;
+ daa[18*20+14]= 0.0050000;
+ daa[18*20+15]= 0.9331420;
+ daa[18*20+16]= 0.4906080;
+ daa[18*20+17]= 0.0050000;
+ daa[19*20+0]= 24.8094000;
+ daa[19*20+1]= 0.2794250;
+ daa[19*20+2]= 0.0744808;
+ daa[19*20+3]= 2.9178600;
+ daa[19*20+4]= 0.0050000;
+ daa[19*20+5]= 0.0050000;
+ daa[19*20+6]= 2.1995200;
+ daa[19*20+7]= 2.7962200;
+ daa[19*20+8]= 0.8274790;
+ daa[19*20+9]= 24.8231000;
+ daa[19*20+10]= 2.9534400;
+ daa[19*20+11]= 0.1280650;
+ daa[19*20+12]= 14.7683000;
+ daa[19*20+13]= 2.2800000;
+ daa[19*20+14]= 0.0050000;
+ daa[19*20+15]= 0.8626370;
+ daa[19*20+16]= 0.0050000;
+ daa[19*20+17]= 0.0050000;
+ daa[19*20+18]= 1.3548200;
+
+ /*f[0]= 0.038;
+ f[1]= 0.057;
+ f[2]= 0.089;
+ f[3]= 0.034;
+ f[4]= 0.024;
+ f[5]= 0.044;
+ f[6]= 0.062;
+ f[7]= 0.084;
+ f[8]= 0.016;
+ f[9]= 0.098;
+ f[10]= 0.058;
+ f[11]= 0.064;
+ f[12]= 0.016;
+ f[13]= 0.042;
+ f[14]= 0.046;
+ f[15]= 0.055;
+ f[16]= 0.081;
+ f[17]= 0.020;
+ f[18]= 0.021;
+ f[19]= 0.051;*/
+
+ f[0]= 0.0377494; f[1]= 0.057321; f[2]= 0.0891129; f[3]= 0.0342034;
+ f[4]= 0.0240105; f[5]= 0.0437824; f[6]= 0.0618606; f[7]= 0.0838496;
+ f[8]= 0.0156076; f[9]= 0.0983641; f[10]= 0.0577867; f[11]= 0.0641682;
+ f[12]= 0.0158419; f[13]= 0.0422741; f[14]= 0.0458601; f[15]= 0.0550846;
+ f[16]= 0.0813774; f[17]= 0.019597; f[18]= 0.0205847; f[19]= 0.0515638;
+ }
+ break;
+ case JTTDCMUT:
+ {
+ daa[1*20+0]= 0.531678;
+ daa[2*20+0]= 0.557967;
+ daa[2*20+1]= 0.451095;
+ daa[3*20+0]= 0.827445;
+ daa[3*20+1]= 0.154899;
+ daa[3*20+2]= 5.549530;
+ daa[4*20+0]= 0.574478;
+ daa[4*20+1]= 1.019843;
+ daa[4*20+2]= 0.313311;
+ daa[4*20+3]= 0.105625;
+ daa[5*20+0]= 0.556725;
+ daa[5*20+1]= 3.021995;
+ daa[5*20+2]= 0.768834;
+ daa[5*20+3]= 0.521646;
+ daa[5*20+4]= 0.091304;
+ daa[6*20+0]= 1.066681;
+ daa[6*20+1]= 0.318483;
+ daa[6*20+2]= 0.578115;
+ daa[6*20+3]= 7.766557;
+ daa[6*20+4]= 0.053907;
+ daa[6*20+5]= 3.417706;
+ daa[7*20+0]= 1.740159;
+ daa[7*20+1]= 1.359652;
+ daa[7*20+2]= 0.773313;
+ daa[7*20+3]= 1.272434;
+ daa[7*20+4]= 0.546389;
+ daa[7*20+5]= 0.231294;
+ daa[7*20+6]= 1.115632;
+ daa[8*20+0]= 0.219970;
+ daa[8*20+1]= 3.210671;
+ daa[8*20+2]= 4.025778;
+ daa[8*20+3]= 1.032342;
+ daa[8*20+4]= 0.724998;
+ daa[8*20+5]= 5.684080;
+ daa[8*20+6]= 0.243768;
+ daa[8*20+7]= 0.201696;
+ daa[9*20+0]= 0.361684;
+ daa[9*20+1]= 0.239195;
+ daa[9*20+2]= 0.491003;
+ daa[9*20+3]= 0.115968;
+ daa[9*20+4]= 0.150559;
+ daa[9*20+5]= 0.078270;
+ daa[9*20+6]= 0.111773;
+ daa[9*20+7]= 0.053769;
+ daa[9*20+8]= 0.181788;
+ daa[10*20+0]= 0.310007;
+ daa[10*20+1]= 0.372261;
+ daa[10*20+2]= 0.137289;
+ daa[10*20+3]= 0.061486;
+ daa[10*20+4]= 0.164593;
+ daa[10*20+5]= 0.709004;
+ daa[10*20+6]= 0.097485;
+ daa[10*20+7]= 0.069492;
+ daa[10*20+8]= 0.540571;
+ daa[10*20+9]= 2.335139;
+ daa[11*20+0]= 0.369437;
+ daa[11*20+1]= 6.529255;
+ daa[11*20+2]= 2.529517;
+ daa[11*20+3]= 0.282466;
+ daa[11*20+4]= 0.049009;
+ daa[11*20+5]= 2.966732;
+ daa[11*20+6]= 1.731684;
+ daa[11*20+7]= 0.269840;
+ daa[11*20+8]= 0.525096;
+ daa[11*20+9]= 0.202562;
+ daa[11*20+10]= 0.146481;
+ daa[12*20+0]= 0.469395;
+ daa[12*20+1]= 0.431045;
+ daa[12*20+2]= 0.330720;
+ daa[12*20+3]= 0.190001;
+ daa[12*20+4]= 0.409202;
+ daa[12*20+5]= 0.456901;
+ daa[12*20+6]= 0.175084;
+ daa[12*20+7]= 0.130379;
+ daa[12*20+8]= 0.329660;
+ daa[12*20+9]= 4.831666;
+ daa[12*20+10]= 3.856906;
+ daa[12*20+11]= 0.624581;
+ daa[13*20+0]= 0.138293;
+ daa[13*20+1]= 0.065314;
+ daa[13*20+2]= 0.073481;
+ daa[13*20+3]= 0.032522;
+ daa[13*20+4]= 0.678335;
+ daa[13*20+5]= 0.045683;
+ daa[13*20+6]= 0.043829;
+ daa[13*20+7]= 0.050212;
+ daa[13*20+8]= 0.453428;
+ daa[13*20+9]= 0.777090;
+ daa[13*20+10]= 2.500294;
+ daa[13*20+11]= 0.024521;
+ daa[13*20+12]= 0.436181;
+ daa[14*20+0]= 1.959599;
+ daa[14*20+1]= 0.710489;
+ daa[14*20+2]= 0.121804;
+ daa[14*20+3]= 0.127164;
+ daa[14*20+4]= 0.123653;
+ daa[14*20+5]= 1.608126;
+ daa[14*20+6]= 0.191994;
+ daa[14*20+7]= 0.208081;
+ daa[14*20+8]= 1.141961;
+ daa[14*20+9]= 0.098580;
+ daa[14*20+10]= 1.060504;
+ daa[14*20+11]= 0.216345;
+ daa[14*20+12]= 0.164215;
+ daa[14*20+13]= 0.148483;
+ daa[15*20+0]= 3.887095;
+ daa[15*20+1]= 1.001551;
+ daa[15*20+2]= 5.057964;
+ daa[15*20+3]= 0.589268;
+ daa[15*20+4]= 2.155331;
+ daa[15*20+5]= 0.548807;
+ daa[15*20+6]= 0.312449;
+ daa[15*20+7]= 1.874296;
+ daa[15*20+8]= 0.743458;
+ daa[15*20+9]= 0.405119;
+ daa[15*20+10]= 0.592511;
+ daa[15*20+11]= 0.474478;
+ daa[15*20+12]= 0.285564;
+ daa[15*20+13]= 0.943971;
+ daa[15*20+14]= 2.788406;
+ daa[16*20+0]= 4.582565;
+ daa[16*20+1]= 0.650282;
+ daa[16*20+2]= 2.351311;
+ daa[16*20+3]= 0.425159;
+ daa[16*20+4]= 0.469823;
+ daa[16*20+5]= 0.523825;
+ daa[16*20+6]= 0.331584;
+ daa[16*20+7]= 0.316862;
+ daa[16*20+8]= 0.477355;
+ daa[16*20+9]= 2.553806;
+ daa[16*20+10]= 0.272514;
+ daa[16*20+11]= 0.965641;
+ daa[16*20+12]= 2.114728;
+ daa[16*20+13]= 0.138904;
+ daa[16*20+14]= 1.176961;
+ daa[16*20+15]= 4.777647;
+ daa[17*20+0]= 0.084329;
+ daa[17*20+1]= 1.257961;
+ daa[17*20+2]= 0.027700;
+ daa[17*20+3]= 0.057466;
+ daa[17*20+4]= 1.104181;
+ daa[17*20+5]= 0.172206;
+ daa[17*20+6]= 0.114381;
+ daa[17*20+7]= 0.544180;
+ daa[17*20+8]= 0.128193;
+ daa[17*20+9]= 0.134510;
+ daa[17*20+10]= 0.530324;
+ daa[17*20+11]= 0.089134;
+ daa[17*20+12]= 0.201334;
+ daa[17*20+13]= 0.537922;
+ daa[17*20+14]= 0.069965;
+ daa[17*20+15]= 0.310927;
+ daa[17*20+16]= 0.080556;
+ daa[18*20+0]= 0.139492;
+ daa[18*20+1]= 0.235601;
+ daa[18*20+2]= 0.700693;
+ daa[18*20+3]= 0.453952;
+ daa[18*20+4]= 2.114852;
+ daa[18*20+5]= 0.254745;
+ daa[18*20+6]= 0.063452;
+ daa[18*20+7]= 0.052500;
+ daa[18*20+8]= 5.848400;
+ daa[18*20+9]= 0.303445;
+ daa[18*20+10]= 0.241094;
+ daa[18*20+11]= 0.087904;
+ daa[18*20+12]= 0.189870;
+ daa[18*20+13]= 5.484236;
+ daa[18*20+14]= 0.113850;
+ daa[18*20+15]= 0.628608;
+ daa[18*20+16]= 0.201094;
+ daa[18*20+17]= 0.747889;
+ daa[19*20+0]= 2.924161;
+ daa[19*20+1]= 0.171995;
+ daa[19*20+2]= 0.164525;
+ daa[19*20+3]= 0.315261;
+ daa[19*20+4]= 0.621323;
+ daa[19*20+5]= 0.179771;
+ daa[19*20+6]= 0.465271;
+ daa[19*20+7]= 0.470140;
+ daa[19*20+8]= 0.121827;
+ daa[19*20+9]= 9.533943;
+ daa[19*20+10]= 1.761439;
+ daa[19*20+11]= 0.124066;
+ daa[19*20+12]= 3.038533;
+ daa[19*20+13]= 0.593478;
+ daa[19*20+14]= 0.211561;
+ daa[19*20+15]= 0.408532;
+ daa[19*20+16]= 1.143980;
+ daa[19*20+17]= 0.239697;
+ daa[19*20+18]= 0.165473;
+
+ f[0]= 0.077;
+ f[1]= 0.051;
+ f[2]= 0.043;
+ f[3]= 0.051;
+ f[4]= 0.020;
+ f[5]= 0.041;
+ f[6]= 0.062;
+ f[7]= 0.075;
+ f[8]= 0.023;
+ f[9]= 0.053;
+ f[10]= 0.091;
+ f[11]= 0.059;
+ f[12]= 0.024;
+ f[13]= 0.040;
+ f[14]= 0.051;
+ f[15]= 0.068;
+ f[16]= 0.059;
+ f[17]= 0.014;
+ f[18]= 0.032;
+ f[19]= 0.066;
+ }
+ break;
+ case FLU:
+ {
+ daa[ 1*20+ 0] = 0.138658765 ;
+ daa[ 2*20+ 0] = 0.053366579 ;
+ daa[ 2*20+ 1] = 0.161000889 ;
+ daa[ 3*20+ 0] = 0.584852306 ;
+ daa[ 3*20+ 1] = 0.006771843 ;
+ daa[ 3*20+ 2] = 7.737392871 ;
+ daa[ 4*20+ 0] = 0.026447095 ;
+ daa[ 4*20+ 1] = 0.167207008 ;
+ daa[ 4*20+ 2] = 1.30E-05 ;
+ daa[ 4*20+ 3] = 1.41E-02 ;
+ daa[ 5*20+ 0] = 0.353753982 ;
+ daa[ 5*20+ 1] = 3.292716942 ;
+ daa[ 5*20+ 2] = 0.530642655 ;
+ daa[ 5*20+ 3] = 0.145469388 ;
+ daa[ 5*20+ 4] = 0.002547334 ;
+ daa[ 6*20+ 0] = 1.484234503 ;
+ daa[ 6*20+ 1] = 0.124897617 ;
+ daa[ 6*20+ 2] = 0.061652192 ;
+ daa[ 6*20+ 3] = 5.370511279 ;
+ daa[ 6*20+ 4] = 3.91E-11 ;
+ daa[ 6*20+ 5] = 1.195629122 ;
+ daa[ 7*20+ 0] = 1.132313122 ;
+ daa[ 7*20+ 1] = 1.190624465 ;
+ daa[ 7*20+ 2] = 0.322524648 ;
+ daa[ 7*20+ 3] = 1.934832784 ;
+ daa[ 7*20+ 4] = 0.116941459 ;
+ daa[ 7*20+ 5] = 0.108051341 ;
+ daa[ 7*20+ 6] = 1.593098825 ;
+ daa[ 8*20+ 0] = 0.214757862 ;
+ daa[ 8*20+ 1] = 1.879569938 ;
+ daa[ 8*20+ 2] = 1.387096032 ;
+ daa[ 8*20+ 3] = 0.887570549 ;
+ daa[ 8*20+ 4] = 2.18E-02 ;
+ daa[ 8*20+ 5] = 5.330313412 ;
+ daa[ 8*20+ 6] = 0.256491863 ;
+ daa[ 8*20+ 7] = 0.058774527 ;
+ daa[ 9*20+ 0] = 0.149926734 ;
+ daa[ 9*20+ 1] = 0.246117172 ;
+ daa[ 9*20+ 2] = 0.218571975 ;
+ daa[ 9*20+ 3] = 0.014085917 ;
+ daa[ 9*20+ 4] = 0.001112158 ;
+ daa[ 9*20+ 5] = 0.02883995 ;
+ daa[ 9*20+ 6] = 1.42E-02 ;
+ daa[ 9*20+ 7] = 1.63E-05 ;
+ daa[ 9*20+ 8] = 0.243190142 ;
+ daa[10*20+ 0] = 0.023116952 ;
+ daa[10*20+ 1] = 0.296045557 ;
+ daa[10*20+ 2] = 8.36E-04 ;
+ daa[10*20+ 3] = 0.005730682 ;
+ daa[10*20+ 4] = 0.005613627 ;
+ daa[10*20+ 5] = 1.020366955 ;
+ daa[10*20+ 6] = 0.016499536 ;
+ daa[10*20+ 7] = 0.006516229 ;
+ daa[10*20+ 8] = 0.321611694 ;
+ daa[10*20+ 9] = 3.512072282 ;
+ daa[11*20+ 0] = 0.47433361 ;
+ daa[11*20+ 1] = 15.30009662 ;
+ daa[11*20+ 2] = 2.646847965 ;
+ daa[11*20+ 3] = 0.29004298 ;
+ daa[11*20+ 4] = 3.83E-06 ;
+ daa[11*20+ 5] = 2.559587177 ;
+ daa[11*20+ 6] = 3.881488809 ;
+ daa[11*20+ 7] = 0.264148929 ;
+ daa[11*20+ 8] = 0.347302791 ;
+ daa[11*20+ 9] = 0.227707997 ;
+ daa[11*20+10] = 0.129223639 ;
+ daa[12*20+ 0] = 0.058745423 ;
+ daa[12*20+ 1] = 0.890162346 ;
+ daa[12*20+ 2] = 0.005251688 ;
+ daa[12*20+ 3] = 0.041762964 ;
+ daa[12*20+ 4] = 0.11145731 ;
+ daa[12*20+ 5] = 0.190259181 ;
+ daa[12*20+ 6] = 0.313974351 ;
+ daa[12*20+ 7] = 0.001500467 ;
+ daa[12*20+ 8] = 0.001273509 ;
+ daa[12*20+ 9] = 9.017954203 ;
+ daa[12*20+10] = 6.746936485 ;
+ daa[12*20+11] = 1.331291619 ;
+ daa[13*20+ 0] = 0.080490909 ;
+ daa[13*20+ 1] = 1.61E-02 ;
+ daa[13*20+ 2] = 8.36E-04 ;
+ daa[13*20+ 3] = 1.06E-06 ;
+ daa[13*20+ 4] = 0.104053666 ;
+ daa[13*20+ 5] = 0.032680657 ;
+ daa[13*20+ 6] = 0.001003501 ;
+ daa[13*20+ 7] = 0.001236645 ;
+ daa[13*20+ 8] = 0.119028506 ;
+ daa[13*20+ 9] = 1.463357278 ;
+ daa[13*20+10] = 2.986800036 ;
+ daa[13*20+11] = 3.20E-01 ;
+ daa[13*20+12] = 0.279910509 ;
+ daa[14*20+ 0] = 0.659311478 ;
+ daa[14*20+ 1] = 0.15402718 ;
+ daa[14*20+ 2] = 3.64E-02 ;
+ daa[14*20+ 3] = 0.188539456 ;
+ daa[14*20+ 4] = 1.59E-13 ;
+ daa[14*20+ 5] = 0.712769599 ;
+ daa[14*20+ 6] = 0.319558828 ;
+ daa[14*20+ 7] = 0.038631761 ;
+ daa[14*20+ 8] = 0.924466914 ;
+ daa[14*20+ 9] = 0.080543327 ;
+ daa[14*20+10] = 0.634308521 ;
+ daa[14*20+11] = 0.195750632 ;
+ daa[14*20+12] = 5.69E-02 ;
+ daa[14*20+13] = 0.00713243 ;
+ daa[15*20+ 0] = 3.011344519 ;
+ daa[15*20+ 1] = 0.95013841 ;
+ daa[15*20+ 2] = 3.881310531 ;
+ daa[15*20+ 3] = 0.338372183 ;
+ daa[15*20+ 4] = 0.336263345 ;
+ daa[15*20+ 5] = 0.487822499 ;
+ daa[15*20+ 6] = 0.307140298 ;
+ daa[15*20+ 7] = 1.585646577 ;
+ daa[15*20+ 8] = 0.58070425 ;
+ daa[15*20+ 9] = 0.290381075 ;
+ daa[15*20+10] = 0.570766693 ;
+ daa[15*20+11] = 0.283807672 ;
+ daa[15*20+12] = 0.007026588 ;
+ daa[15*20+13] = 0.99668567 ;
+ daa[15*20+14] = 2.087385344 ;
+ daa[16*20+ 0] = 5.418298175 ;
+ daa[16*20+ 1] = 0.183076905 ;
+ daa[16*20+ 2] = 2.140332316 ;
+ daa[16*20+ 3] = 0.135481233 ;
+ daa[16*20+ 4] = 0.011975266 ;
+ daa[16*20+ 5] = 0.602340963 ;
+ daa[16*20+ 6] = 0.280124895 ;
+ daa[16*20+ 7] = 0.01880803 ;
+ daa[16*20+ 8] = 0.368713573 ;
+ daa[16*20+ 9] = 2.904052286 ;
+ daa[16*20+10] = 0.044926357 ;
+ daa[16*20+11] = 1.5269642 ;
+ daa[16*20+12] = 2.031511321 ;
+ daa[16*20+13] = 0.000134906 ;
+ daa[16*20+14] = 0.542251094 ;
+ daa[16*20+15] = 2.206859934 ;
+ daa[17*20+ 0] = 1.96E-01 ;
+ daa[17*20+ 1] = 1.369429408 ;
+ daa[17*20+ 2] = 5.36E-04 ;
+ daa[17*20+ 3] = 1.49E-05 ;
+ daa[17*20+ 4] = 0.09410668 ;
+ daa[17*20+ 5] = 4.40E-02 ;
+ daa[17*20+ 6] = 0.155245492 ;
+ daa[17*20+ 7] = 0.196486447 ;
+ daa[17*20+ 8] = 2.24E-02 ;
+ daa[17*20+ 9] = 0.03213215 ;
+ daa[17*20+10] = 0.431277663 ;
+ daa[17*20+11] = 4.98E-05 ;
+ daa[17*20+12] = 0.070460039 ;
+ daa[17*20+13] = 0.814753094 ;
+ daa[17*20+14] = 0.000431021 ;
+ daa[17*20+15] = 0.099835753 ;
+ daa[17*20+16] = 0.207066206 ;
+ daa[18*20+ 0] = 0.018289288 ;
+ daa[18*20+ 1] = 0.099855497 ;
+ daa[18*20+ 2] = 0.373101927 ;
+ daa[18*20+ 3] = 0.525398543 ;
+ daa[18*20+ 4] = 0.601692431 ;
+ daa[18*20+ 5] = 0.072205935 ;
+ daa[18*20+ 6] = 0.10409287 ;
+ daa[18*20+ 7] = 0.074814997 ;
+ daa[18*20+ 8] = 6.448954446 ;
+ daa[18*20+ 9] = 0.273934263 ;
+ daa[18*20+10] = 0.340058468 ;
+ daa[18*20+11] = 0.012416222 ;
+ daa[18*20+12] = 0.874272175 ;
+ daa[18*20+13] = 5.393924245 ;
+ daa[18*20+14] = 1.82E-04 ;
+ daa[18*20+15] = 0.39255224 ;
+ daa[18*20+16] = 0.12489802 ;
+ daa[18*20+17] = 0.42775543 ;
+ daa[19*20+ 0] = 3.53200527 ;
+ daa[19*20+ 1] = 0.103964386 ;
+ daa[19*20+ 2] = 0.010257517 ;
+ daa[19*20+ 3] = 0.297123975 ;
+ daa[19*20+ 4] = 0.054904564 ;
+ daa[19*20+ 5] = 0.406697814 ;
+ daa[19*20+ 6] = 0.285047948 ;
+ daa[19*20+ 7] = 0.337229619 ;
+ daa[19*20+ 8] = 0.098631355 ;
+ daa[19*20+ 9] = 14.39405219 ;
+ daa[19*20+10] = 0.890598579 ;
+ daa[19*20+11] = 0.07312793 ;
+ daa[19*20+12] = 4.904842235 ;
+ daa[19*20+13] = 0.592587985 ;
+ daa[19*20+14] = 0.058971975 ;
+ daa[19*20+15] = 0.088256423 ;
+ daa[19*20+16] = 0.654109108 ;
+ daa[19*20+17] = 0.256900461 ;
+ daa[19*20+18] = 0.167581647 ;
+
+
+
+ f[0] = 0.0471 ;
+ f[1] = 0.0509 ;
+ f[2] = 0.0742 ;
+ f[3] = 0.0479 ;
+ f[4] = 0.0250 ;
+ f[5] = 0.0333 ;
+ f[6] = 0.0546 ;
+ f[7] = 0.0764 ;
+ f[8] = 0.0200 ;
+ f[9] = 0.0671 ;
+ f[10] = 0.0715 ;
+ f[11] = 0.0568 ;
+ f[12] = 0.0181 ;
+ f[13] = 0.0305 ;
+ f[14] = 0.0507 ;
+ f[15] = 0.0884 ;
+ f[16] = 0.0743 ;
+ f[17] = 0.0185 ;
+ f[18] = 0.0315 ;
+ f[19] = 0.0632 ;
+ }
+ break;
+ default:
+ assert(0);
+ }
+ }
+
+
+ /*
+
+ TODO: review the frequency sums for fixed as well as empirical base frequencies.
+
+ NUMERICAL BUG FIX: the AA frequencies of some models were rounded so that
+ they actually sum to 1.0 +/- epsilon.
+
+ {
+ double acc = 0.0;
+
+ for(i = 0; i < 20; i++)
+ acc += f[i];
+
+ printf("%1.80f\n", acc);
+ assert(acc == 1.0);
+ }
+ */
+
+
+
+ for (i=0; i<20; i++)
+ for (j=0; j<i; j++)
+ daa[j*20+i] = daa[i*20+j];
+
+
+ /*
+ for (i=0; i<20; i++)
+ {
+ for (j=0; j<20; j++)
+ {
+ if(i == j)
+ printf("0.0 ");
+ else
+ printf("%f ", daa[i * 20 + j]);
+ }
+ printf("\n");
+ }
+
+ for (i=0; i<20; i++)
+ printf("%f ", f[i]);
+ printf("\n");
+ */
+
+
+ max = 0;
+
+ for(i = 0; i < 19; i++)
+ for(j = i + 1; j < 20; j++)
+ {
+ q[i][j] = temp = daa[i * 20 + j];
+ if(temp > max)
+ max = temp;
+ }
+
+ scaler = AA_SCALE / max;
+
+ /* SCALING HAS BEEN RE-INTRODUCED TO RESOLVE NUMERICAL PROBLEMS */
+
+ r = 0;
+ for(i = 0; i < 19; i++)
+ {
+ for(j = i + 1; j < 20; j++)
+ {
+
+ q[i][j] *= scaler;
+
+
+ assert(q[i][j] <= AA_SCALE_PLUS_EPSILON);
+
+ initialRates[r++] = q[i][j];
+ }
+ }
+}
+
+
+
+
+static void mytred2(double **a, const int n, double *d, double *e)
+{
+ int l, k, j, i;
+ double scale, hh, h, g, f;
+
+ for (i = n; i > 1; i--)
+ {
+ l = i - 1;
+ h = 0.0;
+ scale = 0.0;
+
+ if (l > 1)
+ {
+ for (k = 1; k <= l; k++)
+ scale += fabs(a[k - 1][i - 1]);
+ if (scale == 0.0)
+ e[i - 1] = a[l - 1][i - 1];
+ else
+ {
+ for (k = 1; k <= l; k++)
+ {
+ a[k - 1][i - 1] /= scale;
+ h += a[k - 1][i - 1] * a[k - 1][i - 1];
+ }
+ f = a[l - 1][i - 1];
+ g = ((f > 0) ? -sqrt(h) : sqrt(h)); /* diff */
+ e[i - 1] = scale * g;
+ h -= f * g;
+ a[l - 1][i - 1] = f - g;
+ f = 0.0;
+ for (j = 1; j <= l; j++)
+ {
+ a[i - 1][j - 1] = a[j - 1][i - 1] / h;
+ g = 0.0;
+ for (k = 1; k <= j; k++)
+ g += a[k - 1][j - 1] * a[k - 1][i - 1];
+ for (k = j + 1; k <= l; k++)
+ g += a[j - 1][k - 1] * a[k - 1][i - 1];
+ e[j - 1] = g / h;
+ f += e[j - 1] * a[j - 1][i - 1];
+ }
+ hh = f / (h + h);
+ for (j = 1; j <= l; j++)
+ {
+ f = a[j - 1][i - 1];
+ g = e[j - 1] - hh * f;
+ e[j - 1] = g;
+ for (k = 1; k <= j; k++)
+ a[k - 1][j - 1] -= (f * e[k - 1] + g * a[k - 1][i - 1]);
+ }
+ }
+ }
+ else
+ e[i - 1] = a[l - 1][i - 1];
+ d[i - 1] = h;
+ }
+ d[0] = 0.0;
+ e[0] = 0.0;
+
+ for (i = 1; i <= n; i++)
+ {
+ l = i - 1;
+ if (d[i - 1] != 0.0)
+ {
+ for (j = 1; j <= l; j++)
+ {
+ g = 0.0;
+ for (k = 1; k <= l; k++)
+ g += a[k - 1][i - 1] * a[j - 1][k - 1];
+ for(k = 1; k <= l; k++)
+ a[j - 1][k - 1] -= g * a[i - 1][k - 1];
+ }
+ }
+ d[i - 1] = a[i - 1][i - 1];
+ a[i - 1][i - 1] = 1.0;
+ for (j = 1; j <= l; j++)
+ a[i - 1][j - 1] = a[j - 1][i - 1] = 0.0;
+ }
+
+
+}
+/*#define MYSIGN(a,b) ((b)<0 ? -fabs(a) : fabs(a))*/
+
+static int mytqli(double *d, double *e, const int n, double **z)
+{
+ int m, l, iter, i, k;
+ double s, r, p, g, f, dd, c, b;
+
+ for (i = 2; i <= n; i++)
+ e[i - 2] = e[i - 1];
+
+ e[n - 1] = 0.0;
+
+ for (l = 1; l <= n; l++)
+ {
+ iter = 0;
+ do
+ {
+ for (m = l; m <= n - 1; m++)
+ {
+ dd = fabs(d[m - 1]) + fabs(d[m]);
+ if (fabs(e[m - 1]) + dd == dd)
+ break;
+ }
+
+ if (m != l)
+ {
+ assert(iter < 30);
+
+ g = (d[l] - d[l - 1]) / (2.0 * e[l - 1]);
+ r = sqrt((g * g) + 1.0);
+ g = d[m - 1] - d[l - 1] + e[l - 1] / (g + ((g < 0)?-fabs(r):fabs(r)));/*MYSIGN(r, g));*/
+ s = c = 1.0;
+ p = 0.0;
+
+ for (i = m - 1; i >= l; i--)
+ {
+ f = s * e[i - 1];
+ b = c * e[i - 1];
+ if (fabs(f) >= fabs(g))
+ {
+ c = g / f;
+ r = sqrt((c * c) + 1.0);
+ e[i] = f * r;
+ c *= (s = 1.0 / r);
+ }
+ else
+ {
+ s = f / g;
+ r = sqrt((s * s) + 1.0);
+ e[i] = g * r;
+ s *= (c = 1.0 / r);
+ }
+ g = d[i] - p;
+ r = (d[i - 1] - g) * s + 2.0 * c * b;
+ p = s * r;
+ d[i] = g + p;
+ g = c * r - b;
+ for (k = 1; k <= n; k++)
+ {
+ f = z[i][k-1];
+ z[i][k-1] = s * z[i - 1][k - 1] + c * f;
+ z[i - 1][k - 1] = c * z[i - 1][k - 1] - s * f;
+ }
+ }
+
+ d[l - 1] = d[l - 1] - p;
+ e[l - 1] = g;
+ e[m - 1] = 0.0;
+ }
+ }
+ while (m != l);
+ }
+
+
+
+ return (1);
+ }
+
+
+static void makeEigen(double **_a, const int n, double *d, double *e)
+{
+ mytred2(_a, n, d, e);
+ mytqli(d, e, n, _a);
+}
+
+static void initGeneric(const int n, const unsigned int *valueVector, int valueVectorLength,
+ double *ext_EIGN,
+ double *EV,
+ double *EI,
+ double *frequencies,
+ double *ext_initialRates,
+ double *tipVector,
+ int model)
+{
+ double
+ fracchange = 0.0,
+ **r,
+ **a,
+ **EIGV,
+ *initialRates = ext_initialRates,
+ *f,
+ *e,
+ *d,
+ *invfreq,
+ *EIGN,
+ *eptr;
+
+ int
+ i,
+ j,
+ k,
+ m,
+ l;
+
+ r = (double **)malloc(n * sizeof(double *));
+ EIGV = (double **)malloc(n * sizeof(double *));
+ a = (double **)malloc(n * sizeof(double *));
+
+ for(i = 0; i < n; i++)
+ {
+ a[i] = (double*)malloc(n * sizeof(double));
+ EIGV[i] = (double*)malloc(n * sizeof(double));
+ r[i] = (double*)malloc(n * sizeof(double));
+ }
+
+ f = (double*)malloc(n * sizeof(double));
+ e = (double*)malloc(n * sizeof(double));
+ d = (double*)malloc(n * sizeof(double));
+ invfreq = (double*)malloc(n * sizeof(double));
+ EIGN = (double*)malloc(n * sizeof(double));
+
+ for(l = 0; l < n; l++)
+ f[l] = frequencies[l];
+
+
+ i = 0;
+
+ for(j = 0; j < n; j++)
+ for(k = 0; k < n; k++)
+ r[j][k] = 0.0;
+
+ for(j = 0; j < n - 1; j++)
+ for (k = j+1; k < n; k++)
+ r[j][k] = initialRates[i++];
+
+ for (j = 0; j < n; j++)
+ {
+ r[j][j] = 0.0;
+ for (k = 0; k < j; k++)
+ r[j][k] = r[k][j];
+ }
+
+ for (j = 0; j< n; j++)
+ for (k = 0; k< n; k++)
+ fracchange += f[j] * r[j][k] * f[k];
+
+ m = 0;
+
+ for(i=0; i< n; i++)
+ a[i][i] = 0;
+
+ /* assert(r[n - 2][n - 1] == 1.0);*/
+
+ for(i=0; i < n; i++)
+ {
+ for(j=i+1; j < n; j++)
+ {
+ double factor = initialRates[m++];
+ a[i][j] = a[j][i] = factor * sqrt( f[i] * f[j]);
+ a[i][i] -= factor * f[j];
+ a[j][j] -= factor * f[i];
+ }
+ }
+
+ makeEigen(a, n, d, e);
+
+
+
+ for(i=0; i<n; i++)
+ for(j=0; j<n; j++)
+ a[i][j] *= sqrt(f[j]);
+
+
+
+ for (i=0; i<n; i++)
+ {
+ if (d[i] > -1e-8)
+ {
+ if (i != 0)
+ {
+ double tmp = d[i], sum=0;
+ d[i] = d[0];
+ d[0] = tmp;
+ for (j=0; j < n; j++)
+ {
+ tmp = a[i][j];
+ a[i][j] = a[0][j];
+ sum += (a[0][j] = tmp);
+ }
+ for (j=0; j < n; j++)
+ a[0][j] /= sum;
+ }
+ break;
+ }
+ }
+
+ for (i=0; i< n; i++)
+ {
+ EIGN[i] = -d[i];
+
+ for (j=0; j<n; j++)
+ EIGV[i][j] = a[j][i];
+ invfreq[i] = 1 / EIGV[i][0];
+ }
+
+ ext_EIGN[0] = 0.0;
+
+ for(l = 1; l < n; l++)
+ {
+ ext_EIGN[l] = EIGN[l] * (1.0 / fracchange);
+ assert(ext_EIGN[l] > 0.0);
+ }
+
+ eptr = EV;
+
+ for(i = 0; i < n; i++)
+ for(j = 0; j < n; j++)
+ {
+ *eptr++ = EIGV[i][j];
+
+ }
+
+ for(i = 0; i < n; i++)
+ for(j = 0; j < n; j++)
+ {
+ if(j == 0)
+ EI[i * n + j] = 1.0;
+ else
+ EI[i * n + j] = EV[i * n + j] * invfreq[i];
+ }
+
+ /*
+ printf("EIGN\n");
+
+ for(i = 0; i < n; i++)
+ printf("%f ", ext_EIGN[i]);
+ printf("\n");
+
+ printf("EI\n");
+ for(i = 0; i < n; i++)
+ {
+ for(j = 0; j < n; j++)
+ {
+ printf("%f ", EI[i * n + j]);
+ }
+ printf("\n");
+ }
+ */
+
+
+
+ for(i=0; i < valueVectorLength; i++)
+ {
+ unsigned int value = valueVector[i];
+
+ for(j = 0; j < n; j++)
+ tipVector[i * n + j] = 0;
+
+ if(value > 0)
+ {
+ for (j = 0; j < n; j++)
+ {
+ if ((value >> j) & 1)
+ {
+ int l;
+ for(l = 0; l < n; l++)
+ tipVector[i * n + l] += EIGV[j][l];
+ }
+ }
+ }
+ }
+
+ for(i = 0; i < valueVectorLength; i++)
+ {
+ for(j = 0; j < n; j++)
+ if(tipVector[i * n + j] > MAX_TIP_EV)
+ tipVector[i * n + j] = MAX_TIP_EV;
+ }
+
+
+
+
+ for(i = 0; i < n; i++)
+ {
+ free(EIGV[i]);
+ free(a[i]);
+ free(r[i]);
+ }
+
+ free(r);
+ free(a);
+ free(EIGV);
+
+ free(f);
+ free(e);
+ free(d);
+ free(invfreq);
+ free(EIGN);
+}
+
+
+
+
+void initReversibleGTR(tree *tr, int model)
+{
+ double
+ *ext_EIGN = tr->partitionData[model].EIGN,
+ *EV = tr->partitionData[model].EV,
+ *EI = tr->partitionData[model].EI,
+ *frequencies = tr->partitionData[model].frequencies,
+ *ext_initialRates = tr->partitionData[model].substRates,
+ *tipVector = tr->partitionData[model].tipVector;
+
+ int
+ states = tr->partitionData[model].states;
+
+ switch(tr->partitionData[model].dataType)
+ {
+ case GENERIC_32:
+ case GENERIC_64:
+ case SECONDARY_DATA_6:
+ case SECONDARY_DATA_7:
+ case SECONDARY_DATA:
+ case DNA_DATA:
+ case BINARY_DATA:
+ initGeneric(states,
+ getBitVector(tr->partitionData[model].dataType),
+ getUndetermined(tr->partitionData[model].dataType) + 1,
+ ext_EIGN,
+ EV,
+ EI,
+ frequencies,
+ ext_initialRates,
+ tipVector,
+ model);
+ break;
+ case AA_DATA:
+ if(tr->partitionData[model].protModels != GTR)
+ {
+ double
+ f[20];
+
+ if(tr->partitionData[model].protModels == LG4M || tr->partitionData[model].protModels == LG4X)
+ {
+ int
+ i;
+
+ for(i = 0; i < 4; i++)
+ {
+ initProtMat(f, tr->partitionData[model].protModels, &(tr->partitionData[model].substRates_LG4[i][0]), i);
+
+ if(!tr->partitionData[model].protFreqs && !tr->partitionData[model].optimizeBaseFrequencies)
+ memcpy(tr->partitionData[model].frequencies_LG4[i], f, 20 * sizeof(double));
+ //for(l = 0; l < 20; l++)
+ // tr->partitionData[model].frequencies_LG4[i][l] = f[l];
+ else
+ memcpy(tr->partitionData[model].frequencies_LG4[i], frequencies, 20 * sizeof(double));
+ }
+ }
+ else
+ {
+ if(tr->partitionData[model].protModels == AUTO)
+ initProtMat(f, tr->partitionData[model].autoProtModels, ext_initialRates, 0);
+ else
+ initProtMat(f, tr->partitionData[model].protModels, ext_initialRates, 0);
+
+ /*if(adef->protEmpiricalFreqs && tr->NumberOfModels == 1)
+ assert(tr->partitionData[model].protFreqs);*/
+
+ if(tr->partitionData[model].protModels == AUTO)
+ {
+ if(tr->partitionData[model].protFreqs)
+ memcpy(frequencies, f, 20 * sizeof(double));
+ else
+ memcpy(frequencies, tr->partitionData[model].empiricalFrequencies, 20 * sizeof(double));
+ }
+ else
+ {
+ if(!tr->partitionData[model].optimizeBaseFrequencies)
+ {
+ if(!tr->partitionData[model].protFreqs)
+ {
+ memcpy(frequencies, f, 20 * sizeof(double));
+ /*for(l = 0; l < 20; l++)
+ frequencies[l] = f[l]; */
+ }
+ else
+ {
+ memcpy(frequencies, tr->partitionData[model].empiricalFrequencies, 20 * sizeof(double));
+ /*for(l = 0; l < 20; l++)
+ frequencies[l] = tr->partitionData[model].empiricalFrequencies[l]; */
+ }
+ }
+ }
+ }
+ }
+
+ if(tr->partitionData[model].protModels == LG4M || tr->partitionData[model].protModels == LG4X)
+ {
+ int
+ i;
+
+ for(i = 0; i < 4; i++)
+ initGeneric(states, bitVectorAA, 23,
+ tr->partitionData[model].rawEIGN_LG4[i], tr->partitionData[model].EV_LG4[i],
+ tr->partitionData[model].EI_LG4[i], tr->partitionData[model].frequencies_LG4[i],
+ tr->partitionData[model].substRates_LG4[i],
+ tr->partitionData[model].tipVector_LG4[i],
+ model);
+
+ scaleLG4X_EIGN(tr, model);
+ }
+ else
+ initGeneric(states, bitVectorAA, 23,
+ ext_EIGN, EV, EI, frequencies, ext_initialRates,
+ tipVector,
+ model);
+ break;
+ default:
+ assert(0);
+ }
+
+#ifdef __MIC_NATIVE
+ if(tr->partitionData[model].protModels == LG4M || tr->partitionData[model].protModels == LG4X)
+ updateModel_LG4_MIC(&tr->partitionData[model]);
+ else
+ updateModel_MIC(&tr->partitionData[model]);
+#endif
+}
+
+
+double LnGamma (double alpha)
+{
+/* returns ln(gamma(alpha)) for alpha>0, accurate to 10 decimal places.
+ Stirling's formula is used for the central polynomial part of the procedure.
+ Pike MC & Hill ID (1966) Algorithm 291: Logarithm of the gamma function.
+ Communications of the Association for Computing Machinery, 9:684
+*/
+ double x, f, z, result;
+
+ x = alpha;
+ f = 0.0;
+
+ if ( x < 7.0)
+ {
+ f = 1.0;
+ z = alpha - 1.0;
+
+ while ((z = z + 1.0) < 7.0)
+ {
+ f *= z;
+ }
+ x = z;
+
+ assert(f != 0.0);
+
+ f=-log(f);
+ }
+
+ z = 1/(x*x);
+
+ result = f + (x-0.5)*log(x) - x + .918938533204673
+ + (((-.000595238095238*z+.000793650793651)*z-.002777777777778)*z
+ +.083333333333333)/x;
+
+ return result;
+}
+
+
+
+double IncompleteGamma (double x, double alpha, double ln_gamma_alpha)
+{
+/* returns the incomplete gamma ratio I(x,alpha) where x is the upper
+ limit of the integration and alpha is the shape parameter.
+ returns (-1) if in error
+ ln_gamma_alpha = ln(Gamma(alpha)), is almost redundant.
+ (1) series expansion if (alpha>x || x<=1)
+ (2) continued fraction otherwise
+ RATNEST FORTRAN by
+ Bhattacharjee GP (1970) The incomplete gamma integral. Applied Statistics,
+ 19: 285-287 (AS32)
+*/
+ int i;
+ double p=alpha, g=ln_gamma_alpha;
+ double accurate=1e-8, overflow=1e30;
+ double factor, gin=0, rn=0, a=0,b=0,an=0,dif=0, term=0, pn[6];
+
+
+ if (x==0) return (0);
+ if (x<0 || p<=0) return (-1);
+
+
+ factor=exp(p*log(x)-x-g);
+ if (x>1 && x>=p) goto l30;
+ /* (1) series expansion */
+ gin=1; term=1; rn=p;
+ l20:
+ rn++;
+ term*=x/rn; gin+=term;
+
+ if (term > accurate) goto l20;
+ gin*=factor/p;
+ goto l50;
+ l30:
+ /* (2) continued fraction */
+ a=1-p; b=a+x+1; term=0;
+ pn[0]=1; pn[1]=x; pn[2]=x+1; pn[3]=x*b;
+ gin=pn[2]/pn[3];
+ l32:
+ a++;
+ b+=2;
+ term++;
+ an=a*term;
+ for (i=0; i<2; i++)
+ pn[i+4]=b*pn[i+2]-an*pn[i];
+ if (pn[5] == 0) goto l35;
+ rn=pn[4]/pn[5];
+ dif=fabs(gin-rn);
+ if (dif>accurate) goto l34;
+ if (dif<=accurate*rn) goto l42;
+ l34:
+ gin=rn;
+ l35:
+ for (i=0; i<4; i++)
+ pn[i]=pn[i+2];
+ if (fabs(pn[4]) < overflow)
+ goto l32;
+
+ for (i=0; i<4; i++)
+ pn[i]/=overflow;
+
+
+ goto l32;
+ l42:
+ gin=1-factor*gin;
+
+ l50:
+ return (gin);
+}
+
+
+
+
+double PointNormal (double prob)
+{
+/* returns z so that Prob{x<z}=prob where x ~ N(0,1) and (1e-12)<prob<1-(1e-12)
+ returns (-9999) if in error
+ Odeh RE & Evans JO (1974) The percentage points of the normal distribution.
+ Applied Statistics 22: 96-97 (AS70)
+
+ Newer methods:
+ Wichura MJ (1988) Algorithm AS 241: the percentage points of the
+ normal distribution. 37: 477-484.
+ Beasley JD & Springer SG (1977). Algorithm AS 111: the percentage
+ points of the normal distribution. 26: 118-121.
+
+*/
+ double a0=-.322232431088, a1=-1, a2=-.342242088547, a3=-.0204231210245;
+ double a4=-.453642210148e-4, b0=.0993484626060, b1=.588581570495;
+ double b2=.531103462366, b3=.103537752850, b4=.0038560700634;
+ double y, z=0, p=prob, p1;
+
+ p1 = (p<0.5 ? p : 1-p);
+ if (p1<1e-20) return (-9999);
+
+ y = sqrt (log(1/(p1*p1)));
+ z = y + ((((y*a4+a3)*y+a2)*y+a1)*y+a0) / ((((y*b4+b3)*y+b2)*y+b1)*y+b0);
+ return (p<0.5 ? -z : z);
+}
+
+
+double PointChi2 (double prob, double v)
+{
+/* returns z so that Prob{x<z}=prob where x is Chi2 distributed with df=v
+ returns -1 if in error. 0.000002<prob<0.999998
+ RATNEST FORTRAN by
+ Best DJ & Roberts DE (1975) The percentage points of the
+ Chi2 distribution. Applied Statistics 24: 385-388. (AS91)
+ Converted into C by Ziheng Yang, Oct. 1993.
+*/
+ double e=.5e-6, aa=.6931471805, p=prob, g;
+ double xx, c, ch, a=0,q=0,p1=0,p2=0,t=0,x=0,b=0,s1,s2,s3,s4,s5,s6;
+
+ if (p<.000002 || p>.999998 || v<=0) return (-1);
+
+ g = LnGamma(v/2);
+
+ xx=v/2; c=xx-1;
+ if (v >= -1.24*log(p)) goto l1;
+
+ ch=pow((p*xx*exp(g+xx*aa)), 1/xx);
+ if (ch-e<0) return (ch);
+ goto l4;
+l1:
+ if (v>.32) goto l3;
+ ch=0.4; a=log(1-p);
+l2:
+ q=ch; p1=1+ch*(4.67+ch); p2=ch*(6.73+ch*(6.66+ch));
+ t=-0.5+(4.67+2*ch)/p1 - (6.73+ch*(13.32+3*ch))/p2;
+ ch-=(1-exp(a+g+.5*ch+c*aa)*p2/p1)/t;
+ if (fabs(q/ch-1)-.01 <= 0) goto l4;
+ else goto l2;
+
+l3:
+ x=PointNormal (p);
+ p1=0.222222/v; ch=v*pow((x*sqrt(p1)+1-p1), 3.0);
+ if (ch>2.2*v+6) ch=-2*(log(1-p)-c*log(.5*ch)+g);
+l4:
+ q=ch; p1=.5*ch;
+ if ((t=IncompleteGamma (p1, xx, g))< 0.0)
+ {
+ printf ("IncompleteGamma \n");
+ return (-1);
+ }
+
+ p2=p-t;
+ t=p2*exp(xx*aa+g+p1-c*log(ch));
+ b=t/ch; a=0.5*t-b*c;
+
+ s1=(210+a*(140+a*(105+a*(84+a*(70+60*a))))) / 420;
+ s2=(420+a*(735+a*(966+a*(1141+1278*a))))/2520;
+ s3=(210+a*(462+a*(707+932*a)))/2520;
+ s4=(252+a*(672+1182*a)+c*(294+a*(889+1740*a)))/5040;
+ s5=(84+264*a+c*(175+606*a))/2520;
+ s6=(120+c*(346+127*c))/5040;
+ ch+=t*(1+0.5*t*s1-b*c*(s1-b*(s2-b*(s3-b*(s4-b*(s5-b*s6))))));
+ if (fabs(q/ch-1) > e) goto l4;
+
+ return (ch);
+}
+
+
+
+
+
+
+void makeGammaCats(double alpha, double *gammaRates, int K, boolean useMedian)
+{
+ int
+ i;
+
+ double
+ factor = alpha / alpha * K,
+ lnga1,
+ alfa = alpha,
+ beta = alpha,
+ *gammaProbs = (double *)malloc(K * sizeof(double));
+
+ /* Note that ALPHA_MIN setting is somewhat critical due to */
+ /* numerical instability caused by very small rate[0] values */
+ /* induced by low alpha values around 0.01 */
+
+ assert(alfa >= ALPHA_MIN);
+
+ if(useMedian)
+ {
+ double
+ middle = 1.0 / (2.0*K),
+ t = 0.0;
+
+ for(i = 0; i < K; i++)
+ gammaRates[i] = PointGamma((double)(i * 2 + 1) * middle, alfa, beta);
+
+ for (i = 0; i < K; i++)
+ t += gammaRates[i];
+ for( i = 0; i < K; i++)
+ gammaRates[i] *= factor / t;
+ }
+ else
+ {
+ lnga1 = LnGamma(alfa + 1);
+
+ for (i = 0; i < K - 1; i++)
+ gammaProbs[i] = PointGamma((i + 1.0) / K, alfa, beta);
+
+ for (i = 0; i < K - 1; i++)
+ gammaProbs[i] = IncompleteGamma(gammaProbs[i] * beta, alfa + 1, lnga1);
+
+ gammaRates[0] = gammaProbs[0] * factor;
+
+ gammaRates[K - 1] = (1 - gammaProbs[K - 2]) * factor;
+
+ for (i= 1; i < K - 1; i++)
+ gammaRates[i] = (gammaProbs[i] - gammaProbs[i - 1]) * factor;
+ }
+ /* assert(gammaRates[0] >= 0.00000000000000000000000000000044136090435925743185910935350715027016962154188875); */
+
+ free(gammaProbs);
+
+ return;
+}
+
+
+static void setRates(double *r, int rates)
+{
+ int i;
+
+ //changed to 1.0 instead of 0.5 for making the
+ //implementation of an interface function to set other models
+ //than GTR easier
+
+ for(i = 0; i < rates - 1; i++)
+ r[i] = 1.0;
+
+ r[rates - 1] = 1.0;
+}
+
+void initRateMatrix(tree *tr)
+{
+ int model;
+
+ for(model = 0; model < tr->NumberOfModels; model++)
+ {
+ int
+ i,
+ states = tr->partitionData[model].states,
+ rates = (states * states - states) / 2;
+
+ switch(tr->partitionData[model].dataType)
+ {
+ case BINARY_DATA:
+ case DNA_DATA:
+ case SECONDARY_DATA:
+ case SECONDARY_DATA_6:
+ case SECONDARY_DATA_7:
+ setRates(tr->partitionData[model].substRates, rates);
+ break;
+ case GENERIC_32:
+ case GENERIC_64:
+ switch(tr->multiStateModel)
+ {
+ case ORDERED_MULTI_STATE:
+ {
+ int
+ j,
+ k,
+ i = 0;
+
+ for(j = 0; j < states; j++)
+ for(k = j + 1; k < states; k++)
+ tr->partitionData[model].substRates[i++] = (double)(k - j);
+ assert(i == rates);
+ }
+ break;
+ case MK_MULTI_STATE:
+ for(i = 0; i < rates; i++)
+ tr->partitionData[model].substRates[i] = 1.0;
+
+ break;
+ case GTR_MULTI_STATE:
+ setRates(tr->partitionData[model].substRates, rates);
+ break;
+ default:
+ assert(0);
+ }
+ break;
+ case AA_DATA:
+ if(tr->partitionData[model].protModels == GTR)
+ putWAG(tr->partitionData[model].substRates);
+ break;
+ default:
+ assert(0);
+ }
+
+ if(tr->partitionData[model].nonGTR)
+ {
+ assert(tr->partitionData[model].dataType == SECONDARY_DATA ||
+ tr->partitionData[model].dataType == SECONDARY_DATA_6 ||
+ tr->partitionData[model].dataType == SECONDARY_DATA_7);
+
+ for(i = 0; i < rates; i++)
+ {
+ if(tr->partitionData[model].symmetryVector[i] == -1)
+ tr->partitionData[model].substRates[i] = 0.0;
+ else
+ {
+ if(tr->partitionData[model].symmetryVector[i] == tr->partitionData[model].symmetryVector[rates - 1])
+ tr->partitionData[model].substRates[i] = 1.0;
+ }
+ }
+ }
+ }
+}
+
+static void setSymmetry(int *s, int *sDest, const int sCount, int *f, int *fDest, const int fCount)
+{
+ int i;
+
+ for(i = 0; i < sCount; i++)
+ sDest[i] = s[i];
+
+ for(i = 0; i < fCount; i++)
+ fDest[i] = f[i];
+}
+
+static void setupSecondaryStructureSymmetries(tree *tr)
+{
+ int model;
+
+ for(model = 0; model < tr->NumberOfModels; model++)
+ {
+ if(tr->partitionData[model].dataType == SECONDARY_DATA ||
+ tr->partitionData[model].dataType == SECONDARY_DATA_6 ||
+ tr->partitionData[model].dataType == SECONDARY_DATA_7)
+ {
+ switch(tr->secondaryStructureModel)
+ {
+ case SEC_6_A:
+ tr->partitionData[model].nonGTR = FALSE;
+ break;
+ case SEC_6_B:
+ {
+ int f[6] = {0, 1, 2, 3, 4, 5};
+ int s[15] = {2, 0, 1, 2, 2, 2, 2, 0, 1, 1, 2, 2, 2, 2, 1};
+
+ setSymmetry(s, tr->partitionData[model].symmetryVector, 15, f, tr->partitionData[model].frequencyGrouping, 6);
+
+ tr->partitionData[model].nonGTR = TRUE;
+ }
+ break;
+ case SEC_6_C:
+ {
+ int f[6] = {0, 2, 2, 1, 0, 1};
+ int s[15] = {2, 0, 1, 2, 2, 2, 2, 0, 1, 1, 2, 2, 2, 2, 1};
+
+ setSymmetry(s, tr->partitionData[model].symmetryVector, 15, f, tr->partitionData[model].frequencyGrouping, 6);
+
+ tr->partitionData[model].nonGTR = TRUE;
+ }
+ break;
+ case SEC_6_D:
+ {
+ int f[6] = {0, 2, 2, 1, 0, 1};
+ int s[15] = {2, -1, 1, 2, 2, 2, 2, -1, 1, 1, 2, 2, 2, 2, 1};
+
+ setSymmetry(s, tr->partitionData[model].symmetryVector, 15, f, tr->partitionData[model].frequencyGrouping, 6);
+
+ tr->partitionData[model].nonGTR = TRUE;
+ }
+ break;
+ case SEC_6_E:
+ {
+ int f[6] = {0, 1, 2, 3, 4, 5};
+ int s[15] = {2, -1, 1, 2, 2, 2, 2, -1, 1, 1, 2, 2, 2, 2, 1};
+
+ setSymmetry(s, tr->partitionData[model].symmetryVector, 15, f, tr->partitionData[model].frequencyGrouping, 6);
+
+ tr->partitionData[model].nonGTR = TRUE;
+ }
+ break;
+ case SEC_7_A:
+ tr->partitionData[model].nonGTR = FALSE;
+ break;
+ case SEC_7_B:
+ {
+ int f[7] = {0, 2, 2, 1, 0, 1, 3};
+ int s[21] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20};
+
+ setSymmetry(s, tr->partitionData[model].symmetryVector, 21, f, tr->partitionData[model].frequencyGrouping, 7);
+
+ tr->partitionData[model].nonGTR = TRUE;
+
+ }
+ break;
+ case SEC_7_C:
+ {
+ int f[7] = {0, 1, 2, 3, 4, 5, 6};
+ int s[21] = {-1, -1, 0, -1, -1, 4, -1, -1, -1, 3, 5, 1, -1, -1, 6, -1, -1, 7, 2, 8, 9};
+
+ setSymmetry(s, tr->partitionData[model].symmetryVector, 21, f, tr->partitionData[model].frequencyGrouping, 7);
+
+ tr->partitionData[model].nonGTR = TRUE;
+
+ }
+ break;
+ case SEC_7_D:
+ {
+ int f[7] = {0, 1, 2, 3, 4, 5, 6};
+ int s[21] = {2, 0, 1, 2, 2, 3, 2, 2, 0, 1, 3, 1, 2, 2, 3, 2, 2, 3, 1, 3, 3};
+
+ setSymmetry(s, tr->partitionData[model].symmetryVector, 21, f, tr->partitionData[model].frequencyGrouping, 7);
+
+ tr->partitionData[model].nonGTR = TRUE;
+
+ }
+ break;
+ case SEC_7_E:
+ {
+ int f[7] = {0, 1, 2, 3, 4, 5, 6};
+ int s[21] = {-1, -1, 0, -1, -1, 1, -1, -1, -1, 0, 1, 0, -1, -1, 1, -1, -1, 1, 0, 1, 1};
+
+ setSymmetry(s, tr->partitionData[model].symmetryVector, 21, f, tr->partitionData[model].frequencyGrouping, 7);
+
+ tr->partitionData[model].nonGTR = TRUE;
+
+ }
+ break;
+ case SEC_7_F:
+ {
+ int f[7] = {0, 2, 2, 1, 0, 1, 3};
+ int s[21] = {2, 0, 1, 2, 2, 3, 2, 2, 0, 1, 3, 1, 2, 2, 3, 2, 2, 3, 1, 3, 3};
+
+ setSymmetry(s, tr->partitionData[model].symmetryVector, 21, f, tr->partitionData[model].frequencyGrouping, 7);
+
+ tr->partitionData[model].nonGTR = TRUE;
+
+ }
+ break;
+
+ case SEC_16:
+ tr->partitionData[model].nonGTR = FALSE;
+ break;
+ case SEC_16_A:
+ {
+ int f[16] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
+ int s[120] = {/* AA */ 4, 4, 3, 4, -1, -1, -1, 4, -1, -1, -1, 3, -1, -1, -1,
+ /* AC */ 4, 3, -1, 4, -1, -1, -1, 3, -1, -1, -1, 4, -1, -1,
+ /* AG */ 3, -1, -1, 3, -1, -1, -1, 4, -1, -1, -1, 3, -1,
+ /* AU */ -1, -1, 2, 3, -1, 0, -1, 1, 2, -1, 2, 3,
+ /* CA */ 4, 3, 4, 4, -1, -1, -1, 3, -1, -1, -1,
+ /* CC */ 3, 4, -1, 3, -1, -1, -1, 4, -1, -1,
+ /* CG */ 3, -1, 2, 3, 2, 0, -1, 1, -1,
+ /* CU */ -1, -1, -1, 3, -1, -1, -1, 4,
+ /* GA */ 3, 4, 3, 3, -1, -1, -1,
+ /* GC */ 3, 1, 2, 3, 2, -1,
+ /* GG */ 3, -1, -1, 3, -1,
+ /* GU */ 2, -1, 2, 3,
+ /* UA */ 3, 1, 3,
+ /* UC */ 3, 4,
+ /* UG */ 3};
+
+
+ setSymmetry(s, tr->partitionData[model].symmetryVector, 120, f, tr->partitionData[model].frequencyGrouping, 16);
+
+ tr->partitionData[model].nonGTR = TRUE;
+
+ }
+ break;
+ case SEC_16_B:
+ {
+ int f[16] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
+ int s[120] = {/* AA */ 0, 0, 0, 0, -1, -1, -1, 0, -1, -1, -1, 0, -1, -1, -1,
+ /* AC */ 0, 0, -1, 0, -1, -1, -1, 0, -1, -1, -1, 0, -1, -1,
+ /* AG */ 0, -1, -1, 0, -1, -1, -1, 0, -1, -1, -1, 0, -1,
+ /* AU */ -1, -1, 0, 0, -1, 0, -1, 0, 0, -1, 0, 0,
+ /* CA */ 0, 0, 0, 0, -1, -1, -1, 0, -1, -1, -1,
+ /* CC */ 0, 0, -1, 0, -1, -1, -1, 0, -1, -1,
+ /* CG */ 0, -1, 0, 0, 0, 0, -1, 0, -1,
+ /* CU */ -1, -1, -1, 0, -1, -1, -1, 0,
+ /* GA */ 0, 0, 0, 0, -1, -1, -1,
+ /* GC */ 0, 0, 0, 0, 0, -1,
+ /* GG */ 0, -1, -1, 0, -1,
+ /* GU */ 0, -1, 0, 0,
+ /* UA */ 0, 0, 0,
+ /* UC */ 0, 0,
+ /* UG */ 0};
+
+
+ setSymmetry(s, tr->partitionData[model].symmetryVector, 120, f, tr->partitionData[model].frequencyGrouping, 16);
+
+ tr->partitionData[model].nonGTR = TRUE;
+ }
+ break;
+ case SEC_16_C:
+ case SEC_16_D:
+ case SEC_16_E:
+ case SEC_16_F:
+ case SEC_16_I:
+ case SEC_16_J:
+ case SEC_16_K:
+ assert(0);
+ default:
+ assert(0);
+ }
+ }
+
+ }
+
+}
+
+
+/* this function is only called once at program start-up ! */
+
+static void initializeBaseFreqs(tree *tr)
+{
+ size_t
+ model;
+
+ for(model = 0; model < (size_t)tr->NumberOfModels; model++)
+ {
+ if(tr->partitionData[model].optimizeBaseFrequencies)
+ {
+ //set all base frequencies to identical starting values 1.0 / numberOfDataStates
+ //if we want to optimize base frequencies for the current partition
+
+ int
+ l,
+ numFreqs = tr->partitionData[model].states;
+
+ double
+ f = 1.0 / ((double)numFreqs);
+
+ for(l = 0; l < numFreqs; l++)
+ {
+ tr->partitionData[model].frequencies[l] = f;
+ tr->partitionData[model].empiricalFrequencies[l] = f;
+ }
+ }
+ else
+ {
+ //otherwise, at startup examl reads and stores the empirical base frequencies as determined by the
+ //parser code in .frequencies, now we just store them in .empiricalFrequencies such that we can
+ //overwrite .frequencies without losing the empirical base freqs
+
+ memcpy(tr->partitionData[model].empiricalFrequencies, tr->partitionData[model].frequencies, sizeof(double) * tr->partitionData[model].states);
+ }
+ }
+}
+
+/* this function is only called once at program start-up ! */
+
+void initModel(tree *tr)
+{
+ int
+ model;
+
+
+ optimizeRateCategoryInvocations = 1;
+ tr->numberOfInvariableColumns = 0;
+ tr->weightOfInvariableColumns = 0;
+
+ if(tr->rateHetModel == CAT)
+ {
+ for(model = 0; model < tr->NumberOfModels; model++)
+ {
+ tr->partitionData[model].numberOfCategories = 1;
+ tr->partitionData[model].perSiteRates[0] = 1.0;
+
+ size_t i;
+ for(i = 0; i < tr->partitionData[model].width; ++i)
+ {
+ tr->partitionData[model].rateCategory[i] = 0;
+ tr->partitionData[model].patrat[i] = 1.;
+ }
+ }
+
+ checkPerSiteRates(tr);
+ }
+
+ setupSecondaryStructureSymmetries(tr);
+
+ initRateMatrix(tr);
+
+ initializeBaseFreqs(tr);
+
+ for(model = 0; model < tr->NumberOfModels; model++)
+ {
+ int
+ k;
+
+ tr->partitionData[model].alpha = 1.0;
+
+ if(tr->partitionData[model].protModels == AUTO)
+ tr->partitionData[model].autoProtModels = WAG; /* initialize by WAG per default when AUTO is used */
+
+ makeGammaCats(tr->partitionData[model].alpha, tr->partitionData[model].gammaRates, 4, tr->useMedian);
+
+ for(k = 0; k < tr->partitionData[model].states; k++)
+ tr->partitionData[model].freqExponents[k] = 0.0;
+
+ //LG4X inits
+
+ for(k = 0; k < 4; k++)
+ {
+ tr->partitionData[model].weights[k] = 0.25;
+ tr->partitionData[model].weightExponents[k] = 0.0;
+ }
+
+ initReversibleGTR(tr, model);
+ }
+}
+
+
+
+
diff --git a/examl/newviewGenericSpecial.c b/examl/newviewGenericSpecial.c
new file mode 100644
index 0000000..b8e6daf
--- /dev/null
+++ b/examl/newviewGenericSpecial.c
@@ -0,0 +1,6218 @@
+/* RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees
+ * Copyright August 2006 by Alexandros Stamatakis
+ *
+ * Partially derived from
+ * fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *
+ * and
+ *
+ * Programs of the PHYLIP package by Joe Felsenstein.
+ * This program is free software; you may redistribute it and/or modify its
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ *
+ * For any other enquiries send an Email to Alexandros Stamatakis
+ * Alexandros.Stamatakis at epfl.ch
+ *
+ * When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ * Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models".
+ * Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#ifndef WIN32
+#include <unistd.h>
+#endif
+
+#include <math.h>
+#include <time.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+#include <stdint.h>
+#include <limits.h>
+#include "axml.h"
+
+#ifdef __SIM_SSE3
+
+#include <stdint.h>
+#include <xmmintrin.h>
+#include <pmmintrin.h>
+
+/* required to compute the absolute values of double precision numbers with SSE3 */
+
+const union __attribute__ ((aligned (BYTE_ALIGNMENT)))
+{
+ uint64_t i[2];
+ __m128d m;
+} absMask = {{0x7fffffffffffffffULL , 0x7fffffffffffffffULL }};
+
+
+
+#endif
+
+/* includes MIC-optimized functions */
+
+#ifdef __MIC_NATIVE
+#include "mic_native.h"
+#endif
+
+extern int processID;
+
+/* bit mask */
+
+extern const unsigned int mask32[32];
+
+
+/* generic function for computing the P matrices, for computing the conditional likelihood at a node p, given child nodes q and r
+ we compute P(z1) and P(z2) here */
+
+static void makeP(double z1, double z2, double *rptr, double *EI, double *EIGN, int numberOfCategories, double *left, double *right, boolean saveMem, int maxCat, const int states)
+{
+ int
+ i,
+ j,
+ k,
+ /* square of the number of states = P-matrix size */
+ statesSquare = states * states;
+
+ /* allocate scratch space for pre-computed values that are re-used below */
+
+ double
+ *lz1 = (double*)malloc(sizeof(double) * states),
+ *lz2 = (double*)malloc(sizeof(double) * states),
+ *d1 = (double*)malloc(sizeof(double) * states),
+ *d2 = (double*)malloc(sizeof(double) * states);
+
+ /* multiply branch lengths with eigenvalues */
+
+ for(i = 1; i < states; i++)
+ {
+ lz1[i] = EIGN[i] * z1;
+ lz2[i] = EIGN[i] * z2;
+ }
+
+
+ /* loop over the number of rate categories, this will be 4 for the GAMMA model and
+ variable for the CAT model */
+
+ for(i = 0; i < numberOfCategories; i++)
+ {
+ /* exponentiate the rate multiplied by the branch */
+
+ for(j = 1; j < states; j++)
+ {
+ d1[j] = EXP(rptr[i] * lz1[j]);
+ d2[j] = EXP(rptr[i] * lz2[j]);
+ }
+
+ /* now fill the P matrices for the two branch length values */
+
+ for(j = 0; j < states; j++)
+ {
+ /* left and right are pre-allocated arrays */
+
+ left[statesSquare * i + states * j] = 1.0;
+ right[statesSquare * i + states * j] = 1.0;
+
+ for(k = 1; k < states; k++)
+ {
+ left[statesSquare * i + states * j + k] = d1[k] * EI[states * j + k];
+ right[statesSquare * i + states * j + k] = d2[k] * EI[states * j + k];
+ }
+ }
+ }
+
+
+ /* if memory saving is enabled and we are using CAT we need to do one additional P matrix
+ calculation for a rate of 1.0 to compute the entries of a column/tree site comprising only gaps */
+
+
+ if(saveMem)
+ {
+ i = maxCat;
+
+ for(j = 1; j < states; j++)
+ {
+ d1[j] = EXP (lz1[j]);
+ d2[j] = EXP (lz2[j]);
+ }
+
+ for(j = 0; j < states; j++)
+ {
+ left[statesSquare * i + states * j] = 1.0;
+ right[statesSquare * i + states * j] = 1.0;
+
+ for(k = 1; k < states; k++)
+ {
+ left[statesSquare * i + states * j + k] = d1[k] * EI[states * j + k];
+ right[statesSquare * i + states * j + k] = d2[k] * EI[states * j + k];
+ }
+ }
+ }
+
+ /* free the temporary buffers */
+
+ free(lz1);
+ free(lz2);
+ free(d1);
+ free(d2);
+}
+
+static void makeP_FlexLG4(double z1, double z2, double *rptr, double *EI[4], double *EIGN[4], int numberOfCategories, double *left, double *right, const int numStates)
+{
+ int
+ i,
+ j,
+ k;
+
+ const int
+ statesSquare = numStates * numStates;
+
+ double
+ d1[64],
+ d2[64];
+
+ assert(numStates <= 64);
+
+ for(i = 0; i < numberOfCategories; i++)
+ {
+ for(j = 1; j < numStates; j++)
+ {
+ d1[j] = EXP (rptr[i] * EIGN[i][j] * z1);
+ d2[j] = EXP (rptr[i] * EIGN[i][j] * z2);
+ }
+
+ for(j = 0; j < numStates; j++)
+ {
+ left[statesSquare * i + numStates * j] = 1.0;
+ right[statesSquare * i + numStates * j] = 1.0;
+
+ for(k = 1; k < numStates; k++)
+ {
+ left[statesSquare * i + numStates * j + k] = d1[k] * EI[i][numStates * j + k];
+ right[statesSquare * i + numStates * j + k] = d2[k] * EI[i][numStates * j + k];
+ }
+ }
+ }
+}
+
+
+
+/* The functions here are organized in a similar way as in evaluateGenericSpecial.c
+ I provide generic, slow but readable function implementations for computing the
+ conditional likelihood arrays at p, given child nodes q and r. Once again we need
+ two generic function implementations, one for CAT and one for GAMMA */
+
+#ifndef _OPTIMIZED_FUNCTIONS
+
+static void newviewCAT_FLEX(int tipCase, double *extEV,
+ int *cptr,
+ double *x1, double *x2, double *x3, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement, const int states)
+{
+ double
+ *le,
+ *ri,
+ *v,
+ *vl,
+ *vr,
+ ump_x1,
+ ump_x2,
+ x1px2;
+
+ int
+ i,
+ l,
+ j,
+ scale,
+ addScale = 0;
+
+ const int
+ statesSquare = states * states;
+
+
+ /* here we switch over the different cases for efficiency, but also because
+ each case accesses different data types.
+
+ We consider three cases: q and r are both tips, exactly one of q and r is a tip, or q and r
+ are both inner nodes.
+ */
+
+
+ switch(tipCase)
+ {
+
+ /* both child nodes of p where we want to update the conditional likelihood are tips */
+ case TIP_TIP:
+ /* loop over sites */
+ for (i = 0; i < n; i++)
+ {
+ /* set a pointer to the P-Matrices for the rate category of this site */
+ le = &left[cptr[i] * statesSquare];
+ ri = &right[cptr[i] * statesSquare];
+
+ /* pointers to the likelihood entries of the tips q (vl) and r (vr)
+ We will do reading accesses to these values only.
+ */
+ vl = &(tipVector[states * tipX1[i]]);
+ vr = &(tipVector[states * tipX2[i]]);
+
+ /* address of the conditional likelihood array entries at site i. This is
+ a writing access to v */
+ v = &x3[states * i];
+
+ /* initialize v */
+ for(l = 0; l < states; l++)
+ v[l] = 0.0;
+
+ /* loop over states to compute the cond likelihoods at p (v) */
+
+ for(l = 0; l < states; l++)
+ {
+ ump_x1 = 0.0;
+ ump_x2 = 0.0;
+
+ /* le and ri are the P-matrices */
+
+ for(j = 0; j < states; j++)
+ {
+ ump_x1 += vl[j] * le[l * states + j];
+ ump_x2 += vr[j] * ri[l * states + j];
+ }
+
+ x1px2 = ump_x1 * ump_x2;
+
+ /* multiply with matrix of eigenvectors extEV */
+
+ for(j = 0; j < states; j++)
+ v[j] += x1px2 * extEV[l * states + j];
+ }
+ }
+ break;
+ case TIP_INNER:
+
+ /* same as above, only that now vl is a tip and vr is the conditional probability vector
+ at an inner node. Note that, if we have the case that either q or r is a tip, the
+ nodes will be flipped to ensure that tipX1 always points to the sequence at the tip.
+ */
+
+ for (i = 0; i < n; i++)
+ {
+ le = &left[cptr[i] * statesSquare];
+ ri = &right[cptr[i] * statesSquare];
+
+ /* access tip vector lookup table */
+ vl = &(tipVector[states * tipX1[i]]);
+
+ /* access conditional likelihood arrays */
+ /* again, vl and vr are reading accesses, while v is a writing access */
+ vr = &x2[states * i];
+ v = &x3[states * i];
+
+ /* same as in the loop above */
+
+ for(l = 0; l < states; l++)
+ v[l] = 0.0;
+
+ for(l = 0; l < states; l++)
+ {
+ ump_x1 = 0.0;
+ ump_x2 = 0.0;
+
+ for(j = 0; j < states; j++)
+ {
+ ump_x1 += vl[j] * le[l * states + j];
+ ump_x2 += vr[j] * ri[l * states + j];
+ }
+
+ x1px2 = ump_x1 * ump_x2;
+
+ for(j = 0; j < states; j++)
+ v[j] += x1px2 * extEV[l * states + j];
+ }
+
+ /* now let's check for numerical scaling.
+ The maths in RAxML are a bit non-standard to avoid/economize on arithmetic operations
+ at the virtual root and for branch length optimization and hence values stored
+ in the conditional likelihood vectors can become negative.
+ Below we check if all absolute values stored at position i of v are smaller
+ than a pre-defined value in axml.h. If they are all smaller we can then safely
+ multiply them by a large, constant number twotothe256 (without numerical overflow)
+ that is also specified in axml.h */
+
+ scale = 1;
+ for(l = 0; scale && (l < states); l++)
+ scale = ((v[l] < minlikelihood) && (v[l] > minusminlikelihood));
+
+ if(scale)
+ {
+ for(l = 0; l < states; l++)
+ v[l] *= twotothe256;
+
+ /* if we have scaled the entries to prevent underflow, we need to keep track of how many scaling
+ multiplications we did per node such as to undo them at the virtual root, e.g., in
+ evaluateGeneric()
+ Note that if we scaled the site, we need to increment the scaling counter by the weight, i.e.,
+ the number of sites this potentially compressed pattern represents! */
+
+ addScale += wgt[i];
+ }
+ }
+ break;
+ case INNER_INNER:
+
+ /* same as above, only that the two child nodes q and r are now inner nodes */
+
+ for(i = 0; i < n; i++)
+ {
+ le = &left[cptr[i] * statesSquare];
+ ri = &right[cptr[i] * statesSquare];
+
+ /* index conditional likelihood vectors of inner nodes */
+
+ vl = &x1[states * i];
+ vr = &x2[states * i];
+ v = &x3[states * i];
+
+ for(l = 0; l < states; l++)
+ v[l] = 0.0;
+
+ for(l = 0; l < states; l++)
+ {
+ ump_x1 = 0.0;
+ ump_x2 = 0.0;
+
+ for(j = 0; j < states; j++)
+ {
+ ump_x1 += vl[j] * le[l * states + j];
+ ump_x2 += vr[j] * ri[l * states + j];
+ }
+
+ x1px2 = ump_x1 * ump_x2;
+
+ for(j = 0; j < states; j++)
+ v[j] += x1px2 * extEV[l * states + j];
+ }
+
+ scale = 1;
+ for(l = 0; scale && (l < states); l++)
+ scale = ((v[l] < minlikelihood) && (v[l] > minusminlikelihood));
+
+ if(scale)
+ {
+ for(l = 0; l < states; l++)
+ v[l] *= twotothe256;
+
+ addScale += wgt[i];
+ }
+ }
+ break;
+ default:
+ assert(0);
+ }
+
+ /* report the number of additional scalings done at node p to the caller */
+
+ *scalerIncrement = addScale;
+}
+
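The per-site work in newviewCAT_FLEX can be condensed into one small routine. The following is only a minimal sketch, not the upstream code: the function name and the two constants are hypothetical stand-ins (the real thresholds twotothe256 and minlikelihood live in axml.h and their values are assumed here), and all matrices are taken to be row-major with `states` columns.

```c
#include <assert.h>

/* Assumed stand-ins for twotothe256 and minlikelihood from axml.h. */
#define SKETCH_SCALE_FACTOR 0x1p256
#define SKETCH_THRESHOLD    0x1p-256

/* Compute the conditional likelihood vector v at p for a single site.
   vl, vr : child vectors of length states (read only)
   le, ri : left and right P matrices, states x states, row-major
   extEV  : eigenvector matrix, states x states, row-major
   Returns 1 if all entries were below the threshold and v was scaled. */
static int sketch_newview_site(const double *vl, const double *vr,
                               const double *le, const double *ri,
                               const double *extEV, double *v, int states)
{
  int l, j, scale = 1;

  for(l = 0; l < states; l++)
    v[l] = 0.0;

  for(l = 0; l < states; l++)
    {
      double ump_x1 = 0.0, ump_x2 = 0.0, x1px2;

      /* multiply both child vectors with row l of their P matrices */
      for(j = 0; j < states; j++)
        {
          ump_x1 += vl[j] * le[l * states + j];
          ump_x2 += vr[j] * ri[l * states + j];
        }

      x1px2 = ump_x1 * ump_x2;

      /* accumulate the product along row l of extEV */
      for(j = 0; j < states; j++)
        v[j] += x1px2 * extEV[l * states + j];
    }

  /* scale only if every entry is smaller in magnitude than the threshold */
  for(l = 0; scale && (l < states); l++)
    scale = ((v[l] < SKETCH_THRESHOLD) && (v[l] > -SKETCH_THRESHOLD));

  if(scale)
    for(l = 0; l < states; l++)
      v[l] *= SKETCH_SCALE_FACTOR;

  return scale;
}
```

A caller would add wgt[i] to its scaling counter whenever the function returns 1, mirroring the addScale bookkeeping in the TIP_INNER and INNER_INNER cases.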
+
+static void newviewGAMMA_FLEX(int tipCase,
+ double *x1, double *x2, double *x3, double *extEV, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement, const int states, const int maxStateValue)
+{
+ double
+ *uX1,
+ *uX2,
+ *v,
+ x1px2,
+ *vl,
+ *vr,
+ al,
+ ar;
+
+ int
+ i,
+ j,
+ l,
+ k,
+ scale,
+ addScale = 0;
+
+ const int
+ statesSquare = states * states,
+ span = states * 4,
+ /* this is required for doing some pre-computations that help to save
+ numerical operations. What we are actually computing here are additional lookup tables
+ for each possible state a certain data-type can assume.
+ for DNA with ambiguity coding this is 15, for proteins this is 22 or 23, since there
+ also exist one or two ambiguity codes for protein data.
+ Essentially this is very similar to the tip vectors which we also use as lookup tables */
+ precomputeLength = maxStateValue * span;
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ {
+ /* allocate pre-compute memory space */
+
+ double
+ *umpX1 = (double*)malloc(sizeof(double) * precomputeLength),
+ *umpX2 = (double*)malloc(sizeof(double) * precomputeLength);
+
+ /* multiply all possible tip state vectors with the respective P-matrices
+ */
+
+ for(i = 0; i < maxStateValue; i++)
+ {
+ v = &(tipVector[states * i]);
+
+ for(k = 0; k < span; k++)
+ {
+
+ umpX1[span * i + k] = 0.0;
+ umpX2[span * i + k] = 0.0;
+
+ for(l = 0; l < states; l++)
+ {
+ umpX1[span * i + k] += v[l] * left[k * states + l];
+ umpX2[span * i + k] += v[l] * right[k * states + l];
+ }
+
+ }
+ }
+
+ for(i = 0; i < n; i++)
+ {
+ /* access the precomputed arrays (pre-computed multiplication of conditional with the tip state)
+ */
+
+ uX1 = &umpX1[span * tipX1[i]];
+ uX2 = &umpX2[span * tipX2[i]];
+
+ /* loop over discrete GAMMA rates */
+
+ for(j = 0; j < 4; j++)
+ {
+ /* the rest is the same as for CAT */
+ v = &x3[i * span + j * states];
+
+ for(k = 0; k < states; k++)
+ v[k] = 0.0;
+
+ for(k = 0; k < states; k++)
+ {
+ x1px2 = uX1[j * states + k] * uX2[j * states + k];
+
+ for(l = 0; l < states; l++)
+ v[l] += x1px2 * extEV[states * k + l];
+ }
+
+ }
+ }
+
+ /* free precomputed vectors */
+
+ free(umpX1);
+ free(umpX2);
+ }
+ break;
+ case TIP_INNER:
+ {
+ /* we do analogous pre-computations as above, with the only difference that we now do them
+ only for one tip vector */
+
+ double
+ *umpX1 = (double*)malloc(sizeof(double) * precomputeLength),
+ *ump_x2 = (double*)malloc(sizeof(double) * states);
+
+ /* precompute P and left tip vector product */
+
+ for(i = 0; i < maxStateValue; i++)
+ {
+ v = &(tipVector[states * i]);
+
+ for(k = 0; k < span; k++)
+ {
+
+ umpX1[span * i + k] = 0.0;
+
+ for(l = 0; l < states; l++)
+ umpX1[span * i + k] += v[l] * left[k * states + l];
+
+
+ }
+ }
+
+ for (i = 0; i < n; i++)
+ {
+ /* access pre-computed value based on the raw sequence data tipX1 that is used as an index */
+
+ uX1 = &umpX1[span * tipX1[i]];
+
+ /* loop over discrete GAMMA rates */
+
+ for(k = 0; k < 4; k++)
+ {
+ v = &(x2[span * i + k * states]);
+
+ for(l = 0; l < states; l++)
+ {
+ ump_x2[l] = 0.0;
+
+ for(j = 0; j < states; j++)
+ ump_x2[l] += v[j] * right[k * statesSquare + l * states + j];
+ }
+
+ v = &(x3[span * i + states * k]);
+
+ for(l = 0; l < states; l++)
+ v[l] = 0;
+
+ for(l = 0; l < states; l++)
+ {
+ x1px2 = uX1[k * states + l] * ump_x2[l];
+ for(j = 0; j < states; j++)
+ v[j] += x1px2 * extEV[l * states + j];
+ }
+ }
+
+ /* also do numerical scaling as above. Note that here we need to scale
+ 4 * 4 values for DNA or 4 * 20 values for protein data.
+ If they are ALL smaller than our threshold, we scale. Note that,
+ this can cause numerical problems with GAMMA, if the values generated
+ by the four discrete GAMMA rates are too different.
+
+ For details, see:
+
+ F. Izquierdo-Carrasco, S.A. Smith, A. Stamatakis: "Algorithms, Data Structures, and Numerics for Likelihood-based Phylogenetic Inference of Huge Trees"
+
+ */
+
+
+ v = &x3[span * i];
+ scale = 1;
+ for(l = 0; scale && (l < span); l++)
+ scale = (ABS(v[l]) < minlikelihood);
+
+
+ if (scale)
+ {
+ for(l = 0; l < span; l++)
+ v[l] *= twotothe256;
+
+ addScale += wgt[i];
+ }
+ }
+
+ free(umpX1);
+ free(ump_x2);
+ }
+ break;
+ case INNER_INNER:
+
+ /* same as above, without pre-computations */
+
+ for (i = 0; i < n; i++)
+ {
+ for(k = 0; k < 4; k++)
+ {
+ vl = &(x1[span * i + states * k]);
+ vr = &(x2[span * i + states * k]);
+ v = &(x3[span * i + states * k]);
+
+
+ for(l = 0; l < states; l++)
+ v[l] = 0;
+
+
+ for(l = 0; l < states; l++)
+ {
+
+ al = 0.0;
+ ar = 0.0;
+
+ for(j = 0; j < states; j++)
+ {
+ al += vl[j] * left[k * statesSquare + l * states + j];
+ ar += vr[j] * right[k * statesSquare + l * states + j];
+ }
+
+ x1px2 = al * ar;
+
+ for(j = 0; j < states; j++)
+ v[j] += x1px2 * extEV[states * l + j];
+
+ }
+ }
+
+ v = &(x3[span * i]);
+ scale = 1;
+ for(l = 0; scale && (l < span); l++)
+ scale = ((ABS(v[l]) < minlikelihood));
+
+ if(scale)
+ {
+ for(l = 0; l < span; l++)
+ v[l] *= twotothe256;
+
+ addScale += wgt[i];
+ }
+ }
+ break;
+ default:
+ assert(0);
+ }
+
+ /* as above, increment the global counter that counts scaling multiplications by the scaling multiplications
+ carried out for computing the likelihood array at node p */
+
+ *scalerIncrement = addScale;
+}
+
+#endif
+
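The pre-computation used in the TIP_TIP and TIP_INNER cases of newviewGAMMA_FLEX builds one lookup row per possible tip code, so that the per-site loop only has to index by the raw sequence character. A minimal sketch of that table construction follows; the function name and its parameter list are hypothetical, and the layout (P matrices stored back to back, row-major) is an assumption read off the indexing in the code above.

```c
#include <assert.h>
#include <stdlib.h>

/* Build a lookup table of tip-vector times P-matrix products.
   tipVector : numCodes rows of length states (one row per tip code)
   P         : rates matrices of size states x states, stored back to back
   The result has numCodes rows of length span = rates * states, with
     table[span * c + k] = sum over l of tipVector[states * c + l] * P[k * states + l].
   The caller owns the returned memory; NULL is returned if malloc fails. */
static double *sketch_precompute_tips(const double *tipVector, const double *P,
                                      int numCodes, int states, int rates)
{
  const int span = rates * states;
  int c, k, l;
  double *table = (double *)malloc(sizeof(double) * (size_t)numCodes * (size_t)span);

  if(table == NULL)
    return NULL;

  for(c = 0; c < numCodes; c++)
    {
      const double *v = &tipVector[states * c];

      for(k = 0; k < span; k++)
        {
          double sum = 0.0;

          for(l = 0; l < states; l++)
            sum += v[l] * P[k * states + l];

          table[span * c + k] = sum;
        }
    }

  return table;
}
```

Unlike the original, this sketch checks the malloc result; the per-site loop would then read `&table[span * tipX1[i]]` instead of redoing the multiplication for every site.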
+
+
+/* The function below computes partial traversals only down to the point/node in the tree where the
+ conditional likelihood vector summarizing a subtree is already oriented in the correct direction */
+
+void computeTraversalInfo(nodeptr p, traversalInfo *ti, int *counter, int maxTips, int numBranches, boolean partialTraversal)
+{
+ /* if it's a tip we don't do anything */
+
+ if(isTip(p->number, maxTips))
+ return;
+
+ {
+ int
+ i;
+
+ /* get the left and right descendants */
+
+ nodeptr
+ q = p->next->back,
+ r = p->next->next->back;
+
+ /* if the left and right children are tips there is not that much to do */
+
+ if(isTip(r->number, maxTips) && isTip(q->number, maxTips))
+ {
+ /* fix the orientation of p->x */
+
+ if (! p->x)
+ getxnode(p);
+ assert(p->x);
+
+ /* add the current node triplet p,q,r to the traversal descriptor */
+
+ ti[*counter].tipCase = TIP_TIP;
+ ti[*counter].pNumber = p->number;
+ ti[*counter].qNumber = q->number;
+ ti[*counter].rNumber = r->number;
+
+ /* copy branches to traversal descriptor */
+
+ for(i = 0; i < numBranches; i++)
+ {
+ ti[*counter].qz[i] = q->z[i];
+ ti[*counter].rz[i] = r->z[i];
+ }
+
+ /* increment length counter */
+
+ *counter = *counter + 1;
+ }
+ else
+ {
+ /* if either r or q are tips, flip them to make sure that the tip data is stored
+ for q */
+
+ if(isTip(r->number, maxTips) || isTip(q->number, maxTips))
+ {
+ if(isTip(r->number, maxTips))
+ {
+ nodeptr
+ tmp = r;
+ r = q;
+ q = tmp;
+ }
+
+ /* if the orientation of the likelihood vector at r is not correct we need to re-compute it
+ and descend into its subtree to figure out if there are more vectors in there to re-compute and
+ re-orient */
+
+ if(!r->x || !partialTraversal)
+ computeTraversalInfo(r, ti, counter, maxTips, numBranches, partialTraversal);
+ if(! p->x)
+ getxnode(p);
+
+ /* make sure that everything is consistent now */
+
+ assert(p->x && r->x);
+
+ /* store data for p, q, r in the traversal descriptor */
+
+ ti[*counter].tipCase = TIP_INNER;
+ ti[*counter].pNumber = p->number;
+ ti[*counter].qNumber = q->number;
+ ti[*counter].rNumber = r->number;
+
+ for(i = 0; i < numBranches; i++)
+ {
+ ti[*counter].qz[i] = q->z[i];
+ ti[*counter].rz[i] = r->z[i];
+ }
+
+ *counter = *counter + 1;
+ }
+ else
+ {
+ /* same as above, only now q and r are inner nodes. Hence if they are not
+ oriented correctly they will need to be recomputed and we need to descend into the
+ respective subtrees to check if everything is consistent in there, potentially expanding
+ the traversal descriptor */
+
+ if(! q->x || !partialTraversal)
+ computeTraversalInfo(q, ti, counter, maxTips, numBranches, partialTraversal);
+ if(! r->x || !partialTraversal)
+ computeTraversalInfo(r, ti, counter, maxTips, numBranches, partialTraversal);
+ if(! p->x)
+ getxnode(p);
+
+ /* check that the vector orientations are consistent now */
+
+ assert(p->x && r->x && q->x);
+
+ ti[*counter].tipCase = INNER_INNER;
+ ti[*counter].pNumber = p->number;
+ ti[*counter].qNumber = q->number;
+ ti[*counter].rNumber = r->number;
+
+ for(i = 0; i < numBranches; i++)
+ {
+ ti[*counter].qz[i] = q->z[i];
+ ti[*counter].rz[i] = r->z[i];
+ }
+
+ *counter = *counter + 1;
+ }
+ }
+ }
+}
+
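The idea behind computeTraversalInfo, a post-order walk that descends only into subtrees whose vectors are stale, can be shown on a toy rooted tree. This is a simplified sketch under stated assumptions: the struct and function names are hypothetical, a plain `valid` flag stands in for the p->x orientation bit, and child pointers replace the p->next->back navigation of the real unrooted tree.

```c
#include <assert.h>
#include <stddef.h>

typedef struct sketch_node
{
  int id;
  int valid;                          /* stand-in for p->x */
  struct sketch_node *left, *right;   /* both NULL for tips */
} sketch_node;

/* Append the ids of all inner nodes that must be recomputed to order[],
   children before parents. With partial != 0 a child subtree is skipped
   when its vector is already valid; with partial == 0 everything below p
   is revisited. */
static void sketch_traversal(sketch_node *p, int *order, int *counter, int partial)
{
  /* tips carry sequence data only, nothing to recompute */
  if(p->left == NULL)
    return;

  if(!p->left->valid || !partial)
    sketch_traversal(p->left, order, counter, partial);
  if(!p->right->valid || !partial)
    sketch_traversal(p->right, order, counter, partial);

  /* p itself is always (re)computed, after its children */
  p->valid = 1;
  order[*counter] = p->id;
  *counter = *counter + 1;
}
```

Because parents are appended after their children, a consumer can process the array front to back without knowing anything about the tree shape.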
+/* below are the optimized, unrolled, and vectorized versions of the above generic functions
+ for computing the conditional likelihood at p given child nodes q and r. The actual implementation
+ is located at the end/bottom of this file.
+*/
+
+#if (defined(_OPTIMIZED_FUNCTIONS) && !defined(__AVX))
+
+static void newviewGTRGAMMAPROT_LG4(int tipCase,
+ double *x1, double *x2, double *x3, double *extEV[4], double *tipVector[4],
+ int *ex3, unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling);
+
+static void newviewGTRGAMMA_GAPPED_SAVE(int tipCase,
+ double *x1_start, double *x2_start, double *x3_start,
+ double *EV, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ const int n, double *left, double *right, int *wgt, int *scalerIncrement,
+ unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap,
+ double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn);
+
+static void newviewGTRGAMMA(int tipCase,
+ double *x1_start, double *x2_start, double *x3_start,
+ double *EV, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ const int n, double *left, double *right, int *wgt, int *scalerIncrement
+ );
+
+static void newviewGTRCAT( int tipCase, double *EV, int *cptr,
+ double *x1_start, double *x2_start, double *x3_start, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement);
+
+
+static void newviewGTRCAT_SAVE( int tipCase, double *EV, int *cptr,
+ double *x1_start, double *x2_start, double *x3_start, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement,
+ unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap,
+ double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn, const int maxCats);
+
+static void newviewGTRGAMMAPROT_GAPPED_SAVE(int tipCase,
+ double *x1, double *x2, double *x3, double *extEV, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement,
+ unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap,
+ double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn
+ );
+
+static void newviewGTRGAMMAPROT(int tipCase,
+ double *x1, double *x2, double *x3, double *extEV, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement);
+static void newviewGTRCATPROT(int tipCase, double *extEV,
+ int *cptr,
+ double *x1, double *x2, double *x3, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement );
+
+static void newviewGTRCATPROT_SAVE(int tipCase, double *extEV,
+ int *cptr,
+ double *x1, double *x2, double *x3, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement,
+ unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap,
+ double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn, const int maxCats);
+
+#endif
+
+#ifdef _OPTIMIZED_FUNCTIONS
+
+static void newviewGTRCAT_BINARY( int tipCase, double *EV, int *cptr,
+ double *x1_start, double *x2_start, double *x3_start, double *tipVector,
+ int *ex3, unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling);
+
+static void newviewGTRGAMMA_BINARY(int tipCase,
+ double *x1_start, double *x2_start, double *x3_start,
+ double *EV, double *tipVector,
+ int *ex3, unsigned char *tipX1, unsigned char *tipX2,
+ const int n, double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling
+ );
+
+#endif
+
+boolean isGap(unsigned int *x, int pos)
+{
+ return (x[pos / 32] & mask32[pos % 32]);
+}
+
+boolean noGap(unsigned int *x, int pos)
+{
+ return (!(x[pos / 32] & mask32[pos % 32]));
+}
+
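isGap and noGap test a single bit in a packed gap vector with 32 sites per word. The sketch below shows the same test together with the word-wise AND that combines two child gap vectors into the parent's. It assumes that mask32[i] equals 1u << i, which is not visible in this file, and all names are hypothetical.

```c
#include <assert.h>

/* Return 1 if the bit for site pos is set in the packed vector x.
   Assumes mask32[i] == (1u << i) in the original code. */
static int sketch_is_gap(const unsigned int *x, int pos)
{
  return (x[pos / 32] & (1u << (pos % 32))) != 0;
}

/* A site is all-gap at the parent only if it is all-gap in both children:
   out[j] = a[j] & b[j]. Returns the number of bits set in out. */
static int sketch_merge_gaps(const unsigned int *a, const unsigned int *b,
                             unsigned int *out, int words)
{
  int j, setBits = 0;

  for(j = 0; j < words; j++)
    {
      unsigned int w;

      out[j] = a[j] & b[j];

      /* clear the lowest set bit until the word is empty */
      for(w = out[j]; w != 0; w &= w - 1)
        setBits++;
    }

  return setBits;
}
```

The returned count plays the role of setBits in the memory-saving branch, where (width - setBits) sites actually need storage.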
+/* now this is the function that just iterates over the length of the traversal descriptor and
+ just computes the conditional likelihood arrays in the order given by the descriptor.
+ So in a sense, this function has no clue that there is any tree-like structure
+ in the traversal descriptor, it just operates on an array of structs of given length */
+
+
+extern const char inverseMeaningDNA[16];
+
+void newviewIterative (tree *tr, int startIndex)
+{
+ traversalInfo
+ *ti = tr->td[0].ti;
+
+ int
+ i;
+
+ /* loop over traversal descriptor length. Note that on average we only re-compute the conditionals on 3-4
+ nodes in RAxML */
+
+ for(i = startIndex; i < tr->td[0].count; i++)
+ {
+ traversalInfo
+ *tInfo = &ti[i];
+
+ int
+ model;
+
+#ifdef _USE_OMP
+#pragma omp parallel for
+#endif
+ for(model = 0; model < tr->NumberOfModels; model++)
+ {
+ /* check if this partition has to be processed now - otherwise no need to compute P matrix */
+ if(!tr->td[0].executeModel[model] || tr->partitionData[model].width == 0)
+ continue;
+
+ int
+ categories,
+ states = tr->partitionData[model].states;
+
+ double
+ qz,
+ rz,
+ *rateCategories,
+ *left = tr->partitionData[model].left,
+ *right = tr->partitionData[model].right;
+
+ /* figure out what kind of rate heterogeneity approach we are using */
+ if(tr->rateHetModel == CAT)
+ {
+ rateCategories = tr->partitionData[model].perSiteRates;
+ categories = tr->partitionData[model].numberOfCategories;
+ }
+ else
+ {
+ rateCategories = tr->partitionData[model].gammaRates;
+ categories = 4;
+ }
+
+ /* if we use per-partition branch length optimization,
+ get the branch length of partition model; otherwise
+ use the joint branch length among all partitions that is always stored
+ at index [0]. The log is taken below in both cases. */
+ if(tr->numBranches > 1)
+ {
+ qz = tInfo->qz[model];
+ rz = tInfo->rz[model];
+ }
+ else
+ {
+ qz = tInfo->qz[0];
+ rz = tInfo->rz[0];
+ }
+
+ qz = (qz > zmin) ? log(qz) : log(zmin);
+ rz = (rz > zmin) ? log(rz) : log(zmin);
+
+ /* compute the left and right P matrices */
+#ifdef __MIC_NATIVE
+ switch (tr->partitionData[model].states)
+ {
+ case 2: /* BINARY data */
+ assert(0 && "Binary data model is not implemented on Intel MIC");
+ break;
+ case 4: /* DNA data */
+ {
+ makeP_DNA_MIC(qz, rz, rateCategories, tr->partitionData[model].EI,
+ tr->partitionData[model].EIGN, categories,
+ left, right, tr->saveMemory, tr->maxCategories);
+
+ precomputeTips_DNA_MIC(tInfo->tipCase, tr->partitionData[model].tipVector,
+ left, right,
+ tr->partitionData[model].mic_umpLeft, tr->partitionData[model].mic_umpRight,
+ categories);
+ }
+ break;
+ case 20: /* AA data */
+ {
+ if(tr->partitionData[model].protModels == LG4M || tr->partitionData[model].protModels == LG4X)
+ {
+ makeP_PROT_LG4_MIC(qz, rz, tr->partitionData[model].gammaRates,
+ tr->partitionData[model].EI_LG4, tr->partitionData[model].EIGN_LG4,
+ 4, left, right);
+
+ precomputeTips_PROT_LG4_MIC(tInfo->tipCase, tr->partitionData[model].tipVector_LG4,
+ left, right,
+ tr->partitionData[model].mic_umpLeft, tr->partitionData[model].mic_umpRight,
+ categories);
+ }
+ else
+ {
+ makeP_PROT_MIC(qz, rz, rateCategories, tr->partitionData[model].EI,
+ tr->partitionData[model].EIGN, categories,
+ left, right, tr->saveMemory, tr->maxCategories);
+
+ precomputeTips_PROT_MIC(tInfo->tipCase, tr->partitionData[model].tipVector,
+ left, right,
+ tr->partitionData[model].mic_umpLeft, tr->partitionData[model].mic_umpRight,
+ categories);
+ }
+ }
+ break;
+ default:
+ assert(0);
+ }
+#else
+ if(tr->partitionData[model].protModels == LG4M || tr->partitionData[model].protModels == LG4X)
+ makeP_FlexLG4(qz, rz, tr->partitionData[model].gammaRates,
+ tr->partitionData[model].EI_LG4,
+ tr->partitionData[model].EIGN_LG4,
+ 4, left, right, 20);
+ else
+ makeP(qz, rz, rateCategories, tr->partitionData[model].EI,
+ tr->partitionData[model].EIGN, categories,
+ left, right, tr->saveMemory, tr->maxCategories, states);
+#endif
+ } // for model
+
+
+ /* now loop over all partitions for nodes p, q, and r of the current traversal vector entry */
+#ifdef _USE_OMP
+#pragma omp parallel
+#endif
+ {
+ int
+ m,
+ model,
+ maxModel;
+
+#ifdef _USE_OMP
+ maxModel = tr->maxModelsPerThread;
+#else
+ maxModel = tr->NumberOfModels;
+#endif
+
+ for(m = 0; m < maxModel; m++)
+ {
+ size_t
+ width = 0,
+ offset = 0;
+
+ double
+ *left = (double*)NULL,
+ *right = (double*)NULL;
+
+ unsigned int
+ *globalScaler = (unsigned int*)NULL;
+
+#ifdef _USE_OMP
+ int
+ tid = omp_get_thread_num();
+
+ /* check if this thread should process this partition */
+ Assign*
+ pAss = tr->threadPartAssigns[tid * tr->maxModelsPerThread + m];
+
+ if(pAss)
+ {
+ assert(tid == pAss->procId);
+
+ model = pAss->partitionId;
+ width = pAss->width;
+ offset = pAss->offset;
+
+ left = tr->partitionData[model].left;
+ right = tr->partitionData[model].right;
+ globalScaler = tr->partitionData[model].threadGlobalScaler[tid];
+ }
+ else
+ break;
+#else
+ model = m;
+
+ /* number of sites in this partition */
+ width = (size_t)tr->partitionData[model].width;
+ offset = 0;
+
+ /* set the pointers to the left and right P matrices to the pre-allocated memory space for storing them */
+
+ left = tr->partitionData[model].left;
+ right = tr->partitionData[model].right;
+ globalScaler = tr->partitionData[model].globalScaler;
+#endif
+
+ /* this conditional statement is exactly identical to what we do in evaluateIterative */
+ if(tr->td[0].executeModel[model] && width > 0)
+ {
+ double
+ *x1_start = (double*)NULL,
+ *x2_start = (double*)NULL,
+ *x3_start = (double*)NULL, //tr->partitionData[model].xVector[tInfo->pNumber - tr->mxtips - 1],
+ *x1_gapColumn = (double*)NULL,
+ *x2_gapColumn = (double*)NULL,
+ *x3_gapColumn = (double*)NULL;
+
+ int
+ scalerIncrement = 0,
+
+ /* integer weight vector with pattern compression weights */
+ *wgt = tr->partitionData[model].wgt + offset,
+
+ /* integer rate category vector (for each pattern, _number_ of PSR category assigned to it, NOT actual rate!) */
+ *rateCategory = tr->partitionData[model].rateCategory + offset;
+
+ unsigned int
+ *x1_gap = (unsigned int*)NULL,
+ *x2_gap = (unsigned int*)NULL,
+ *x3_gap = (unsigned int*)NULL;
+
+ unsigned char
+ *tipX1 = (unsigned char *)NULL,
+ *tipX2 = (unsigned char *)NULL;
+
+ size_t
+ gapOffset = 0,
+ rateHet = discreteRateCategories(tr->rateHetModel),
+
+ /* get the number of states in the data stored in partition model */
+
+ states = (size_t)tr->partitionData[model].states,
+
+ /* span for single alignment site (in doubles!) */
+ span = rateHet * states,
+ x_offset = offset * (size_t)span,
+
+
+ /* get the length of the current likelihood array stored at node p. This is
+ important mainly for the SEV-based memory saving option described in here:
+
+ F. Izquierdo-Carrasco, S.A. Smith, A. Stamatakis: "Algorithms, Data Structures, and Numerics for Likelihood-based Phylogenetic Inference of Huge Trees".
+
+ So tr->partitionData[model].xSpaceVector[i] provides the length of the allocated conditional array of partition model
+ and node i
+ */
+
+ availableLength = tr->partitionData[model].xSpaceVector[(tInfo->pNumber - tr->mxtips - 1)],
+ requiredLength = 0;
+
+ x3_start = tr->partitionData[model].xVector[tInfo->pNumber - tr->mxtips - 1] + x_offset;
+
+ /* memory saving stuff, not important right now, but if you are interested ask Fernando */
+ if(tr->saveMemory)
+ {
+ size_t
+ j,
+ setBits = 0;
+
+ gapOffset = states * (size_t)getUndetermined(tr->partitionData[model].dataType);
+
+ x1_gap = &(tr->partitionData[model].gapVector[tInfo->qNumber * tr->partitionData[model].gapVectorLength]);
+ x2_gap = &(tr->partitionData[model].gapVector[tInfo->rNumber * tr->partitionData[model].gapVectorLength]);
+ x3_gap = &(tr->partitionData[model].gapVector[tInfo->pNumber * tr->partitionData[model].gapVectorLength]);
+
+ for(j = 0; j < (size_t)tr->partitionData[model].gapVectorLength; j++)
+ {
+ x3_gap[j] = x1_gap[j] & x2_gap[j];
+ setBits += (size_t)(precomputed16_bitcount(x3_gap[j], tr->bits_in_16bits));
+ }
+
+ requiredLength = (width - setBits) * rateHet * states * sizeof(double);
+ }
+ else
+ /* if we are not trying to save memory the space required to store an inner likelihood array
+ is the number of sites in the partition times the number of states of the data type in the partition
+ times the number of discrete GAMMA rates (1 for CAT essentially) times 8 bytes */
+ requiredLength = width * rateHet * states * sizeof(double);
+
+ /* Initially, even when not using memory saving, no space is allocated for inner likelihood arrays, hence
+ availableLength will be zero the very first time we traverse the tree.
+ Hence we need to allocate something here */
+#ifndef _USE_OMP
+ if(requiredLength != availableLength)
+ {
+ /* if there is a vector of incorrect length assigned here i.e., x3 != NULL we must free
+ it first */
+ if(x3_start)
+ free(x3_start);
+
+ /* allocate memory: note that here we use a byte-boundary aligned malloc, because we need the vectors
+ to be aligned at 16 BYTE (SSE3) or 32 BYTE (AVX) boundaries! */
+
+ x3_start = (double*)malloc_aligned(requiredLength);
+
+ /* update the data structures for consistent bookkeeping */
+ tr->partitionData[model].xVector[tInfo->pNumber - tr->mxtips - 1] = x3_start;
+ tr->partitionData[model].xSpaceVector[(tInfo->pNumber - tr->mxtips - 1)] = requiredLength;
+ }
+#endif
+
+ /* now just set the pointers for data accesses in the newview() implementations above to the corresponding values
+ according to the tip case */
+
+ switch(tInfo->tipCase)
+ {
+ case TIP_TIP:
+ tipX1 = tr->partitionData[model].yVector[tInfo->qNumber] + offset;
+ tipX2 = tr->partitionData[model].yVector[tInfo->rNumber] + offset;
+
+ if(tr->saveMemory)
+ {
+ x1_gapColumn = &(tr->partitionData[model].tipVector[gapOffset]);
+ x2_gapColumn = &(tr->partitionData[model].tipVector[gapOffset]);
+ x3_gapColumn = &tr->partitionData[model].gapColumn[(tInfo->pNumber - tr->mxtips - 1) * states * rateHet];
+ }
+
+ break;
+ case TIP_INNER:
+ tipX1 = tr->partitionData[model].yVector[tInfo->qNumber] + offset;
+ x2_start = tr->partitionData[model].xVector[tInfo->rNumber - tr->mxtips - 1] + x_offset;
+
+ if(tr->saveMemory)
+ {
+ x1_gapColumn = &(tr->partitionData[model].tipVector[gapOffset]);
+ x2_gapColumn = &tr->partitionData[model].gapColumn[(tInfo->rNumber - tr->mxtips - 1) * states * rateHet];
+ x3_gapColumn = &tr->partitionData[model].gapColumn[(tInfo->pNumber - tr->mxtips - 1) * states * rateHet];
+ }
+
+ break;
+ case INNER_INNER:
+ x1_start = tr->partitionData[model].xVector[tInfo->qNumber - tr->mxtips - 1] + x_offset;
+ x2_start = tr->partitionData[model].xVector[tInfo->rNumber - tr->mxtips - 1] + x_offset;
+
+ if(tr->saveMemory)
+ {
+ x1_gapColumn = &tr->partitionData[model].gapColumn[(tInfo->qNumber - tr->mxtips - 1) * states * rateHet];
+ x2_gapColumn = &tr->partitionData[model].gapColumn[(tInfo->rNumber - tr->mxtips - 1) * states * rateHet];
+ x3_gapColumn = &tr->partitionData[model].gapColumn[(tInfo->pNumber - tr->mxtips - 1) * states * rateHet];
+ }
+
+ break;
+ default:
+ assert(0);
+ }
+
+#ifndef _OPTIMIZED_FUNCTIONS
+
+ /* memory saving not implemented */
+
+ assert(!tr->saveMemory);
+
+ /* figure out if we need to compute the CAT or GAMMA model of rate heterogeneity */
+
+ if(tr->rateHetModel == CAT)
+ newviewCAT_FLEX(tInfo->tipCase, tr->partitionData[model].EV, rateCategory,
+ x1_start, x2_start, x3_start, tr->partitionData[model].tipVector,
+ tipX1, tipX2,
+ width, left, right, wgt, &scalerIncrement, states);
+ else
+ newviewGAMMA_FLEX(tInfo->tipCase,
+ x1_start, x2_start, x3_start, tr->partitionData[model].EV, tr->partitionData[model].tipVector,
+ tipX1, tipX2,
+ width, left, right, wgt, &scalerIncrement, states, getUndetermined(tr->partitionData[model].dataType) + 1);
+
+#else
+ /* dedicated highly optimized functions. Analogously to the functions in evaluateGeneric()
+ we also switch over the state number */
+
+ switch(states)
+ {
+ case 2:
+#ifdef __MIC_NATIVE
+ assert(0 && "Binary data model is not implemented on Intel MIC");
+#else
+ assert(!tr->saveMemory);
+ if(tr->rateHetModel == CAT)
+ newviewGTRCAT_BINARY(tInfo->tipCase, tr->partitionData[model].EV, rateCategory,
+ x1_start, x2_start, x3_start, tr->partitionData[model].tipVector,
+ (int*)NULL, tipX1, tipX2,
+ width, left, right, wgt, &scalerIncrement, TRUE);
+ else
+ newviewGTRGAMMA_BINARY(tInfo->tipCase,
+ x1_start, x2_start, x3_start,
+ tr->partitionData[model].EV, tr->partitionData[model].tipVector,
+ (int *)NULL, tipX1, tipX2,
+ width, left, right, wgt, &scalerIncrement, TRUE);
+#endif
+ break;
+ case 4: /* DNA */
+ if(tr->rateHetModel == CAT)
+ {
+ if(tr->saveMemory)
+#ifdef __MIC_NATIVE
+ assert(0 && "Neither CAT model of rate heterogeneity nor memory saving are implemented on Intel MIC");
+#elif __AVX
+ newviewGTRCAT_AVX_GAPPED_SAVE(tInfo->tipCase, tr->partitionData[model].EV, rateCategory,
+ x1_start, x2_start, x3_start, tr->partitionData[model].tipVector,
+ (int*)NULL, tipX1, tipX2,
+ width, left, right, wgt, &scalerIncrement, TRUE, x1_gap, x2_gap, x3_gap,
+ x1_gapColumn, x2_gapColumn, x3_gapColumn, tr->maxCategories);
+#else
+ newviewGTRCAT_SAVE(tInfo->tipCase, tr->partitionData[model].EV, rateCategory,
+ x1_start, x2_start, x3_start, tr->partitionData[model].tipVector,
+ tipX1, tipX2,
+ width, left, right, wgt, &scalerIncrement, x1_gap, x2_gap, x3_gap,
+ x1_gapColumn, x2_gapColumn, x3_gapColumn, tr->maxCategories);
+#endif
+ else
+#ifdef __MIC_NATIVE
+ assert(0 && "CAT model of rate heterogeneity is not implemented on Intel MIC");
+#elif __AVX
+ newviewGTRCAT_AVX(tInfo->tipCase, tr->partitionData[model].EV, rateCategory,
+ x1_start, x2_start, x3_start, tr->partitionData[model].tipVector,
+ tipX1, tipX2,
+ width, left, right, wgt, &scalerIncrement);
+#else
+ newviewGTRCAT(tInfo->tipCase, tr->partitionData[model].EV, rateCategory,
+ x1_start, x2_start, x3_start, tr->partitionData[model].tipVector,
+ tipX1, tipX2,
+ width, left, right, wgt, &scalerIncrement);
+#endif
+ }
+ else
+ {
+
+
+ if(tr->saveMemory)
+#ifdef __MIC_NATIVE
+ assert(0 && "Memory saving is not implemented on Intel MIC");
+#elif __AVX
+ newviewGTRGAMMA_AVX_GAPPED_SAVE(tInfo->tipCase,
+ x1_start, x2_start, x3_start, tr->partitionData[model].EV, tr->partitionData[model].tipVector, (int*)NULL,
+ tipX1, tipX2,
+ width, left, right, wgt, &scalerIncrement, TRUE,
+ x1_gap, x2_gap, x3_gap,
+ x1_gapColumn, x2_gapColumn, x3_gapColumn);
+#else
+ newviewGTRGAMMA_GAPPED_SAVE(tInfo->tipCase,
+ x1_start, x2_start, x3_start, tr->partitionData[model].EV, tr->partitionData[model].tipVector,
+ tipX1, tipX2,
+ width, left, right, wgt, &scalerIncrement,
+ x1_gap, x2_gap, x3_gap,
+ x1_gapColumn, x2_gapColumn, x3_gapColumn);
+#endif
+ else
+#ifdef __MIC_NATIVE
+ newviewGTRGAMMA_MIC(tInfo->tipCase,
+ x1_start, x2_start, x3_start, tr->partitionData[model].mic_EV, tr->partitionData[model].tipVector,
+ tipX1, tipX2,
+ width, left, right, wgt, &scalerIncrement,
+ tr->partitionData[model].mic_umpLeft, tr->partitionData[model].mic_umpRight);
+#elif __AVX
+ newviewGTRGAMMA_AVX(tInfo->tipCase,
+ x1_start, x2_start, x3_start, tr->partitionData[model].EV, tr->partitionData[model].tipVector,
+ tipX1, tipX2,
+ width, left, right, wgt, &scalerIncrement);
+#else
+ newviewGTRGAMMA(tInfo->tipCase,
+ x1_start, x2_start, x3_start, tr->partitionData[model].EV, tr->partitionData[model].tipVector,
+ tipX1, tipX2,
+ width, left, right, wgt, &scalerIncrement);
+#endif
+ }
+
+ break;
+ case 20: /* proteins */
+
+ if(tr->rateHetModel == CAT)
+ {
+ if(tr->saveMemory)
+ {
+#ifdef __MIC_NATIVE
+ assert(0 && "Neither CAT model of rate heterogeneity nor memory saving are implemented on Intel MIC");
+#elif __AVX
+ newviewGTRCATPROT_AVX_GAPPED_SAVE(tInfo->tipCase, tr->partitionData[model].EV, rateCategory,
+ x1_start, x2_start, x3_start, tr->partitionData[model].tipVector, (int*)NULL,
+ tipX1, tipX2, width, left, right, wgt, &scalerIncrement, TRUE, x1_gap, x2_gap, x3_gap,
+ x1_gapColumn, x2_gapColumn, x3_gapColumn, tr->maxCategories);
+#else
+ newviewGTRCATPROT_SAVE(tInfo->tipCase, tr->partitionData[model].EV, rateCategory,
+ x1_start, x2_start, x3_start, tr->partitionData[model].tipVector,
+ tipX1, tipX2, width, left, right, wgt, &scalerIncrement, x1_gap, x2_gap, x3_gap,
+ x1_gapColumn, x2_gapColumn, x3_gapColumn, tr->maxCategories);
+#endif
+ }
+ else
+ {
+#ifdef __MIC_NATIVE
+ assert(0 && "CAT model of rate heterogeneity is not implemented on Intel MIC");
+#elif __AVX
+ newviewGTRCATPROT_AVX(tInfo->tipCase, tr->partitionData[model].EV, rateCategory,
+ x1_start, x2_start, x3_start, tr->partitionData[model].tipVector,
+ tipX1, tipX2, width, left, right, wgt, &scalerIncrement);
+#else
+ newviewGTRCATPROT(tInfo->tipCase, tr->partitionData[model].EV, rateCategory,
+ x1_start, x2_start, x3_start, tr->partitionData[model].tipVector,
+ tipX1, tipX2, width, left, right, wgt, &scalerIncrement);
+#endif
+ }
+ }
+ else
+ {
+ if(tr->saveMemory)
+ {
+#ifdef __MIC_NATIVE
+ assert(0 && "Memory saving is not implemented on Intel MIC");
+#elif __AVX
+ newviewGTRGAMMAPROT_AVX_GAPPED_SAVE(tInfo->tipCase,
+ x1_start, x2_start, x3_start,
+ tr->partitionData[model].EV,
+ tr->partitionData[model].tipVector, (int*)NULL,
+ tipX1, tipX2,
+ width, left, right, wgt, &scalerIncrement, TRUE,
+ x1_gap, x2_gap, x3_gap,
+ x1_gapColumn, x2_gapColumn, x3_gapColumn);
+#else
+ newviewGTRGAMMAPROT_GAPPED_SAVE(tInfo->tipCase,
+ x1_start, x2_start, x3_start,
+ tr->partitionData[model].EV,
+ tr->partitionData[model].tipVector,
+ tipX1, tipX2,
+ width, left, right, wgt, &scalerIncrement,
+ x1_gap, x2_gap, x3_gap,
+ x1_gapColumn, x2_gapColumn, x3_gapColumn);
+#endif
+ }
+ else
+ {
+ if(tr->partitionData[model].protModels == LG4M || tr->partitionData[model].protModels == LG4X)
+ {
+#ifdef __MIC_NATIVE
+ newviewGTRGAMMAPROT_LG4_MIC(tInfo->tipCase,
+ x1_start, x2_start, x3_start, tr->partitionData[model].mic_EV, tr->partitionData[model].mic_tipVector,
+ tipX1, tipX2,
+ width, left, right, wgt, &scalerIncrement,
+ tr->partitionData[model].mic_umpLeft, tr->partitionData[model].mic_umpRight);
+#elif __AVX
+ newviewGTRGAMMAPROT_AVX_LG4(tInfo->tipCase,
+ x1_start, x2_start, x3_start,
+ tr->partitionData[model].EV_LG4,
+ tr->partitionData[model].tipVector_LG4,
+ (int*)NULL, tipX1, tipX2,
+ width, left, right, wgt, &scalerIncrement, TRUE);
+#else
+ newviewGTRGAMMAPROT_LG4(tInfo->tipCase,
+ x1_start, x2_start, x3_start,
+ tr->partitionData[model].EV_LG4,
+ tr->partitionData[model].tipVector_LG4,
+ (int*)NULL, tipX1, tipX2,
+ width, left, right,
+ wgt, &scalerIncrement, TRUE);
+#endif
+ }
+ else
+ {
+#ifdef __MIC_NATIVE
+ newviewGTRGAMMAPROT_MIC(tInfo->tipCase,
+ x1_start, x2_start, x3_start, tr->partitionData[model].mic_EV, tr->partitionData[model].mic_tipVector,
+ tipX1, tipX2,
+ width, left, right, wgt, &scalerIncrement,
+ tr->partitionData[model].mic_umpLeft, tr->partitionData[model].mic_umpRight);
+#elif __AVX
+ newviewGTRGAMMAPROT_AVX(tInfo->tipCase,
+ x1_start, x2_start, x3_start, tr->partitionData[model].EV, tr->partitionData[model].tipVector,
+ tipX1, tipX2,
+ width, left, right, wgt, &scalerIncrement);
+#else
+ newviewGTRGAMMAPROT(tInfo->tipCase,
+ x1_start, x2_start, x3_start, tr->partitionData[model].EV, tr->partitionData[model].tipVector,
+ tipX1, tipX2,
+ width, left, right, wgt, &scalerIncrement);
+#endif
+ }
+ }
+ }
+ break;
+ default:
+ assert(0);
+ }
+#endif
+
+ /* important step: here we essentially recursively compute the number of scaling multiplications
+ at node p: it's the sum of the number of scaling multiplications already conducted
+ for computing nodes q and r plus the scaling multiplications done at node p */
+
+ globalScaler[tInfo->pNumber] =
+ globalScaler[tInfo->qNumber] +
+ globalScaler[tInfo->rNumber] +
+ (unsigned int)scalerIncrement;
+
+ /* check that we are not getting an integer overflow ! */
+
+ assert(globalScaler[tInfo->pNumber] < INT_MAX);
+ }
+ } // for model
+ } // omp parallel block
+ } // for traversal
+}
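Reader's note on the scaler bookkeeping that closes newviewIterative above: the following is a minimal sketch, not ExaML code. The helper name combine_scalers is hypothetical and only illustrates how the count stored for node p is derived from the counts of its children q and r plus the scalings performed at p itself.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper (not part of ExaML): returns the number of scaling
   multiplications accumulated at parent node p, given the counts already
   stored for its children q and r and the increment produced at p. */
static unsigned int combine_scalers(unsigned int scaler_q,
                                    unsigned int scaler_r,
                                    int increment_p)
{
  unsigned int sum = scaler_q + scaler_r + (unsigned int)increment_p;

  /* same guard as in the patch: the count must stay below INT_MAX */
  assert(sum < INT_MAX);

  return sum;
}
```

For example, children with 3 and 5 accumulated scalings and 2 new scalings at p give a count of 10 at p.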
+
+
+/* here is the generic function that could be called from the user program
+ it re-computes the vector at node p (regardless of whether its orientation is
+ correct) and then it also recursively re-computes the likelihood arrays
+ in the subtrees of p as needed and if needed */
+
+void newviewGeneric (tree *tr, nodeptr p, boolean masked)
+{
+ /* if it's a tip there is nothing to do */
+
+ if(isTip(p->number, tr->mxtips))
+ return;
+
+ /* the first entry of the traversal descriptor is always reserved for evaluate or branch length optimization calls,
+ hence we start filling the array at the second entry with index one. This is not very nice and should be fixed
+ at some point */
+
+ tr->td[0].count = 0;
+
+ /* compute the traversal descriptor */
+ computeTraversalInfo(p, &(tr->td[0].ti[0]), &(tr->td[0].count), tr->mxtips, tr->numBranches, TRUE);
+
+ /* the traversal descriptor has been recomputed -> not sure if it really always changes, something to
+ optimize in the future */
+ tr->td[0].traversalHasChanged = TRUE;
+
+ /* We do a masked newview, i.e., we do not execute newviews for each partition, for example when
+ doing a branch length optimization on the entire tree when branches are estimated on a per-partition basis.
+
+ You may imagine that for partition 5 the branch length optimization has already converged whereas
+ for partition 6 we still need to go over the tree again.
+
+ This is explained in more detail in:
+
+ A. Stamatakis, M. Ott: "Load Balance in the Phylogenetic Likelihood Kernel". Proceedings of ICPP 2009
+
+ The external boolean array tr->partitionConverged[] contains exactly that information; its negation is copied
+ to executeModel and subsequently to the executeMask of the traversal descriptor
+
+ */
+
+
+ if(masked)
+ {
+ int model;
+
+ for(model = 0; model < tr->NumberOfModels; model++)
+ {
+ if(tr->partitionConverged[model])
+ tr->executeModel[model] = FALSE;
+ else
+ tr->executeModel[model] = TRUE;
+ }
+ }
+
+ /* if there is something to re-compute */
+
+ if(tr->td[0].count > 0)
+ {
+ /* store execute mask in traversal descriptor */
+
+ storeExecuteMaskInTraversalDescriptor(tr);
+ newviewIterative(tr, 0);
+ }
+
+ /* clean up */
+
+ if(masked)
+ {
+ int model;
+
+ for(model = 0; model < tr->NumberOfModels; model++)
+ tr->executeModel[model] = TRUE;
+ }
+
+ tr->td[0].traversalHasChanged = FALSE;
+}
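Reader's note on the masking logic in newviewGeneric above: a minimal stand-alone sketch, assuming plain bool arrays in place of the tree structure. The function names set_execute_mask and reset_execute_mask are hypothetical and do not exist in ExaML; they only mirror the two loops guarded by `masked`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: a partition is executed only if its branch length
   optimization has not yet converged (mirrors the first loop above). */
static void set_execute_mask(const bool *partition_converged,
                             bool *execute_model, int number_of_models)
{
  assert(number_of_models >= 0);

  for (int model = 0; model < number_of_models; model++)
    execute_model[model] = !partition_converged[model];
}

/* Hypothetical helper: the clean-up step re-enables every partition. */
static void reset_execute_mask(bool *execute_model, int number_of_models)
{
  assert(number_of_models >= 0);

  for (int model = 0; model < number_of_models; model++)
    execute_model[model] = true;
}
```

Restoring the mask after the traversal matters because later unmasked calls expect all partitions to be enabled.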
+
+
+/* optimized function implementations */
+
+#if (defined(_OPTIMIZED_FUNCTIONS) && !defined(__AVX))
+
+static void newviewGTRGAMMA_GAPPED_SAVE(int tipCase,
+ double *x1_start, double *x2_start, double *x3_start,
+ double *EV, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ const int n, double *left, double *right, int *wgt, int *scalerIncrement,
+ unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap,
+ double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn)
+{
+ int
+ i,
+ j,
+ k,
+ l,
+ addScale = 0,
+ scaleGap = 0;
+
+ double
+ *x1,
+ *x2,
+ *x3,
+ *x1_ptr = x1_start,
+ *x2_ptr = x2_start,
+ max,
+ maxima[2] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+ EV_t[16] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+ __m128d
+ values[8],
+ EVV[8];
+
+ for(k = 0; k < 4; k++)
+ for (l=0; l < 4; l++)
+ EV_t[4 * l + k] = EV[4 * k + l];
+
+ for(k = 0; k < 8; k++)
+ EVV[k] = _mm_load_pd(&EV_t[k * 2]);
+
+
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ {
+ double *uX1, umpX1[256] __attribute__ ((aligned (BYTE_ALIGNMENT))), *uX2, umpX2[256] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+
+ for (i = 1; i < 16; i++)
+ {
+ __m128d x1_1 = _mm_load_pd(&(tipVector[i*4]));
+ __m128d x1_2 = _mm_load_pd(&(tipVector[i*4 + 2]));
+
+ for (j = 0; j < 4; j++)
+ for (k = 0; k < 4; k++)
+ {
+ __m128d left1 = _mm_load_pd(&left[j*16 + k*4]);
+ __m128d left2 = _mm_load_pd(&left[j*16 + k*4 + 2]);
+
+ __m128d acc = _mm_setzero_pd();
+
+ acc = _mm_add_pd(acc, _mm_mul_pd(left1, x1_1));
+ acc = _mm_add_pd(acc, _mm_mul_pd(left2, x1_2));
+
+ acc = _mm_hadd_pd(acc, acc);
+ _mm_storel_pd(&umpX1[i*16 + j*4 + k], acc);
+ }
+
+ for (j = 0; j < 4; j++)
+ for (k = 0; k < 4; k++)
+ {
+ __m128d left1 = _mm_load_pd(&right[j*16 + k*4]);
+ __m128d left2 = _mm_load_pd(&right[j*16 + k*4 + 2]);
+
+ __m128d acc = _mm_setzero_pd();
+
+ acc = _mm_add_pd(acc, _mm_mul_pd(left1, x1_1));
+ acc = _mm_add_pd(acc, _mm_mul_pd(left2, x1_2));
+
+ acc = _mm_hadd_pd(acc, acc);
+ _mm_storel_pd(&umpX2[i*16 + j*4 + k], acc);
+
+ }
+ }
+
+ uX1 = &umpX1[240];
+ uX2 = &umpX2[240];
+
+ for (j = 0; j < 4; j++)
+ {
+ __m128d uX1_k0_sse = _mm_load_pd( &uX1[j * 4] );
+ __m128d uX1_k2_sse = _mm_load_pd( &uX1[j * 4 + 2] );
+
+ __m128d uX2_k0_sse = _mm_load_pd( &uX2[j * 4] );
+ __m128d uX2_k2_sse = _mm_load_pd( &uX2[j * 4 + 2] );
+
+ __m128d x1px2_k0 = _mm_mul_pd( uX1_k0_sse, uX2_k0_sse );
+ __m128d x1px2_k2 = _mm_mul_pd( uX1_k2_sse, uX2_k2_sse );
+
+ __m128d EV_t_l0_k0 = EVV[0];
+ __m128d EV_t_l0_k2 = EVV[1];
+ __m128d EV_t_l1_k0 = EVV[2];
+ __m128d EV_t_l1_k2 = EVV[3];
+ __m128d EV_t_l2_k0 = EVV[4];
+ __m128d EV_t_l2_k2 = EVV[5];
+ __m128d EV_t_l3_k0 = EVV[6];
+ __m128d EV_t_l3_k2 = EVV[7];
+
+ EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+ EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+
+ EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+ EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+
+ EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+
+ EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+ EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+
+ EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+ EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+ EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );
+
+ _mm_store_pd( &x3_gapColumn[j * 4 + 0], EV_t_l0_k0 );
+ _mm_store_pd( &x3_gapColumn[j * 4 + 2], EV_t_l2_k0 );
+ }
+
+
+ x3 = x3_start;
+
+ for (i = 0; i < n; i++)
+ {
+ if(!(x3_gap[i / 32] & mask32[i % 32]))
+ {
+ uX1 = &umpX1[16 * tipX1[i]];
+ uX2 = &umpX2[16 * tipX2[i]];
+
+ for (j = 0; j < 4; j++)
+ {
+ __m128d uX1_k0_sse = _mm_load_pd( &uX1[j * 4] );
+ __m128d uX1_k2_sse = _mm_load_pd( &uX1[j * 4 + 2] );
+
+
+ __m128d uX2_k0_sse = _mm_load_pd( &uX2[j * 4] );
+ __m128d uX2_k2_sse = _mm_load_pd( &uX2[j * 4 + 2] );
+
+
+ //
+ // multiply left * right
+ //
+
+ __m128d x1px2_k0 = _mm_mul_pd( uX1_k0_sse, uX2_k0_sse );
+ __m128d x1px2_k2 = _mm_mul_pd( uX1_k2_sse, uX2_k2_sse );
+
+
+ //
+ // multiply with EV matrix (!?)
+ //
+
+ __m128d EV_t_l0_k0 = EVV[0];
+ __m128d EV_t_l0_k2 = EVV[1];
+ __m128d EV_t_l1_k0 = EVV[2];
+ __m128d EV_t_l1_k2 = EVV[3];
+ __m128d EV_t_l2_k0 = EVV[4];
+ __m128d EV_t_l2_k2 = EVV[5];
+ __m128d EV_t_l3_k0 = EVV[6];
+ __m128d EV_t_l3_k2 = EVV[7];
+
+ EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+ EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+
+ EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+ EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+
+ EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+
+ EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+ EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+
+ EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+ EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+ EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );
+
+ _mm_store_pd( &x3[j * 4 + 0], EV_t_l0_k0 );
+ _mm_store_pd( &x3[j * 4 + 2], EV_t_l2_k0 );
+ }
+
+ x3 += 16;
+ }
+ }
+ }
+ break;
+ case TIP_INNER:
+ {
+ double
+ *uX1,
+ umpX1[256] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+ for (i = 1; i < 16; i++)
+ {
+ __m128d x1_1 = _mm_load_pd(&(tipVector[i*4]));
+ __m128d x1_2 = _mm_load_pd(&(tipVector[i*4 + 2]));
+
+ for (j = 0; j < 4; j++)
+ for (k = 0; k < 4; k++)
+ {
+ __m128d left1 = _mm_load_pd(&left[j*16 + k*4]);
+ __m128d left2 = _mm_load_pd(&left[j*16 + k*4 + 2]);
+
+ __m128d acc = _mm_setzero_pd();
+
+ acc = _mm_add_pd(acc, _mm_mul_pd(left1, x1_1));
+ acc = _mm_add_pd(acc, _mm_mul_pd(left2, x1_2));
+
+ acc = _mm_hadd_pd(acc, acc);
+ _mm_storel_pd(&umpX1[i*16 + j*4 + k], acc);
+ }
+ }
+
+ {
+ __m128d maxv =_mm_setzero_pd();
+
+ scaleGap = 0;
+
+ x2 = x2_gapColumn;
+ x3 = x3_gapColumn;
+
+ uX1 = &umpX1[240];
+
+ for (j = 0; j < 4; j++)
+ {
+ double *x2_p = &x2[j*4];
+ double *right_k0_p = &right[j*16];
+ double *right_k1_p = &right[j*16 + 1*4];
+ double *right_k2_p = &right[j*16 + 2*4];
+ double *right_k3_p = &right[j*16 + 3*4];
+ __m128d x2_0 = _mm_load_pd( &x2_p[0] );
+ __m128d x2_2 = _mm_load_pd( &x2_p[2] );
+
+ __m128d right_k0_0 = _mm_load_pd( &right_k0_p[0] );
+ __m128d right_k0_2 = _mm_load_pd( &right_k0_p[2] );
+ __m128d right_k1_0 = _mm_load_pd( &right_k1_p[0] );
+ __m128d right_k1_2 = _mm_load_pd( &right_k1_p[2] );
+ __m128d right_k2_0 = _mm_load_pd( &right_k2_p[0] );
+ __m128d right_k2_2 = _mm_load_pd( &right_k2_p[2] );
+ __m128d right_k3_0 = _mm_load_pd( &right_k3_p[0] );
+ __m128d right_k3_2 = _mm_load_pd( &right_k3_p[2] );
+
+ right_k0_0 = _mm_mul_pd( x2_0, right_k0_0);
+ right_k0_2 = _mm_mul_pd( x2_2, right_k0_2);
+
+ right_k1_0 = _mm_mul_pd( x2_0, right_k1_0);
+ right_k1_2 = _mm_mul_pd( x2_2, right_k1_2);
+
+ right_k0_0 = _mm_hadd_pd( right_k0_0, right_k0_2);
+ right_k1_0 = _mm_hadd_pd( right_k1_0, right_k1_2);
+ right_k0_0 = _mm_hadd_pd( right_k0_0, right_k1_0);
+
+ right_k2_0 = _mm_mul_pd( x2_0, right_k2_0);
+ right_k2_2 = _mm_mul_pd( x2_2, right_k2_2);
+
+ right_k3_0 = _mm_mul_pd( x2_0, right_k3_0);
+ right_k3_2 = _mm_mul_pd( x2_2, right_k3_2);
+
+ right_k2_0 = _mm_hadd_pd( right_k2_0, right_k2_2);
+ right_k3_0 = _mm_hadd_pd( right_k3_0, right_k3_2);
+ right_k2_0 = _mm_hadd_pd( right_k2_0, right_k3_0);
+
+ __m128d uX1_k0_sse = _mm_load_pd( &uX1[j * 4] );
+ __m128d uX1_k2_sse = _mm_load_pd( &uX1[j * 4 + 2] );
+
+ __m128d x1px2_k0 = _mm_mul_pd( uX1_k0_sse, right_k0_0 );
+ __m128d x1px2_k2 = _mm_mul_pd( uX1_k2_sse, right_k2_0 );
+
+ __m128d EV_t_l0_k0 = EVV[0];
+ __m128d EV_t_l0_k2 = EVV[1];
+ __m128d EV_t_l1_k0 = EVV[2];
+ __m128d EV_t_l1_k2 = EVV[3];
+ __m128d EV_t_l2_k0 = EVV[4];
+ __m128d EV_t_l2_k2 = EVV[5];
+ __m128d EV_t_l3_k0 = EVV[6];
+ __m128d EV_t_l3_k2 = EVV[7];
+
+ EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+ EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+
+ EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+ EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+
+ EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+
+ EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+ EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+
+ EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+ EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+ EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );
+
+ values[j * 2] = EV_t_l0_k0;
+ values[j * 2 + 1] = EV_t_l2_k0;
+
+ maxv = _mm_max_pd(maxv, _mm_and_pd(EV_t_l0_k0, absMask.m));
+ maxv = _mm_max_pd(maxv, _mm_and_pd(EV_t_l2_k0, absMask.m));
+ }
+
+
+ _mm_store_pd(maxima, maxv);
+
+ max = MAX(maxima[0], maxima[1]);
+
+ if(max < minlikelihood)
+ {
+ scaleGap = 1;
+
+ __m128d sv = _mm_set1_pd(twotothe256);
+
+ _mm_store_pd(&x3[0], _mm_mul_pd(values[0], sv));
+ _mm_store_pd(&x3[2], _mm_mul_pd(values[1], sv));
+ _mm_store_pd(&x3[4], _mm_mul_pd(values[2], sv));
+ _mm_store_pd(&x3[6], _mm_mul_pd(values[3], sv));
+ _mm_store_pd(&x3[8], _mm_mul_pd(values[4], sv));
+ _mm_store_pd(&x3[10], _mm_mul_pd(values[5], sv));
+ _mm_store_pd(&x3[12], _mm_mul_pd(values[6], sv));
+ _mm_store_pd(&x3[14], _mm_mul_pd(values[7], sv));
+ }
+ else
+ {
+ _mm_store_pd(&x3[0], values[0]);
+ _mm_store_pd(&x3[2], values[1]);
+ _mm_store_pd(&x3[4], values[2]);
+ _mm_store_pd(&x3[6], values[3]);
+ _mm_store_pd(&x3[8], values[4]);
+ _mm_store_pd(&x3[10], values[5]);
+ _mm_store_pd(&x3[12], values[6]);
+ _mm_store_pd(&x3[14], values[7]);
+ }
+ }
+
+ x3 = x3_start;
+
+ for (i = 0; i < n; i++)
+ {
+ if((x3_gap[i / 32] & mask32[i % 32]))
+ {
+ if(scaleGap)
+ {
+ addScale += wgt[i];
+ }
+ }
+ else
+ {
+ __m128d maxv =_mm_setzero_pd();
+
+ if(x2_gap[i / 32] & mask32[i % 32])
+ x2 = x2_gapColumn;
+ else
+ {
+ x2 = x2_ptr;
+ x2_ptr += 16;
+ }
+
+ uX1 = &umpX1[16 * tipX1[i]];
+
+
+ for (j = 0; j < 4; j++)
+ {
+ double *x2_p = &x2[j*4];
+ double *right_k0_p = &right[j*16];
+ double *right_k1_p = &right[j*16 + 1*4];
+ double *right_k2_p = &right[j*16 + 2*4];
+ double *right_k3_p = &right[j*16 + 3*4];
+ __m128d x2_0 = _mm_load_pd( &x2_p[0] );
+ __m128d x2_2 = _mm_load_pd( &x2_p[2] );
+
+ __m128d right_k0_0 = _mm_load_pd( &right_k0_p[0] );
+ __m128d right_k0_2 = _mm_load_pd( &right_k0_p[2] );
+ __m128d right_k1_0 = _mm_load_pd( &right_k1_p[0] );
+ __m128d right_k1_2 = _mm_load_pd( &right_k1_p[2] );
+ __m128d right_k2_0 = _mm_load_pd( &right_k2_p[0] );
+ __m128d right_k2_2 = _mm_load_pd( &right_k2_p[2] );
+ __m128d right_k3_0 = _mm_load_pd( &right_k3_p[0] );
+ __m128d right_k3_2 = _mm_load_pd( &right_k3_p[2] );
+
+
+ right_k0_0 = _mm_mul_pd( x2_0, right_k0_0);
+ right_k0_2 = _mm_mul_pd( x2_2, right_k0_2);
+
+ right_k1_0 = _mm_mul_pd( x2_0, right_k1_0);
+ right_k1_2 = _mm_mul_pd( x2_2, right_k1_2);
+
+ right_k0_0 = _mm_hadd_pd( right_k0_0, right_k0_2);
+ right_k1_0 = _mm_hadd_pd( right_k1_0, right_k1_2);
+ right_k0_0 = _mm_hadd_pd( right_k0_0, right_k1_0);
+
+
+ right_k2_0 = _mm_mul_pd( x2_0, right_k2_0);
+ right_k2_2 = _mm_mul_pd( x2_2, right_k2_2);
+
+
+ right_k3_0 = _mm_mul_pd( x2_0, right_k3_0);
+ right_k3_2 = _mm_mul_pd( x2_2, right_k3_2);
+
+ right_k2_0 = _mm_hadd_pd( right_k2_0, right_k2_2);
+ right_k3_0 = _mm_hadd_pd( right_k3_0, right_k3_2);
+ right_k2_0 = _mm_hadd_pd( right_k2_0, right_k3_0);
+
+ {
+ //
+ // load left side from tip vector
+ //
+
+ __m128d uX1_k0_sse = _mm_load_pd( &uX1[j * 4] );
+ __m128d uX1_k2_sse = _mm_load_pd( &uX1[j * 4 + 2] );
+
+
+ //
+ // multiply left * right
+ //
+
+ __m128d x1px2_k0 = _mm_mul_pd( uX1_k0_sse, right_k0_0 );
+ __m128d x1px2_k2 = _mm_mul_pd( uX1_k2_sse, right_k2_0 );
+
+
+ //
+ // multiply with EV matrix (!?)
+ //
+
+ __m128d EV_t_l0_k0 = EVV[0];
+ __m128d EV_t_l0_k2 = EVV[1];
+ __m128d EV_t_l1_k0 = EVV[2];
+ __m128d EV_t_l1_k2 = EVV[3];
+ __m128d EV_t_l2_k0 = EVV[4];
+ __m128d EV_t_l2_k2 = EVV[5];
+ __m128d EV_t_l3_k0 = EVV[6];
+ __m128d EV_t_l3_k2 = EVV[7];
+
+
+ EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+ EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+
+ EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+ EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+
+ EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+
+ EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+ EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+
+ EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+ EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+ EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );
+
+ values[j * 2] = EV_t_l0_k0;
+ values[j * 2 + 1] = EV_t_l2_k0;
+
+ maxv = _mm_max_pd(maxv, _mm_and_pd(EV_t_l0_k0, absMask.m));
+ maxv = _mm_max_pd(maxv, _mm_and_pd(EV_t_l2_k0, absMask.m));
+ }
+ }
+
+
+ _mm_store_pd(maxima, maxv);
+
+ max = MAX(maxima[0], maxima[1]);
+
+ if(max < minlikelihood)
+ {
+ __m128d sv = _mm_set1_pd(twotothe256);
+
+ _mm_store_pd(&x3[0], _mm_mul_pd(values[0], sv));
+ _mm_store_pd(&x3[2], _mm_mul_pd(values[1], sv));
+ _mm_store_pd(&x3[4], _mm_mul_pd(values[2], sv));
+ _mm_store_pd(&x3[6], _mm_mul_pd(values[3], sv));
+ _mm_store_pd(&x3[8], _mm_mul_pd(values[4], sv));
+ _mm_store_pd(&x3[10], _mm_mul_pd(values[5], sv));
+ _mm_store_pd(&x3[12], _mm_mul_pd(values[6], sv));
+ _mm_store_pd(&x3[14], _mm_mul_pd(values[7], sv));
+
+
+ addScale += wgt[i];
+
+ }
+ else
+ {
+ _mm_store_pd(&x3[0], values[0]);
+ _mm_store_pd(&x3[2], values[1]);
+ _mm_store_pd(&x3[4], values[2]);
+ _mm_store_pd(&x3[6], values[3]);
+ _mm_store_pd(&x3[8], values[4]);
+ _mm_store_pd(&x3[10], values[5]);
+ _mm_store_pd(&x3[12], values[6]);
+ _mm_store_pd(&x3[14], values[7]);
+ }
+
+ x3 += 16;
+ }
+ }
+ }
+ break;
+ case INNER_INNER:
+ {
+ __m128d maxv =_mm_setzero_pd();
+
+ scaleGap = 0;
+
+ x1 = x1_gapColumn;
+ x2 = x2_gapColumn;
+ x3 = x3_gapColumn;
+
+ for (j = 0; j < 4; j++)
+ {
+
+ double *x1_p = &x1[j*4];
+ double *left_k0_p = &left[j*16];
+ double *left_k1_p = &left[j*16 + 1*4];
+ double *left_k2_p = &left[j*16 + 2*4];
+ double *left_k3_p = &left[j*16 + 3*4];
+
+ __m128d x1_0 = _mm_load_pd( &x1_p[0] );
+ __m128d x1_2 = _mm_load_pd( &x1_p[2] );
+
+ __m128d left_k0_0 = _mm_load_pd( &left_k0_p[0] );
+ __m128d left_k0_2 = _mm_load_pd( &left_k0_p[2] );
+ __m128d left_k1_0 = _mm_load_pd( &left_k1_p[0] );
+ __m128d left_k1_2 = _mm_load_pd( &left_k1_p[2] );
+ __m128d left_k2_0 = _mm_load_pd( &left_k2_p[0] );
+ __m128d left_k2_2 = _mm_load_pd( &left_k2_p[2] );
+ __m128d left_k3_0 = _mm_load_pd( &left_k3_p[0] );
+ __m128d left_k3_2 = _mm_load_pd( &left_k3_p[2] );
+
+ left_k0_0 = _mm_mul_pd(x1_0, left_k0_0);
+ left_k0_2 = _mm_mul_pd(x1_2, left_k0_2);
+
+ left_k1_0 = _mm_mul_pd(x1_0, left_k1_0);
+ left_k1_2 = _mm_mul_pd(x1_2, left_k1_2);
+
+ left_k0_0 = _mm_hadd_pd( left_k0_0, left_k0_2 );
+ left_k1_0 = _mm_hadd_pd( left_k1_0, left_k1_2);
+ left_k0_0 = _mm_hadd_pd( left_k0_0, left_k1_0);
+
+ left_k2_0 = _mm_mul_pd(x1_0, left_k2_0);
+ left_k2_2 = _mm_mul_pd(x1_2, left_k2_2);
+
+ left_k3_0 = _mm_mul_pd(x1_0, left_k3_0);
+ left_k3_2 = _mm_mul_pd(x1_2, left_k3_2);
+
+ left_k2_0 = _mm_hadd_pd( left_k2_0, left_k2_2);
+ left_k3_0 = _mm_hadd_pd( left_k3_0, left_k3_2);
+ left_k2_0 = _mm_hadd_pd( left_k2_0, left_k3_0);
+
+
+ double *x2_p = &x2[j*4];
+ double *right_k0_p = &right[j*16];
+ double *right_k1_p = &right[j*16 + 1*4];
+ double *right_k2_p = &right[j*16 + 2*4];
+ double *right_k3_p = &right[j*16 + 3*4];
+ __m128d x2_0 = _mm_load_pd( &x2_p[0] );
+ __m128d x2_2 = _mm_load_pd( &x2_p[2] );
+
+ __m128d right_k0_0 = _mm_load_pd( &right_k0_p[0] );
+ __m128d right_k0_2 = _mm_load_pd( &right_k0_p[2] );
+ __m128d right_k1_0 = _mm_load_pd( &right_k1_p[0] );
+ __m128d right_k1_2 = _mm_load_pd( &right_k1_p[2] );
+ __m128d right_k2_0 = _mm_load_pd( &right_k2_p[0] );
+ __m128d right_k2_2 = _mm_load_pd( &right_k2_p[2] );
+ __m128d right_k3_0 = _mm_load_pd( &right_k3_p[0] );
+ __m128d right_k3_2 = _mm_load_pd( &right_k3_p[2] );
+
+ right_k0_0 = _mm_mul_pd( x2_0, right_k0_0);
+ right_k0_2 = _mm_mul_pd( x2_2, right_k0_2);
+
+ right_k1_0 = _mm_mul_pd( x2_0, right_k1_0);
+ right_k1_2 = _mm_mul_pd( x2_2, right_k1_2);
+
+ right_k0_0 = _mm_hadd_pd( right_k0_0, right_k0_2);
+ right_k1_0 = _mm_hadd_pd( right_k1_0, right_k1_2);
+ right_k0_0 = _mm_hadd_pd( right_k0_0, right_k1_0);
+
+ right_k2_0 = _mm_mul_pd( x2_0, right_k2_0);
+ right_k2_2 = _mm_mul_pd( x2_2, right_k2_2);
+
+ right_k3_0 = _mm_mul_pd( x2_0, right_k3_0);
+ right_k3_2 = _mm_mul_pd( x2_2, right_k3_2);
+
+ right_k2_0 = _mm_hadd_pd( right_k2_0, right_k2_2);
+ right_k3_0 = _mm_hadd_pd( right_k3_0, right_k3_2);
+ right_k2_0 = _mm_hadd_pd( right_k2_0, right_k3_0);
+
+ __m128d x1px2_k0 = _mm_mul_pd( left_k0_0, right_k0_0 );
+ __m128d x1px2_k2 = _mm_mul_pd( left_k2_0, right_k2_0 );
+
+ __m128d EV_t_l0_k0 = EVV[0];
+ __m128d EV_t_l0_k2 = EVV[1];
+ __m128d EV_t_l1_k0 = EVV[2];
+ __m128d EV_t_l1_k2 = EVV[3];
+ __m128d EV_t_l2_k0 = EVV[4];
+ __m128d EV_t_l2_k2 = EVV[5];
+ __m128d EV_t_l3_k0 = EVV[6];
+ __m128d EV_t_l3_k2 = EVV[7];
+
+ EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+ EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+
+ EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+ EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+
+ EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+
+ EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+ EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+
+ EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+ EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+ EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );
+
+
+ values[j * 2] = EV_t_l0_k0;
+ values[j * 2 + 1] = EV_t_l2_k0;
+
+ maxv = _mm_max_pd(maxv, _mm_and_pd(EV_t_l0_k0, absMask.m));
+ maxv = _mm_max_pd(maxv, _mm_and_pd(EV_t_l2_k0, absMask.m));
+ }
+
+ _mm_store_pd(maxima, maxv);
+
+ max = MAX(maxima[0], maxima[1]);
+
+ if(max < minlikelihood)
+ {
+ __m128d sv = _mm_set1_pd(twotothe256);
+
+ scaleGap = 1;
+
+ _mm_store_pd(&x3[0], _mm_mul_pd(values[0], sv));
+ _mm_store_pd(&x3[2], _mm_mul_pd(values[1], sv));
+ _mm_store_pd(&x3[4], _mm_mul_pd(values[2], sv));
+ _mm_store_pd(&x3[6], _mm_mul_pd(values[3], sv));
+ _mm_store_pd(&x3[8], _mm_mul_pd(values[4], sv));
+ _mm_store_pd(&x3[10], _mm_mul_pd(values[5], sv));
+ _mm_store_pd(&x3[12], _mm_mul_pd(values[6], sv));
+ _mm_store_pd(&x3[14], _mm_mul_pd(values[7], sv));
+ }
+ else
+ {
+ _mm_store_pd(&x3[0], values[0]);
+ _mm_store_pd(&x3[2], values[1]);
+ _mm_store_pd(&x3[4], values[2]);
+ _mm_store_pd(&x3[6], values[3]);
+ _mm_store_pd(&x3[8], values[4]);
+ _mm_store_pd(&x3[10], values[5]);
+ _mm_store_pd(&x3[12], values[6]);
+ _mm_store_pd(&x3[14], values[7]);
+ }
+ }
+
+
+ x3 = x3_start;
+
+ for (i = 0; i < n; i++)
+ {
+ if(x3_gap[i / 32] & mask32[i % 32])
+ {
+ if(scaleGap)
+ {
+ addScale += wgt[i];
+ }
+ }
+ else
+ {
+ __m128d maxv =_mm_setzero_pd();
+
+ if(x1_gap[i / 32] & mask32[i % 32])
+ x1 = x1_gapColumn;
+ else
+ {
+ x1 = x1_ptr;
+ x1_ptr += 16;
+ }
+
+ if(x2_gap[i / 32] & mask32[i % 32])
+ x2 = x2_gapColumn;
+ else
+ {
+ x2 = x2_ptr;
+ x2_ptr += 16;
+ }
+
+
+ for (j = 0; j < 4; j++)
+ {
+
+ double *x1_p = &x1[j*4];
+ double *left_k0_p = &left[j*16];
+ double *left_k1_p = &left[j*16 + 1*4];
+ double *left_k2_p = &left[j*16 + 2*4];
+ double *left_k3_p = &left[j*16 + 3*4];
+
+ __m128d x1_0 = _mm_load_pd( &x1_p[0] );
+ __m128d x1_2 = _mm_load_pd( &x1_p[2] );
+
+ __m128d left_k0_0 = _mm_load_pd( &left_k0_p[0] );
+ __m128d left_k0_2 = _mm_load_pd( &left_k0_p[2] );
+ __m128d left_k1_0 = _mm_load_pd( &left_k1_p[0] );
+ __m128d left_k1_2 = _mm_load_pd( &left_k1_p[2] );
+ __m128d left_k2_0 = _mm_load_pd( &left_k2_p[0] );
+ __m128d left_k2_2 = _mm_load_pd( &left_k2_p[2] );
+ __m128d left_k3_0 = _mm_load_pd( &left_k3_p[0] );
+ __m128d left_k3_2 = _mm_load_pd( &left_k3_p[2] );
+
+ left_k0_0 = _mm_mul_pd(x1_0, left_k0_0);
+ left_k0_2 = _mm_mul_pd(x1_2, left_k0_2);
+
+ left_k1_0 = _mm_mul_pd(x1_0, left_k1_0);
+ left_k1_2 = _mm_mul_pd(x1_2, left_k1_2);
+
+ left_k0_0 = _mm_hadd_pd( left_k0_0, left_k0_2 );
+ left_k1_0 = _mm_hadd_pd( left_k1_0, left_k1_2);
+ left_k0_0 = _mm_hadd_pd( left_k0_0, left_k1_0);
+
+ left_k2_0 = _mm_mul_pd(x1_0, left_k2_0);
+ left_k2_2 = _mm_mul_pd(x1_2, left_k2_2);
+
+ left_k3_0 = _mm_mul_pd(x1_0, left_k3_0);
+ left_k3_2 = _mm_mul_pd(x1_2, left_k3_2);
+
+ left_k2_0 = _mm_hadd_pd( left_k2_0, left_k2_2);
+ left_k3_0 = _mm_hadd_pd( left_k3_0, left_k3_2);
+ left_k2_0 = _mm_hadd_pd( left_k2_0, left_k3_0);
+
+
+ //
+ // multiply/add right side
+ //
+ double *x2_p = &x2[j*4];
+ double *right_k0_p = &right[j*16];
+ double *right_k1_p = &right[j*16 + 1*4];
+ double *right_k2_p = &right[j*16 + 2*4];
+ double *right_k3_p = &right[j*16 + 3*4];
+ __m128d x2_0 = _mm_load_pd( &x2_p[0] );
+ __m128d x2_2 = _mm_load_pd( &x2_p[2] );
+
+ __m128d right_k0_0 = _mm_load_pd( &right_k0_p[0] );
+ __m128d right_k0_2 = _mm_load_pd( &right_k0_p[2] );
+ __m128d right_k1_0 = _mm_load_pd( &right_k1_p[0] );
+ __m128d right_k1_2 = _mm_load_pd( &right_k1_p[2] );
+ __m128d right_k2_0 = _mm_load_pd( &right_k2_p[0] );
+ __m128d right_k2_2 = _mm_load_pd( &right_k2_p[2] );
+ __m128d right_k3_0 = _mm_load_pd( &right_k3_p[0] );
+ __m128d right_k3_2 = _mm_load_pd( &right_k3_p[2] );
+
+ right_k0_0 = _mm_mul_pd( x2_0, right_k0_0);
+ right_k0_2 = _mm_mul_pd( x2_2, right_k0_2);
+
+ right_k1_0 = _mm_mul_pd( x2_0, right_k1_0);
+ right_k1_2 = _mm_mul_pd( x2_2, right_k1_2);
+
+ right_k0_0 = _mm_hadd_pd( right_k0_0, right_k0_2);
+ right_k1_0 = _mm_hadd_pd( right_k1_0, right_k1_2);
+ right_k0_0 = _mm_hadd_pd( right_k0_0, right_k1_0);
+
+ right_k2_0 = _mm_mul_pd( x2_0, right_k2_0);
+ right_k2_2 = _mm_mul_pd( x2_2, right_k2_2);
+
+
+ right_k3_0 = _mm_mul_pd( x2_0, right_k3_0);
+ right_k3_2 = _mm_mul_pd( x2_2, right_k3_2);
+
+ right_k2_0 = _mm_hadd_pd( right_k2_0, right_k2_2);
+ right_k3_0 = _mm_hadd_pd( right_k3_0, right_k3_2);
+ right_k2_0 = _mm_hadd_pd( right_k2_0, right_k3_0);
+
+ //
+ // multiply left * right
+ //
+
+ __m128d x1px2_k0 = _mm_mul_pd( left_k0_0, right_k0_0 );
+ __m128d x1px2_k2 = _mm_mul_pd( left_k2_0, right_k2_0 );
+
+
+ //
+ // multiply with EV matrix (!?)
+ //
+
+ __m128d EV_t_l0_k0 = EVV[0];
+ __m128d EV_t_l0_k2 = EVV[1];
+ __m128d EV_t_l1_k0 = EVV[2];
+ __m128d EV_t_l1_k2 = EVV[3];
+ __m128d EV_t_l2_k0 = EVV[4];
+ __m128d EV_t_l2_k2 = EVV[5];
+ __m128d EV_t_l3_k0 = EVV[6];
+ __m128d EV_t_l3_k2 = EVV[7];
+
+
+ EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+ EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+
+ EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+ EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+
+ EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+
+ EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+ EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+
+
+ EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+ EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+ EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );
+
+
+ values[j * 2] = EV_t_l0_k0;
+ values[j * 2 + 1] = EV_t_l2_k0;
+
+ maxv = _mm_max_pd(maxv, _mm_and_pd(EV_t_l0_k0, absMask.m));
+ maxv = _mm_max_pd(maxv, _mm_and_pd(EV_t_l2_k0, absMask.m));
+ }
+
+
+ _mm_store_pd(maxima, maxv);
+
+ max = MAX(maxima[0], maxima[1]);
+
+ if(max < minlikelihood)
+ {
+ __m128d sv = _mm_set1_pd(twotothe256);
+
+ _mm_store_pd(&x3[0], _mm_mul_pd(values[0], sv));
+ _mm_store_pd(&x3[2], _mm_mul_pd(values[1], sv));
+ _mm_store_pd(&x3[4], _mm_mul_pd(values[2], sv));
+ _mm_store_pd(&x3[6], _mm_mul_pd(values[3], sv));
+ _mm_store_pd(&x3[8], _mm_mul_pd(values[4], sv));
+ _mm_store_pd(&x3[10], _mm_mul_pd(values[5], sv));
+ _mm_store_pd(&x3[12], _mm_mul_pd(values[6], sv));
+ _mm_store_pd(&x3[14], _mm_mul_pd(values[7], sv));
+
+
+ addScale += wgt[i];
+
+ }
+ else
+ {
+ _mm_store_pd(&x3[0], values[0]);
+ _mm_store_pd(&x3[2], values[1]);
+ _mm_store_pd(&x3[4], values[2]);
+ _mm_store_pd(&x3[6], values[3]);
+ _mm_store_pd(&x3[8], values[4]);
+ _mm_store_pd(&x3[10], values[5]);
+ _mm_store_pd(&x3[12], values[6]);
+ _mm_store_pd(&x3[14], values[7]);
+ }
+
+
+
+ x3 += 16;
+
+ }
+ }
+ break;
+ default:
+ assert(0);
+ }
+
+
+ *scalerIncrement = addScale;
+}
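Reader's note on the gap handling in newviewGTRGAMMA_GAPPED_SAVE above: a minimal sketch of the bit-vector test, assuming that mask32[k] holds the single-bit mask 1u << k (an assumption, since the table is defined outside this excerpt). The helpers is_gap_site and count_stored_columns are hypothetical; the second one illustrates why x3 advances by 16 doubles only for sites that are not flagged as gaps.

```c
#include <assert.h>

/* Hypothetical helper: non-zero if site i is flagged in the gap bit vector.
   Assumes bit (i % 32) of word (i / 32) marks the site, i.e. that ExaML's
   mask32[k] equals 1u << k. */
static int is_gap_site(const unsigned int *gap_vector, int site)
{
  return (gap_vector[site / 32] & (1u << (site % 32))) != 0;
}

/* Hypothetical helper: number of likelihood columns that are actually
   stored for n sites, i.e. sites whose gap bit is not set. */
static int count_stored_columns(const unsigned int *gap_vector, int n)
{
  int stored = 0;

  assert(n >= 0);

  for (int i = 0; i < n; i++)
    if (!is_gap_site(gap_vector, i))
      stored++;

  return stored;
}
```

Under these assumptions, all flagged sites share the single precomputed gap column, so the output array needs 16 doubles per stored column rather than per site.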
+
+
+static void newviewGTRGAMMA(int tipCase,
+ double *x1_start, double *x2_start, double *x3_start,
+ double *EV, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ const int n, double *left, double *right, int *wgt, int *scalerIncrement
+ )
+{
+ int
+ i,
+ j,
+ k,
+ l,
+ addScale = 0;
+
+ double
+ *x1,
+ *x2,
+ *x3,
+ max,
+ maxima[2] __attribute__ ((aligned (BYTE_ALIGNMENT))),
+ EV_t[16] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+ __m128d
+ values[8],
+ EVV[8];
+
+ for(k = 0; k < 4; k++)
+ for (l=0; l < 4; l++)
+ EV_t[4 * l + k] = EV[4 * k + l];
+
+ for(k = 0; k < 8; k++)
+ EVV[k] = _mm_load_pd(&EV_t[k * 2]);
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ {
+ double *uX1, umpX1[256] __attribute__ ((aligned (BYTE_ALIGNMENT))), *uX2, umpX2[256] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+
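+ // precompute the products with tipVector for tip codes 1..15 (the entries for code 0 are not filled in)
+ for (i = 1; i < 16; i++)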
+ {
+ __m128d x1_1 = _mm_load_pd(&(tipVector[i*4]));
+ __m128d x1_2 = _mm_load_pd(&(tipVector[i*4 + 2]));
+
+ for (j = 0; j < 4; j++)
+ for (k = 0; k < 4; k++)
+ {
+ __m128d left1 = _mm_load_pd(&left[j*16 + k*4]);
+ __m128d left2 = _mm_load_pd(&left[j*16 + k*4 + 2]);
+
+ __m128d acc = _mm_setzero_pd();
+
+ acc = _mm_add_pd(acc, _mm_mul_pd(left1, x1_1));
+ acc = _mm_add_pd(acc, _mm_mul_pd(left2, x1_2));
+
+ acc = _mm_hadd_pd(acc, acc);
+ _mm_storel_pd(&umpX1[i*16 + j*4 + k], acc);
+ }
+
+ for (j = 0; j < 4; j++)
+ for (k = 0; k < 4; k++)
+ {
+ __m128d right1 = _mm_load_pd(&right[j*16 + k*4]);
+ __m128d right2 = _mm_load_pd(&right[j*16 + k*4 + 2]);
+
+ __m128d acc = _mm_setzero_pd();
+
+ acc = _mm_add_pd(acc, _mm_mul_pd(right1, x1_1));
+ acc = _mm_add_pd(acc, _mm_mul_pd(right2, x1_2));
+
+ acc = _mm_hadd_pd(acc, acc);
+ _mm_storel_pd(&umpX2[i*16 + j*4 + k], acc);
+
+ }
+ }
+
+ for (i = 0; i < n; i++)
+ {
+ x3 = &x3_start[i * 16];
+
+
+ uX1 = &umpX1[16 * tipX1[i]];
+ uX2 = &umpX2[16 * tipX2[i]];
+
+ for (j = 0; j < 4; j++)
+ {
+ __m128d uX1_k0_sse = _mm_load_pd( &uX1[j * 4] );
+ __m128d uX1_k2_sse = _mm_load_pd( &uX1[j * 4 + 2] );
+
+
+ __m128d uX2_k0_sse = _mm_load_pd( &uX2[j * 4] );
+ __m128d uX2_k2_sse = _mm_load_pd( &uX2[j * 4 + 2] );
+
+
+ //
+ // multiply left * right
+ //
+
+ __m128d x1px2_k0 = _mm_mul_pd( uX1_k0_sse, uX2_k0_sse );
+ __m128d x1px2_k2 = _mm_mul_pd( uX1_k2_sse, uX2_k2_sse );
+
+
+ //
+ // multiply with the transposed EV matrix (EV_t, held in EVV)
+ //
+
+ __m128d EV_t_l0_k0 = EVV[0];
+ __m128d EV_t_l0_k2 = EVV[1];
+ __m128d EV_t_l1_k0 = EVV[2];
+ __m128d EV_t_l1_k2 = EVV[3];
+ __m128d EV_t_l2_k0 = EVV[4];
+ __m128d EV_t_l2_k2 = EVV[5];
+ __m128d EV_t_l3_k0 = EVV[6];
+ __m128d EV_t_l3_k2 = EVV[7];
+
+ EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+ EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+
+ EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+ EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+
+ EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+
+ EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+ EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+
+ EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+ EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+ EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );
+
+ _mm_store_pd( &x3[j * 4 + 0], EV_t_l0_k0 );
+ _mm_store_pd( &x3[j * 4 + 2], EV_t_l2_k0 );
+ }
+ }
+ }
+ break;
+ case TIP_INNER:
+ {
+ double *uX1, umpX1[256] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+
+ for (i = 1; i < 16; i++)
+ {
+ __m128d x1_1 = _mm_load_pd(&(tipVector[i*4]));
+ __m128d x1_2 = _mm_load_pd(&(tipVector[i*4 + 2]));
+
+ for (j = 0; j < 4; j++)
+ for (k = 0; k < 4; k++)
+ {
+ __m128d left1 = _mm_load_pd(&left[j*16 + k*4]);
+ __m128d left2 = _mm_load_pd(&left[j*16 + k*4 + 2]);
+
+ __m128d acc = _mm_setzero_pd();
+
+ acc = _mm_add_pd(acc, _mm_mul_pd(left1, x1_1));
+ acc = _mm_add_pd(acc, _mm_mul_pd(left2, x1_2));
+
+ acc = _mm_hadd_pd(acc, acc);
+ _mm_storel_pd(&umpX1[i*16 + j*4 + k], acc);
+ }
+ }
+
+ for (i = 0; i < n; i++)
+ {
+ __m128d maxv =_mm_setzero_pd();
+
+ x2 = &x2_start[i * 16];
+ x3 = &x3_start[i * 16];
+
+ uX1 = &umpX1[16 * tipX1[i]];
+
+ for (j = 0; j < 4; j++)
+ {
+
+ //
+ // multiply/add right side
+ //
+ double *x2_p = &x2[j*4];
+ double *right_k0_p = &right[j*16];
+ double *right_k1_p = &right[j*16 + 1*4];
+ double *right_k2_p = &right[j*16 + 2*4];
+ double *right_k3_p = &right[j*16 + 3*4];
+ __m128d x2_0 = _mm_load_pd( &x2_p[0] );
+ __m128d x2_2 = _mm_load_pd( &x2_p[2] );
+
+ __m128d right_k0_0 = _mm_load_pd( &right_k0_p[0] );
+ __m128d right_k0_2 = _mm_load_pd( &right_k0_p[2] );
+ __m128d right_k1_0 = _mm_load_pd( &right_k1_p[0] );
+ __m128d right_k1_2 = _mm_load_pd( &right_k1_p[2] );
+ __m128d right_k2_0 = _mm_load_pd( &right_k2_p[0] );
+ __m128d right_k2_2 = _mm_load_pd( &right_k2_p[2] );
+ __m128d right_k3_0 = _mm_load_pd( &right_k3_p[0] );
+ __m128d right_k3_2 = _mm_load_pd( &right_k3_p[2] );
+
+
+
+ right_k0_0 = _mm_mul_pd( x2_0, right_k0_0);
+ right_k0_2 = _mm_mul_pd( x2_2, right_k0_2);
+
+ right_k1_0 = _mm_mul_pd( x2_0, right_k1_0);
+ right_k1_2 = _mm_mul_pd( x2_2, right_k1_2);
+
+ right_k0_0 = _mm_hadd_pd( right_k0_0, right_k0_2);
+ right_k1_0 = _mm_hadd_pd( right_k1_0, right_k1_2);
+ right_k0_0 = _mm_hadd_pd( right_k0_0, right_k1_0);
+
+
+ right_k2_0 = _mm_mul_pd( x2_0, right_k2_0);
+ right_k2_2 = _mm_mul_pd( x2_2, right_k2_2);
+
+
+ right_k3_0 = _mm_mul_pd( x2_0, right_k3_0);
+ right_k3_2 = _mm_mul_pd( x2_2, right_k3_2);
+
+ right_k2_0 = _mm_hadd_pd( right_k2_0, right_k2_2);
+ right_k3_0 = _mm_hadd_pd( right_k3_0, right_k3_2);
+ right_k2_0 = _mm_hadd_pd( right_k2_0, right_k3_0);
+
+ {
+ //
+ // load left side from tip vector
+ //
+
+ __m128d uX1_k0_sse = _mm_load_pd( &uX1[j * 4] );
+ __m128d uX1_k2_sse = _mm_load_pd( &uX1[j * 4 + 2] );
+
+
+ //
+ // multiply left * right
+ //
+
+ __m128d x1px2_k0 = _mm_mul_pd( uX1_k0_sse, right_k0_0 );
+ __m128d x1px2_k2 = _mm_mul_pd( uX1_k2_sse, right_k2_0 );
+
+
+ //
+ // multiply with the transposed EV matrix (EV_t, held in EVV)
+ //
+
+ __m128d EV_t_l0_k0 = EVV[0];
+ __m128d EV_t_l0_k2 = EVV[1];
+ __m128d EV_t_l1_k0 = EVV[2];
+ __m128d EV_t_l1_k2 = EVV[3];
+ __m128d EV_t_l2_k0 = EVV[4];
+ __m128d EV_t_l2_k2 = EVV[5];
+ __m128d EV_t_l3_k0 = EVV[6];
+ __m128d EV_t_l3_k2 = EVV[7];
+
+
+ EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+ EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+
+ EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+ EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+
+ EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+
+ EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+ EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+
+ EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+ EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+ EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );
+
+ values[j * 2] = EV_t_l0_k0;
+ values[j * 2 + 1] = EV_t_l2_k0;
+
+ maxv = _mm_max_pd(maxv, _mm_and_pd(EV_t_l0_k0, absMask.m));
+ maxv = _mm_max_pd(maxv, _mm_and_pd(EV_t_l2_k0, absMask.m));
+ }
+ }
+
+
+ _mm_store_pd(maxima, maxv);
+
+ max = MAX(maxima[0], maxima[1]);
+
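+ // every entry of this site is below minlikelihood in absolute value: rescale by twotothe256 and record the event
+ if(max < minlikelihood)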
+ {
+ __m128d sv = _mm_set1_pd(twotothe256);
+
+ _mm_store_pd(&x3[0], _mm_mul_pd(values[0], sv));
+ _mm_store_pd(&x3[2], _mm_mul_pd(values[1], sv));
+ _mm_store_pd(&x3[4], _mm_mul_pd(values[2], sv));
+ _mm_store_pd(&x3[6], _mm_mul_pd(values[3], sv));
+ _mm_store_pd(&x3[8], _mm_mul_pd(values[4], sv));
+ _mm_store_pd(&x3[10], _mm_mul_pd(values[5], sv));
+ _mm_store_pd(&x3[12], _mm_mul_pd(values[6], sv));
+ _mm_store_pd(&x3[14], _mm_mul_pd(values[7], sv));
+
+
+ addScale += wgt[i];
+
+ }
+ else
+ {
+ _mm_store_pd(&x3[0], values[0]);
+ _mm_store_pd(&x3[2], values[1]);
+ _mm_store_pd(&x3[4], values[2]);
+ _mm_store_pd(&x3[6], values[3]);
+ _mm_store_pd(&x3[8], values[4]);
+ _mm_store_pd(&x3[10], values[5]);
+ _mm_store_pd(&x3[12], values[6]);
+ _mm_store_pd(&x3[14], values[7]);
+ }
+ }
+ }
+ break;
+ case INNER_INNER:
+ for (i = 0; i < n; i++)
+ {
+ __m128d maxv =_mm_setzero_pd();
+
+
+ x1 = &x1_start[i * 16];
+ x2 = &x2_start[i * 16];
+ x3 = &x3_start[i * 16];
+
+ for (j = 0; j < 4; j++)
+ {
+
+ double *x1_p = &x1[j*4];
+ double *left_k0_p = &left[j*16];
+ double *left_k1_p = &left[j*16 + 1*4];
+ double *left_k2_p = &left[j*16 + 2*4];
+ double *left_k3_p = &left[j*16 + 3*4];
+
+ __m128d x1_0 = _mm_load_pd( &x1_p[0] );
+ __m128d x1_2 = _mm_load_pd( &x1_p[2] );
+
+ __m128d left_k0_0 = _mm_load_pd( &left_k0_p[0] );
+ __m128d left_k0_2 = _mm_load_pd( &left_k0_p[2] );
+ __m128d left_k1_0 = _mm_load_pd( &left_k1_p[0] );
+ __m128d left_k1_2 = _mm_load_pd( &left_k1_p[2] );
+ __m128d left_k2_0 = _mm_load_pd( &left_k2_p[0] );
+ __m128d left_k2_2 = _mm_load_pd( &left_k2_p[2] );
+ __m128d left_k3_0 = _mm_load_pd( &left_k3_p[0] );
+ __m128d left_k3_2 = _mm_load_pd( &left_k3_p[2] );
+
+ left_k0_0 = _mm_mul_pd(x1_0, left_k0_0);
+ left_k0_2 = _mm_mul_pd(x1_2, left_k0_2);
+
+ left_k1_0 = _mm_mul_pd(x1_0, left_k1_0);
+ left_k1_2 = _mm_mul_pd(x1_2, left_k1_2);
+
+ left_k0_0 = _mm_hadd_pd( left_k0_0, left_k0_2 );
+ left_k1_0 = _mm_hadd_pd( left_k1_0, left_k1_2);
+ left_k0_0 = _mm_hadd_pd( left_k0_0, left_k1_0);
+
+ left_k2_0 = _mm_mul_pd(x1_0, left_k2_0);
+ left_k2_2 = _mm_mul_pd(x1_2, left_k2_2);
+
+ left_k3_0 = _mm_mul_pd(x1_0, left_k3_0);
+ left_k3_2 = _mm_mul_pd(x1_2, left_k3_2);
+
+ left_k2_0 = _mm_hadd_pd( left_k2_0, left_k2_2);
+ left_k3_0 = _mm_hadd_pd( left_k3_0, left_k3_2);
+ left_k2_0 = _mm_hadd_pd( left_k2_0, left_k3_0);
+
+
+ //
+ // multiply/add right side
+ //
+ double *x2_p = &x2[j*4];
+ double *right_k0_p = &right[j*16];
+ double *right_k1_p = &right[j*16 + 1*4];
+ double *right_k2_p = &right[j*16 + 2*4];
+ double *right_k3_p = &right[j*16 + 3*4];
+ __m128d x2_0 = _mm_load_pd( &x2_p[0] );
+ __m128d x2_2 = _mm_load_pd( &x2_p[2] );
+
+ __m128d right_k0_0 = _mm_load_pd( &right_k0_p[0] );
+ __m128d right_k0_2 = _mm_load_pd( &right_k0_p[2] );
+ __m128d right_k1_0 = _mm_load_pd( &right_k1_p[0] );
+ __m128d right_k1_2 = _mm_load_pd( &right_k1_p[2] );
+ __m128d right_k2_0 = _mm_load_pd( &right_k2_p[0] );
+ __m128d right_k2_2 = _mm_load_pd( &right_k2_p[2] );
+ __m128d right_k3_0 = _mm_load_pd( &right_k3_p[0] );
+ __m128d right_k3_2 = _mm_load_pd( &right_k3_p[2] );
+
+ right_k0_0 = _mm_mul_pd( x2_0, right_k0_0);
+ right_k0_2 = _mm_mul_pd( x2_2, right_k0_2);
+
+ right_k1_0 = _mm_mul_pd( x2_0, right_k1_0);
+ right_k1_2 = _mm_mul_pd( x2_2, right_k1_2);
+
+ right_k0_0 = _mm_hadd_pd( right_k0_0, right_k0_2);
+ right_k1_0 = _mm_hadd_pd( right_k1_0, right_k1_2);
+ right_k0_0 = _mm_hadd_pd( right_k0_0, right_k1_0);
+
+ right_k2_0 = _mm_mul_pd( x2_0, right_k2_0);
+ right_k2_2 = _mm_mul_pd( x2_2, right_k2_2);
+
+
+ right_k3_0 = _mm_mul_pd( x2_0, right_k3_0);
+ right_k3_2 = _mm_mul_pd( x2_2, right_k3_2);
+
+ right_k2_0 = _mm_hadd_pd( right_k2_0, right_k2_2);
+ right_k3_0 = _mm_hadd_pd( right_k3_0, right_k3_2);
+ right_k2_0 = _mm_hadd_pd( right_k2_0, right_k3_0);
+
+ //
+ // multiply left * right
+ //
+
+ __m128d x1px2_k0 = _mm_mul_pd( left_k0_0, right_k0_0 );
+ __m128d x1px2_k2 = _mm_mul_pd( left_k2_0, right_k2_0 );
+
+
+ //
+ // multiply with the transposed EV matrix (EV_t, held in EVV)
+ //
+
+ __m128d EV_t_l0_k0 = EVV[0];
+ __m128d EV_t_l0_k2 = EVV[1];
+ __m128d EV_t_l1_k0 = EVV[2];
+ __m128d EV_t_l1_k2 = EVV[3];
+ __m128d EV_t_l2_k0 = EVV[4];
+ __m128d EV_t_l2_k2 = EVV[5];
+ __m128d EV_t_l3_k0 = EVV[6];
+ __m128d EV_t_l3_k2 = EVV[7];
+
+
+ EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+ EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+
+ EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+ EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+
+ EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+
+ EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+ EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+
+
+ EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+ EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+ EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );
+
+
+ values[j * 2] = EV_t_l0_k0;
+ values[j * 2 + 1] = EV_t_l2_k0;
+
+ maxv = _mm_max_pd(maxv, _mm_and_pd(EV_t_l0_k0, absMask.m));
+ maxv = _mm_max_pd(maxv, _mm_and_pd(EV_t_l2_k0, absMask.m));
+ }
+
+
+ _mm_store_pd(maxima, maxv);
+
+ max = MAX(maxima[0], maxima[1]);
+
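+ // every entry of this site is below minlikelihood in absolute value: rescale by twotothe256 and record the event
+ if(max < minlikelihood)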
+ {
+ __m128d sv = _mm_set1_pd(twotothe256);
+
+ _mm_store_pd(&x3[0], _mm_mul_pd(values[0], sv));
+ _mm_store_pd(&x3[2], _mm_mul_pd(values[1], sv));
+ _mm_store_pd(&x3[4], _mm_mul_pd(values[2], sv));
+ _mm_store_pd(&x3[6], _mm_mul_pd(values[3], sv));
+ _mm_store_pd(&x3[8], _mm_mul_pd(values[4], sv));
+ _mm_store_pd(&x3[10], _mm_mul_pd(values[5], sv));
+ _mm_store_pd(&x3[12], _mm_mul_pd(values[6], sv));
+ _mm_store_pd(&x3[14], _mm_mul_pd(values[7], sv));
+
+
+ addScale += wgt[i];
+
+ }
+ else
+ {
+ _mm_store_pd(&x3[0], values[0]);
+ _mm_store_pd(&x3[2], values[1]);
+ _mm_store_pd(&x3[4], values[2]);
+ _mm_store_pd(&x3[6], values[3]);
+ _mm_store_pd(&x3[8], values[4]);
+ _mm_store_pd(&x3[10], values[5]);
+ _mm_store_pd(&x3[12], values[6]);
+ _mm_store_pd(&x3[14], values[7]);
+ }
+ }
+
+ break;
+ default:
+ assert(0);
+ }
+
+
+ *scalerIncrement = addScale;
+
+}
+
+
+static void newviewGTRCAT( int tipCase, double *EV, int *cptr,
+ double *x1_start, double *x2_start, double *x3_start, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement)
+{
+ double
+ *le,
+ *ri,
+ *x1,
+ *x2,
+ *x3,
+ EV_t[16] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+ int
+ i,
+ j,
+ scale,
+ addScale = 0;
+
+ __m128d
+ minlikelihood_sse = _mm_set1_pd( minlikelihood ),
+ sc = _mm_set1_pd(twotothe256),
+ EVV[8];
+
+ // transpose EV into EV_t, then load EV_t as eight SSE registers (EVV)
+ for(i = 0; i < 4; i++)
+ for (j=0; j < 4; j++)
+ EV_t[4 * j + i] = EV[4 * i + j];
+
+ for(i = 0; i < 8; i++)
+ EVV[i] = _mm_load_pd(&EV_t[i * 2]);
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ for (i = 0; i < n; i++)
+ {
+ x1 = &(tipVector[4 * tipX1[i]]);
+ x2 = &(tipVector[4 * tipX2[i]]);
+
+ x3 = &x3_start[i * 4];
+
+ le = &left[cptr[i] * 16];
+ ri = &right[cptr[i] * 16];
+
+ __m128d x1_0 = _mm_load_pd( &x1[0] );
+ __m128d x1_2 = _mm_load_pd( &x1[2] );
+
+ __m128d left_k0_0 = _mm_load_pd( &le[0] );
+ __m128d left_k0_2 = _mm_load_pd( &le[2] );
+ __m128d left_k1_0 = _mm_load_pd( &le[4] );
+ __m128d left_k1_2 = _mm_load_pd( &le[6] );
+ __m128d left_k2_0 = _mm_load_pd( &le[8] );
+ __m128d left_k2_2 = _mm_load_pd( &le[10] );
+ __m128d left_k3_0 = _mm_load_pd( &le[12] );
+ __m128d left_k3_2 = _mm_load_pd( &le[14] );
+
+ left_k0_0 = _mm_mul_pd(x1_0, left_k0_0);
+ left_k0_2 = _mm_mul_pd(x1_2, left_k0_2);
+
+ left_k1_0 = _mm_mul_pd(x1_0, left_k1_0);
+ left_k1_2 = _mm_mul_pd(x1_2, left_k1_2);
+
+ left_k0_0 = _mm_hadd_pd( left_k0_0, left_k0_2 );
+ left_k1_0 = _mm_hadd_pd( left_k1_0, left_k1_2);
+ left_k0_0 = _mm_hadd_pd( left_k0_0, left_k1_0);
+
+ left_k2_0 = _mm_mul_pd(x1_0, left_k2_0);
+ left_k2_2 = _mm_mul_pd(x1_2, left_k2_2);
+
+ left_k3_0 = _mm_mul_pd(x1_0, left_k3_0);
+ left_k3_2 = _mm_mul_pd(x1_2, left_k3_2);
+
+ left_k2_0 = _mm_hadd_pd( left_k2_0, left_k2_2);
+ left_k3_0 = _mm_hadd_pd( left_k3_0, left_k3_2);
+ left_k2_0 = _mm_hadd_pd( left_k2_0, left_k3_0);
+
+ __m128d x2_0 = _mm_load_pd( &x2[0] );
+ __m128d x2_2 = _mm_load_pd( &x2[2] );
+
+ __m128d right_k0_0 = _mm_load_pd( &ri[0] );
+ __m128d right_k0_2 = _mm_load_pd( &ri[2] );
+ __m128d right_k1_0 = _mm_load_pd( &ri[4] );
+ __m128d right_k1_2 = _mm_load_pd( &ri[6] );
+ __m128d right_k2_0 = _mm_load_pd( &ri[8] );
+ __m128d right_k2_2 = _mm_load_pd( &ri[10] );
+ __m128d right_k3_0 = _mm_load_pd( &ri[12] );
+ __m128d right_k3_2 = _mm_load_pd( &ri[14] );
+
+ right_k0_0 = _mm_mul_pd( x2_0, right_k0_0);
+ right_k0_2 = _mm_mul_pd( x2_2, right_k0_2);
+
+ right_k1_0 = _mm_mul_pd( x2_0, right_k1_0);
+ right_k1_2 = _mm_mul_pd( x2_2, right_k1_2);
+
+ right_k0_0 = _mm_hadd_pd( right_k0_0, right_k0_2);
+ right_k1_0 = _mm_hadd_pd( right_k1_0, right_k1_2);
+ right_k0_0 = _mm_hadd_pd( right_k0_0, right_k1_0);
+
+ right_k2_0 = _mm_mul_pd( x2_0, right_k2_0);
+ right_k2_2 = _mm_mul_pd( x2_2, right_k2_2);
+
+ right_k3_0 = _mm_mul_pd( x2_0, right_k3_0);
+ right_k3_2 = _mm_mul_pd( x2_2, right_k3_2);
+
+ right_k2_0 = _mm_hadd_pd( right_k2_0, right_k2_2);
+ right_k3_0 = _mm_hadd_pd( right_k3_0, right_k3_2);
+ right_k2_0 = _mm_hadd_pd( right_k2_0, right_k3_0);
+
+ __m128d x1px2_k0 = _mm_mul_pd( left_k0_0, right_k0_0 );
+ __m128d x1px2_k2 = _mm_mul_pd( left_k2_0, right_k2_0 );
+
+ __m128d EV_t_l0_k0 = EVV[0];
+ __m128d EV_t_l0_k2 = EVV[1];
+ __m128d EV_t_l1_k0 = EVV[2];
+ __m128d EV_t_l1_k2 = EVV[3];
+ __m128d EV_t_l2_k0 = EVV[4];
+ __m128d EV_t_l2_k2 = EVV[5];
+ __m128d EV_t_l3_k0 = EVV[6];
+ __m128d EV_t_l3_k2 = EVV[7];
+
+ EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+ EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+
+ EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+ EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+
+ EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+
+ EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+ EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+
+ EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+ EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+ EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );
+
+ _mm_store_pd(x3, EV_t_l0_k0);
+ _mm_store_pd(&x3[2], EV_t_l2_k0);
+ }
+ break;
+ case TIP_INNER:
+ for (i = 0; i < n; i++)
+ {
+ x1 = &(tipVector[4 * tipX1[i]]);
+ x2 = &x2_start[4 * i];
+ x3 = &x3_start[4 * i];
+
+ le = &left[cptr[i] * 16];
+ ri = &right[cptr[i] * 16];
+
+ __m128d x1_0 = _mm_load_pd( &x1[0] );
+ __m128d x1_2 = _mm_load_pd( &x1[2] );
+
+ __m128d left_k0_0 = _mm_load_pd( &le[0] );
+ __m128d left_k0_2 = _mm_load_pd( &le[2] );
+ __m128d left_k1_0 = _mm_load_pd( &le[4] );
+ __m128d left_k1_2 = _mm_load_pd( &le[6] );
+ __m128d left_k2_0 = _mm_load_pd( &le[8] );
+ __m128d left_k2_2 = _mm_load_pd( &le[10] );
+ __m128d left_k3_0 = _mm_load_pd( &le[12] );
+ __m128d left_k3_2 = _mm_load_pd( &le[14] );
+
+ left_k0_0 = _mm_mul_pd(x1_0, left_k0_0);
+ left_k0_2 = _mm_mul_pd(x1_2, left_k0_2);
+
+ left_k1_0 = _mm_mul_pd(x1_0, left_k1_0);
+ left_k1_2 = _mm_mul_pd(x1_2, left_k1_2);
+
+ left_k0_0 = _mm_hadd_pd( left_k0_0, left_k0_2 );
+ left_k1_0 = _mm_hadd_pd( left_k1_0, left_k1_2);
+ left_k0_0 = _mm_hadd_pd( left_k0_0, left_k1_0);
+
+ left_k2_0 = _mm_mul_pd(x1_0, left_k2_0);
+ left_k2_2 = _mm_mul_pd(x1_2, left_k2_2);
+
+ left_k3_0 = _mm_mul_pd(x1_0, left_k3_0);
+ left_k3_2 = _mm_mul_pd(x1_2, left_k3_2);
+
+ left_k2_0 = _mm_hadd_pd( left_k2_0, left_k2_2);
+ left_k3_0 = _mm_hadd_pd( left_k3_0, left_k3_2);
+ left_k2_0 = _mm_hadd_pd( left_k2_0, left_k3_0);
+
+ __m128d x2_0 = _mm_load_pd( &x2[0] );
+ __m128d x2_2 = _mm_load_pd( &x2[2] );
+
+ __m128d right_k0_0 = _mm_load_pd( &ri[0] );
+ __m128d right_k0_2 = _mm_load_pd( &ri[2] );
+ __m128d right_k1_0 = _mm_load_pd( &ri[4] );
+ __m128d right_k1_2 = _mm_load_pd( &ri[6] );
+ __m128d right_k2_0 = _mm_load_pd( &ri[8] );
+ __m128d right_k2_2 = _mm_load_pd( &ri[10] );
+ __m128d right_k3_0 = _mm_load_pd( &ri[12] );
+ __m128d right_k3_2 = _mm_load_pd( &ri[14] );
+
+ right_k0_0 = _mm_mul_pd( x2_0, right_k0_0);
+ right_k0_2 = _mm_mul_pd( x2_2, right_k0_2);
+
+ right_k1_0 = _mm_mul_pd( x2_0, right_k1_0);
+ right_k1_2 = _mm_mul_pd( x2_2, right_k1_2);
+
+ right_k0_0 = _mm_hadd_pd( right_k0_0, right_k0_2);
+ right_k1_0 = _mm_hadd_pd( right_k1_0, right_k1_2);
+ right_k0_0 = _mm_hadd_pd( right_k0_0, right_k1_0);
+
+ right_k2_0 = _mm_mul_pd( x2_0, right_k2_0);
+ right_k2_2 = _mm_mul_pd( x2_2, right_k2_2);
+
+ right_k3_0 = _mm_mul_pd( x2_0, right_k3_0);
+ right_k3_2 = _mm_mul_pd( x2_2, right_k3_2);
+
+ right_k2_0 = _mm_hadd_pd( right_k2_0, right_k2_2);
+ right_k3_0 = _mm_hadd_pd( right_k3_0, right_k3_2);
+ right_k2_0 = _mm_hadd_pd( right_k2_0, right_k3_0);
+
+ __m128d x1px2_k0 = _mm_mul_pd( left_k0_0, right_k0_0 );
+ __m128d x1px2_k2 = _mm_mul_pd( left_k2_0, right_k2_0 );
+
+ __m128d EV_t_l0_k0 = EVV[0];
+ __m128d EV_t_l0_k2 = EVV[1];
+ __m128d EV_t_l1_k0 = EVV[2];
+ __m128d EV_t_l1_k2 = EVV[3];
+ __m128d EV_t_l2_k0 = EVV[4];
+ __m128d EV_t_l2_k2 = EVV[5];
+ __m128d EV_t_l3_k0 = EVV[6];
+ __m128d EV_t_l3_k2 = EVV[7];
+
+
+ EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+ EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+
+ EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+ EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+
+ EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+
+ EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+ EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+
+ EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+ EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+ EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );
+
+ // scale only if all four entries are below minlikelihood in absolute value
+ scale = 1;
+
+ __m128d v1 = _mm_and_pd(EV_t_l0_k0, absMask.m);
+ v1 = _mm_cmplt_pd(v1, minlikelihood_sse);
+ if(_mm_movemask_pd( v1 ) != 3)
+ scale = 0;
+ else
+ {
+ v1 = _mm_and_pd(EV_t_l2_k0, absMask.m);
+ v1 = _mm_cmplt_pd(v1, minlikelihood_sse);
+ if(_mm_movemask_pd( v1 ) != 3)
+ scale = 0;
+ }
+
+ if(scale)
+ {
+ _mm_store_pd(&x3[0], _mm_mul_pd(EV_t_l0_k0, sc));
+ _mm_store_pd(&x3[2], _mm_mul_pd(EV_t_l2_k0, sc));
+
+
+ addScale += wgt[i];
+ }
+ else
+ {
+ _mm_store_pd(x3, EV_t_l0_k0);
+ _mm_store_pd(&x3[2], EV_t_l2_k0);
+ }
+
+
+ }
+ break;
+ case INNER_INNER:
+ for (i = 0; i < n; i++)
+ {
+ x1 = &x1_start[4 * i];
+ x2 = &x2_start[4 * i];
+ x3 = &x3_start[4 * i];
+
+ le = &left[cptr[i] * 16];
+ ri = &right[cptr[i] * 16];
+
+ __m128d x1_0 = _mm_load_pd( &x1[0] );
+ __m128d x1_2 = _mm_load_pd( &x1[2] );
+
+ __m128d left_k0_0 = _mm_load_pd( &le[0] );
+ __m128d left_k0_2 = _mm_load_pd( &le[2] );
+ __m128d left_k1_0 = _mm_load_pd( &le[4] );
+ __m128d left_k1_2 = _mm_load_pd( &le[6] );
+ __m128d left_k2_0 = _mm_load_pd( &le[8] );
+ __m128d left_k2_2 = _mm_load_pd( &le[10] );
+ __m128d left_k3_0 = _mm_load_pd( &le[12] );
+ __m128d left_k3_2 = _mm_load_pd( &le[14] );
+
+ left_k0_0 = _mm_mul_pd(x1_0, left_k0_0);
+ left_k0_2 = _mm_mul_pd(x1_2, left_k0_2);
+
+ left_k1_0 = _mm_mul_pd(x1_0, left_k1_0);
+ left_k1_2 = _mm_mul_pd(x1_2, left_k1_2);
+
+ left_k0_0 = _mm_hadd_pd( left_k0_0, left_k0_2 );
+ left_k1_0 = _mm_hadd_pd( left_k1_0, left_k1_2);
+ left_k0_0 = _mm_hadd_pd( left_k0_0, left_k1_0);
+
+ left_k2_0 = _mm_mul_pd(x1_0, left_k2_0);
+ left_k2_2 = _mm_mul_pd(x1_2, left_k2_2);
+
+ left_k3_0 = _mm_mul_pd(x1_0, left_k3_0);
+ left_k3_2 = _mm_mul_pd(x1_2, left_k3_2);
+
+ left_k2_0 = _mm_hadd_pd( left_k2_0, left_k2_2);
+ left_k3_0 = _mm_hadd_pd( left_k3_0, left_k3_2);
+ left_k2_0 = _mm_hadd_pd( left_k2_0, left_k3_0);
+
+ __m128d x2_0 = _mm_load_pd( &x2[0] );
+ __m128d x2_2 = _mm_load_pd( &x2[2] );
+
+ __m128d right_k0_0 = _mm_load_pd( &ri[0] );
+ __m128d right_k0_2 = _mm_load_pd( &ri[2] );
+ __m128d right_k1_0 = _mm_load_pd( &ri[4] );
+ __m128d right_k1_2 = _mm_load_pd( &ri[6] );
+ __m128d right_k2_0 = _mm_load_pd( &ri[8] );
+ __m128d right_k2_2 = _mm_load_pd( &ri[10] );
+ __m128d right_k3_0 = _mm_load_pd( &ri[12] );
+ __m128d right_k3_2 = _mm_load_pd( &ri[14] );
+
+ right_k0_0 = _mm_mul_pd( x2_0, right_k0_0);
+ right_k0_2 = _mm_mul_pd( x2_2, right_k0_2);
+
+ right_k1_0 = _mm_mul_pd( x2_0, right_k1_0);
+ right_k1_2 = _mm_mul_pd( x2_2, right_k1_2);
+
+ right_k0_0 = _mm_hadd_pd( right_k0_0, right_k0_2);
+ right_k1_0 = _mm_hadd_pd( right_k1_0, right_k1_2);
+ right_k0_0 = _mm_hadd_pd( right_k0_0, right_k1_0);
+
+ right_k2_0 = _mm_mul_pd( x2_0, right_k2_0);
+ right_k2_2 = _mm_mul_pd( x2_2, right_k2_2);
+
+ right_k3_0 = _mm_mul_pd( x2_0, right_k3_0);
+ right_k3_2 = _mm_mul_pd( x2_2, right_k3_2);
+
+ right_k2_0 = _mm_hadd_pd( right_k2_0, right_k2_2);
+ right_k3_0 = _mm_hadd_pd( right_k3_0, right_k3_2);
+ right_k2_0 = _mm_hadd_pd( right_k2_0, right_k3_0);
+
+ __m128d x1px2_k0 = _mm_mul_pd( left_k0_0, right_k0_0 );
+ __m128d x1px2_k2 = _mm_mul_pd( left_k2_0, right_k2_0 );
+
+ __m128d EV_t_l0_k0 = EVV[0];
+ __m128d EV_t_l0_k2 = EVV[1];
+ __m128d EV_t_l1_k0 = EVV[2];
+ __m128d EV_t_l1_k2 = EVV[3];
+ __m128d EV_t_l2_k0 = EVV[4];
+ __m128d EV_t_l2_k2 = EVV[5];
+ __m128d EV_t_l3_k0 = EVV[6];
+ __m128d EV_t_l3_k2 = EVV[7];
+
+
+ EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+ EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+
+ EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+ EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+
+ EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+
+ EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+ EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+
+ EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+ EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+ EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );
+
+ // scale only if all four entries are below minlikelihood in absolute value
+ scale = 1;
+
+ __m128d v1 = _mm_and_pd(EV_t_l0_k0, absMask.m);
+ v1 = _mm_cmplt_pd(v1, minlikelihood_sse);
+ if(_mm_movemask_pd( v1 ) != 3)
+ scale = 0;
+ else
+ {
+ v1 = _mm_and_pd(EV_t_l2_k0, absMask.m);
+ v1 = _mm_cmplt_pd(v1, minlikelihood_sse);
+ if(_mm_movemask_pd( v1 ) != 3)
+ scale = 0;
+ }
+
+ if(scale)
+ {
+ _mm_store_pd(&x3[0], _mm_mul_pd(EV_t_l0_k0, sc));
+ _mm_store_pd(&x3[2], _mm_mul_pd(EV_t_l2_k0, sc));
+
+
+ addScale += wgt[i];
+ }
+ else
+ {
+ _mm_store_pd(x3, EV_t_l0_k0);
+ _mm_store_pd(&x3[2], EV_t_l2_k0);
+ }
+
+ }
+ break;
+ default:
+ assert(0);
+ }
+
+
+ *scalerIncrement = addScale;
+}
+
+
+
+static void newviewGTRCAT_SAVE( int tipCase, double *EV, int *cptr,
+ double *x1_start, double *x2_start, double *x3_start, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement,
+ unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap,
+ double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn, const int maxCats)
+{
+ double
+ *le,
+ *ri,
+ *x1,
+ *x2,
+ *x3,
+ *x1_ptr = x1_start,
+ *x2_ptr = x2_start,
+ *x3_ptr = x3_start,
+ EV_t[16] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+
+ int
+ i,
+ j,
+ scale,
+ scaleGap = 0,
+ addScale = 0;
+
+ __m128d
+ minlikelihood_sse = _mm_set1_pd( minlikelihood ),
+ sc = _mm_set1_pd(twotothe256),
+ EVV[8];
+
+ // transpose EV into EV_t, then load EV_t as eight SSE registers (EVV)
+ for(i = 0; i < 4; i++)
+ for (j=0; j < 4; j++)
+ EV_t[4 * j + i] = EV[4 * i + j];
+
+ for(i = 0; i < 8; i++)
+ EVV[i] = _mm_load_pd(&EV_t[i * 2]);
+
+ {
+ x1 = x1_gapColumn;
+ x2 = x2_gapColumn;
+ x3 = x3_gapColumn;
+
+ le = &left[maxCats * 16];
+ ri = &right[maxCats * 16];
+
+ __m128d x1_0 = _mm_load_pd( &x1[0] );
+ __m128d x1_2 = _mm_load_pd( &x1[2] );
+
+ __m128d left_k0_0 = _mm_load_pd( &le[0] );
+ __m128d left_k0_2 = _mm_load_pd( &le[2] );
+ __m128d left_k1_0 = _mm_load_pd( &le[4] );
+ __m128d left_k1_2 = _mm_load_pd( &le[6] );
+ __m128d left_k2_0 = _mm_load_pd( &le[8] );
+ __m128d left_k2_2 = _mm_load_pd( &le[10] );
+ __m128d left_k3_0 = _mm_load_pd( &le[12] );
+ __m128d left_k3_2 = _mm_load_pd( &le[14] );
+
+ left_k0_0 = _mm_mul_pd(x1_0, left_k0_0);
+ left_k0_2 = _mm_mul_pd(x1_2, left_k0_2);
+
+ left_k1_0 = _mm_mul_pd(x1_0, left_k1_0);
+ left_k1_2 = _mm_mul_pd(x1_2, left_k1_2);
+
+ left_k0_0 = _mm_hadd_pd( left_k0_0, left_k0_2 );
+ left_k1_0 = _mm_hadd_pd( left_k1_0, left_k1_2);
+ left_k0_0 = _mm_hadd_pd( left_k0_0, left_k1_0);
+
+ left_k2_0 = _mm_mul_pd(x1_0, left_k2_0);
+ left_k2_2 = _mm_mul_pd(x1_2, left_k2_2);
+
+ left_k3_0 = _mm_mul_pd(x1_0, left_k3_0);
+ left_k3_2 = _mm_mul_pd(x1_2, left_k3_2);
+
+ left_k2_0 = _mm_hadd_pd( left_k2_0, left_k2_2);
+ left_k3_0 = _mm_hadd_pd( left_k3_0, left_k3_2);
+ left_k2_0 = _mm_hadd_pd( left_k2_0, left_k3_0);
+
+ __m128d x2_0 = _mm_load_pd( &x2[0] );
+ __m128d x2_2 = _mm_load_pd( &x2[2] );
+
+ __m128d right_k0_0 = _mm_load_pd( &ri[0] );
+ __m128d right_k0_2 = _mm_load_pd( &ri[2] );
+ __m128d right_k1_0 = _mm_load_pd( &ri[4] );
+ __m128d right_k1_2 = _mm_load_pd( &ri[6] );
+ __m128d right_k2_0 = _mm_load_pd( &ri[8] );
+ __m128d right_k2_2 = _mm_load_pd( &ri[10] );
+ __m128d right_k3_0 = _mm_load_pd( &ri[12] );
+ __m128d right_k3_2 = _mm_load_pd( &ri[14] );
+
+ right_k0_0 = _mm_mul_pd( x2_0, right_k0_0);
+ right_k0_2 = _mm_mul_pd( x2_2, right_k0_2);
+
+ right_k1_0 = _mm_mul_pd( x2_0, right_k1_0);
+ right_k1_2 = _mm_mul_pd( x2_2, right_k1_2);
+
+ right_k0_0 = _mm_hadd_pd( right_k0_0, right_k0_2);
+ right_k1_0 = _mm_hadd_pd( right_k1_0, right_k1_2);
+ right_k0_0 = _mm_hadd_pd( right_k0_0, right_k1_0);
+
+ right_k2_0 = _mm_mul_pd( x2_0, right_k2_0);
+ right_k2_2 = _mm_mul_pd( x2_2, right_k2_2);
+
+ right_k3_0 = _mm_mul_pd( x2_0, right_k3_0);
+ right_k3_2 = _mm_mul_pd( x2_2, right_k3_2);
+
+ right_k2_0 = _mm_hadd_pd( right_k2_0, right_k2_2);
+ right_k3_0 = _mm_hadd_pd( right_k3_0, right_k3_2);
+ right_k2_0 = _mm_hadd_pd( right_k2_0, right_k3_0);
+
+ __m128d x1px2_k0 = _mm_mul_pd( left_k0_0, right_k0_0 );
+ __m128d x1px2_k2 = _mm_mul_pd( left_k2_0, right_k2_0 );
+
+ __m128d EV_t_l0_k0 = EVV[0];
+ __m128d EV_t_l0_k2 = EVV[1];
+ __m128d EV_t_l1_k0 = EVV[2];
+ __m128d EV_t_l1_k2 = EVV[3];
+ __m128d EV_t_l2_k0 = EVV[4];
+ __m128d EV_t_l2_k2 = EVV[5];
+ __m128d EV_t_l3_k0 = EVV[6];
+ __m128d EV_t_l3_k2 = EVV[7];
+
+ EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+ EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+
+ EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+ EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+
+ EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+
+ EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+ EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+
+ EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+ EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+ EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );
+
+ if(tipCase != TIP_TIP)
+ {
+ // scale only if all four entries are below minlikelihood in absolute value
+ scale = 1;
+
+ __m128d v1 = _mm_and_pd(EV_t_l0_k0, absMask.m);
+ v1 = _mm_cmplt_pd(v1, minlikelihood_sse);
+ if(_mm_movemask_pd( v1 ) != 3)
+ scale = 0;
+ else
+ {
+ v1 = _mm_and_pd(EV_t_l2_k0, absMask.m);
+ v1 = _mm_cmplt_pd(v1, minlikelihood_sse);
+ if(_mm_movemask_pd( v1 ) != 3)
+ scale = 0;
+ }
+
+ if(scale)
+ {
+ _mm_store_pd(&x3[0], _mm_mul_pd(EV_t_l0_k0, sc));
+ _mm_store_pd(&x3[2], _mm_mul_pd(EV_t_l2_k0, sc));
+
+ scaleGap = TRUE;
+ }
+ else
+ {
+ _mm_store_pd(x3, EV_t_l0_k0);
+ _mm_store_pd(&x3[2], EV_t_l2_k0);
+ }
+ }
+ else
+ {
+ _mm_store_pd(x3, EV_t_l0_k0);
+ _mm_store_pd(&x3[2], EV_t_l2_k0);
+ }
+ }
+
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ for (i = 0; i < n; i++)
+ {
+ if(noGap(x3_gap, i))
+ {
+ x1 = &(tipVector[4 * tipX1[i]]);
+ x2 = &(tipVector[4 * tipX2[i]]);
+
+ x3 = x3_ptr;
+
+ if(isGap(x1_gap, i))
+ le = &left[maxCats * 16];
+ else
+ le = &left[cptr[i] * 16];
+
+ if(isGap(x2_gap, i))
+ ri = &right[maxCats * 16];
+ else
+ ri = &right[cptr[i] * 16];
+
+ __m128d x1_0 = _mm_load_pd( &x1[0] );
+ __m128d x1_2 = _mm_load_pd( &x1[2] );
+
+ __m128d left_k0_0 = _mm_load_pd( &le[0] );
+ __m128d left_k0_2 = _mm_load_pd( &le[2] );
+ __m128d left_k1_0 = _mm_load_pd( &le[4] );
+ __m128d left_k1_2 = _mm_load_pd( &le[6] );
+ __m128d left_k2_0 = _mm_load_pd( &le[8] );
+ __m128d left_k2_2 = _mm_load_pd( &le[10] );
+ __m128d left_k3_0 = _mm_load_pd( &le[12] );
+ __m128d left_k3_2 = _mm_load_pd( &le[14] );
+
+ left_k0_0 = _mm_mul_pd(x1_0, left_k0_0);
+ left_k0_2 = _mm_mul_pd(x1_2, left_k0_2);
+
+ left_k1_0 = _mm_mul_pd(x1_0, left_k1_0);
+ left_k1_2 = _mm_mul_pd(x1_2, left_k1_2);
+
+ left_k0_0 = _mm_hadd_pd( left_k0_0, left_k0_2 );
+ left_k1_0 = _mm_hadd_pd( left_k1_0, left_k1_2);
+ left_k0_0 = _mm_hadd_pd( left_k0_0, left_k1_0);
+
+ left_k2_0 = _mm_mul_pd(x1_0, left_k2_0);
+ left_k2_2 = _mm_mul_pd(x1_2, left_k2_2);
+
+ left_k3_0 = _mm_mul_pd(x1_0, left_k3_0);
+ left_k3_2 = _mm_mul_pd(x1_2, left_k3_2);
+
+ left_k2_0 = _mm_hadd_pd( left_k2_0, left_k2_2);
+ left_k3_0 = _mm_hadd_pd( left_k3_0, left_k3_2);
+ left_k2_0 = _mm_hadd_pd( left_k2_0, left_k3_0);
+
+ __m128d x2_0 = _mm_load_pd( &x2[0] );
+ __m128d x2_2 = _mm_load_pd( &x2[2] );
+
+ __m128d right_k0_0 = _mm_load_pd( &ri[0] );
+ __m128d right_k0_2 = _mm_load_pd( &ri[2] );
+ __m128d right_k1_0 = _mm_load_pd( &ri[4] );
+ __m128d right_k1_2 = _mm_load_pd( &ri[6] );
+ __m128d right_k2_0 = _mm_load_pd( &ri[8] );
+ __m128d right_k2_2 = _mm_load_pd( &ri[10] );
+ __m128d right_k3_0 = _mm_load_pd( &ri[12] );
+ __m128d right_k3_2 = _mm_load_pd( &ri[14] );
+
+ right_k0_0 = _mm_mul_pd( x2_0, right_k0_0);
+ right_k0_2 = _mm_mul_pd( x2_2, right_k0_2);
+
+ right_k1_0 = _mm_mul_pd( x2_0, right_k1_0);
+ right_k1_2 = _mm_mul_pd( x2_2, right_k1_2);
+
+ right_k0_0 = _mm_hadd_pd( right_k0_0, right_k0_2);
+ right_k1_0 = _mm_hadd_pd( right_k1_0, right_k1_2);
+ right_k0_0 = _mm_hadd_pd( right_k0_0, right_k1_0);
+
+ right_k2_0 = _mm_mul_pd( x2_0, right_k2_0);
+ right_k2_2 = _mm_mul_pd( x2_2, right_k2_2);
+
+ right_k3_0 = _mm_mul_pd( x2_0, right_k3_0);
+ right_k3_2 = _mm_mul_pd( x2_2, right_k3_2);
+
+ right_k2_0 = _mm_hadd_pd( right_k2_0, right_k2_2);
+ right_k3_0 = _mm_hadd_pd( right_k3_0, right_k3_2);
+ right_k2_0 = _mm_hadd_pd( right_k2_0, right_k3_0);
+
+ __m128d x1px2_k0 = _mm_mul_pd( left_k0_0, right_k0_0 );
+ __m128d x1px2_k2 = _mm_mul_pd( left_k2_0, right_k2_0 );
+
+ __m128d EV_t_l0_k0 = EVV[0];
+ __m128d EV_t_l0_k2 = EVV[1];
+ __m128d EV_t_l1_k0 = EVV[2];
+ __m128d EV_t_l1_k2 = EVV[3];
+ __m128d EV_t_l2_k0 = EVV[4];
+ __m128d EV_t_l2_k2 = EVV[5];
+ __m128d EV_t_l3_k0 = EVV[6];
+ __m128d EV_t_l3_k2 = EVV[7];
+
+ EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+ EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+
+ EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+ EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+
+ EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+
+ EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+ EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+
+ EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+ EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+ EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );
+
+ _mm_store_pd(x3, EV_t_l0_k0);
+ _mm_store_pd(&x3[2], EV_t_l2_k0);
+
+ x3_ptr += 4;
+ }
+ }
+ break;
+ case TIP_INNER:
+ for (i = 0; i < n; i++)
+ {
+ if(isGap(x3_gap, i))
+ {
+ if(scaleGap)
+ addScale += wgt[i];
+ }
+ else
+ {
+ x1 = &(tipVector[4 * tipX1[i]]);
+
+ x2 = x2_ptr;
+ x3 = x3_ptr;
+
+ if(isGap(x1_gap, i))
+ le = &left[maxCats * 16];
+ else
+ le = &left[cptr[i] * 16];
+
+ if(isGap(x2_gap, i))
+ {
+ ri = &right[maxCats * 16];
+ x2 = x2_gapColumn;
+ }
+ else
+ {
+ ri = &right[cptr[i] * 16];
+ x2 = x2_ptr;
+ x2_ptr += 4;
+ }
+
+ __m128d x1_0 = _mm_load_pd( &x1[0] );
+ __m128d x1_2 = _mm_load_pd( &x1[2] );
+
+ __m128d left_k0_0 = _mm_load_pd( &le[0] );
+ __m128d left_k0_2 = _mm_load_pd( &le[2] );
+ __m128d left_k1_0 = _mm_load_pd( &le[4] );
+ __m128d left_k1_2 = _mm_load_pd( &le[6] );
+ __m128d left_k2_0 = _mm_load_pd( &le[8] );
+ __m128d left_k2_2 = _mm_load_pd( &le[10] );
+ __m128d left_k3_0 = _mm_load_pd( &le[12] );
+ __m128d left_k3_2 = _mm_load_pd( &le[14] );
+
+ left_k0_0 = _mm_mul_pd(x1_0, left_k0_0);
+ left_k0_2 = _mm_mul_pd(x1_2, left_k0_2);
+
+ left_k1_0 = _mm_mul_pd(x1_0, left_k1_0);
+ left_k1_2 = _mm_mul_pd(x1_2, left_k1_2);
+
+ left_k0_0 = _mm_hadd_pd( left_k0_0, left_k0_2 );
+ left_k1_0 = _mm_hadd_pd( left_k1_0, left_k1_2);
+ left_k0_0 = _mm_hadd_pd( left_k0_0, left_k1_0);
+
+ left_k2_0 = _mm_mul_pd(x1_0, left_k2_0);
+ left_k2_2 = _mm_mul_pd(x1_2, left_k2_2);
+
+ left_k3_0 = _mm_mul_pd(x1_0, left_k3_0);
+ left_k3_2 = _mm_mul_pd(x1_2, left_k3_2);
+
+ left_k2_0 = _mm_hadd_pd( left_k2_0, left_k2_2);
+ left_k3_0 = _mm_hadd_pd( left_k3_0, left_k3_2);
+ left_k2_0 = _mm_hadd_pd( left_k2_0, left_k3_0);
+
+ __m128d x2_0 = _mm_load_pd( &x2[0] );
+ __m128d x2_2 = _mm_load_pd( &x2[2] );
+
+ __m128d right_k0_0 = _mm_load_pd( &ri[0] );
+ __m128d right_k0_2 = _mm_load_pd( &ri[2] );
+ __m128d right_k1_0 = _mm_load_pd( &ri[4] );
+ __m128d right_k1_2 = _mm_load_pd( &ri[6] );
+ __m128d right_k2_0 = _mm_load_pd( &ri[8] );
+ __m128d right_k2_2 = _mm_load_pd( &ri[10] );
+ __m128d right_k3_0 = _mm_load_pd( &ri[12] );
+ __m128d right_k3_2 = _mm_load_pd( &ri[14] );
+
+ right_k0_0 = _mm_mul_pd( x2_0, right_k0_0);
+ right_k0_2 = _mm_mul_pd( x2_2, right_k0_2);
+
+ right_k1_0 = _mm_mul_pd( x2_0, right_k1_0);
+ right_k1_2 = _mm_mul_pd( x2_2, right_k1_2);
+
+ right_k0_0 = _mm_hadd_pd( right_k0_0, right_k0_2);
+ right_k1_0 = _mm_hadd_pd( right_k1_0, right_k1_2);
+ right_k0_0 = _mm_hadd_pd( right_k0_0, right_k1_0);
+
+ right_k2_0 = _mm_mul_pd( x2_0, right_k2_0);
+ right_k2_2 = _mm_mul_pd( x2_2, right_k2_2);
+
+ right_k3_0 = _mm_mul_pd( x2_0, right_k3_0);
+ right_k3_2 = _mm_mul_pd( x2_2, right_k3_2);
+
+ right_k2_0 = _mm_hadd_pd( right_k2_0, right_k2_2);
+ right_k3_0 = _mm_hadd_pd( right_k3_0, right_k3_2);
+ right_k2_0 = _mm_hadd_pd( right_k2_0, right_k3_0);
+
+ __m128d x1px2_k0 = _mm_mul_pd( left_k0_0, right_k0_0 );
+ __m128d x1px2_k2 = _mm_mul_pd( left_k2_0, right_k2_0 );
+
+ __m128d EV_t_l0_k0 = EVV[0];
+ __m128d EV_t_l0_k2 = EVV[1];
+ __m128d EV_t_l1_k0 = EVV[2];
+ __m128d EV_t_l1_k2 = EVV[3];
+ __m128d EV_t_l2_k0 = EVV[4];
+ __m128d EV_t_l2_k2 = EVV[5];
+ __m128d EV_t_l3_k0 = EVV[6];
+ __m128d EV_t_l3_k2 = EVV[7];
+
+
+ EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+ EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+
+ EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+ EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+
+ EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+
+ EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+ EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+
+ EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+ EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+ EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );
+
+ scale = 1;
+
+ __m128d v1 = _mm_and_pd(EV_t_l0_k0, absMask.m);
+ v1 = _mm_cmplt_pd(v1, minlikelihood_sse);
+ if(_mm_movemask_pd( v1 ) != 3)
+ scale = 0;
+ else
+ {
+ v1 = _mm_and_pd(EV_t_l2_k0, absMask.m);
+ v1 = _mm_cmplt_pd(v1, minlikelihood_sse);
+ if(_mm_movemask_pd( v1 ) != 3)
+ scale = 0;
+ }
+
+ if(scale)
+ {
+ _mm_store_pd(&x3[0], _mm_mul_pd(EV_t_l0_k0, sc));
+ _mm_store_pd(&x3[2], _mm_mul_pd(EV_t_l2_k0, sc));
+
+ addScale += wgt[i];
+ }
+ else
+ {
+ _mm_store_pd(x3, EV_t_l0_k0);
+ _mm_store_pd(&x3[2], EV_t_l2_k0);
+ }
+
+ x3_ptr += 4;
+ }
+
+ }
+ break;
+ case INNER_INNER:
+ for (i = 0; i < n; i++)
+ {
+ if(isGap(x3_gap, i))
+ {
+ if(scaleGap)
+ addScale += wgt[i];
+ }
+ else
+ {
+ x3 = x3_ptr;
+
+ if(isGap(x1_gap, i))
+ {
+ x1 = x1_gapColumn;
+ le = &left[maxCats * 16];
+ }
+ else
+ {
+ le = &left[cptr[i] * 16];
+ x1 = x1_ptr;
+ x1_ptr += 4;
+ }
+
+ if(isGap(x2_gap, i))
+ {
+ x2 = x2_gapColumn;
+ ri = &right[maxCats * 16];
+ }
+ else
+ {
+ ri = &right[cptr[i] * 16];
+ x2 = x2_ptr;
+ x2_ptr += 4;
+ }
+
+ __m128d x1_0 = _mm_load_pd( &x1[0] );
+ __m128d x1_2 = _mm_load_pd( &x1[2] );
+
+ __m128d left_k0_0 = _mm_load_pd( &le[0] );
+ __m128d left_k0_2 = _mm_load_pd( &le[2] );
+ __m128d left_k1_0 = _mm_load_pd( &le[4] );
+ __m128d left_k1_2 = _mm_load_pd( &le[6] );
+ __m128d left_k2_0 = _mm_load_pd( &le[8] );
+ __m128d left_k2_2 = _mm_load_pd( &le[10] );
+ __m128d left_k3_0 = _mm_load_pd( &le[12] );
+ __m128d left_k3_2 = _mm_load_pd( &le[14] );
+
+ left_k0_0 = _mm_mul_pd(x1_0, left_k0_0);
+ left_k0_2 = _mm_mul_pd(x1_2, left_k0_2);
+
+ left_k1_0 = _mm_mul_pd(x1_0, left_k1_0);
+ left_k1_2 = _mm_mul_pd(x1_2, left_k1_2);
+
+ left_k0_0 = _mm_hadd_pd( left_k0_0, left_k0_2 );
+ left_k1_0 = _mm_hadd_pd( left_k1_0, left_k1_2);
+ left_k0_0 = _mm_hadd_pd( left_k0_0, left_k1_0);
+
+ left_k2_0 = _mm_mul_pd(x1_0, left_k2_0);
+ left_k2_2 = _mm_mul_pd(x1_2, left_k2_2);
+
+ left_k3_0 = _mm_mul_pd(x1_0, left_k3_0);
+ left_k3_2 = _mm_mul_pd(x1_2, left_k3_2);
+
+ left_k2_0 = _mm_hadd_pd( left_k2_0, left_k2_2);
+ left_k3_0 = _mm_hadd_pd( left_k3_0, left_k3_2);
+ left_k2_0 = _mm_hadd_pd( left_k2_0, left_k3_0);
+
+ __m128d x2_0 = _mm_load_pd( &x2[0] );
+ __m128d x2_2 = _mm_load_pd( &x2[2] );
+
+ __m128d right_k0_0 = _mm_load_pd( &ri[0] );
+ __m128d right_k0_2 = _mm_load_pd( &ri[2] );
+ __m128d right_k1_0 = _mm_load_pd( &ri[4] );
+ __m128d right_k1_2 = _mm_load_pd( &ri[6] );
+ __m128d right_k2_0 = _mm_load_pd( &ri[8] );
+ __m128d right_k2_2 = _mm_load_pd( &ri[10] );
+ __m128d right_k3_0 = _mm_load_pd( &ri[12] );
+ __m128d right_k3_2 = _mm_load_pd( &ri[14] );
+
+ right_k0_0 = _mm_mul_pd( x2_0, right_k0_0);
+ right_k0_2 = _mm_mul_pd( x2_2, right_k0_2);
+
+ right_k1_0 = _mm_mul_pd( x2_0, right_k1_0);
+ right_k1_2 = _mm_mul_pd( x2_2, right_k1_2);
+
+ right_k0_0 = _mm_hadd_pd( right_k0_0, right_k0_2);
+ right_k1_0 = _mm_hadd_pd( right_k1_0, right_k1_2);
+ right_k0_0 = _mm_hadd_pd( right_k0_0, right_k1_0);
+
+ right_k2_0 = _mm_mul_pd( x2_0, right_k2_0);
+ right_k2_2 = _mm_mul_pd( x2_2, right_k2_2);
+
+ right_k3_0 = _mm_mul_pd( x2_0, right_k3_0);
+ right_k3_2 = _mm_mul_pd( x2_2, right_k3_2);
+
+ right_k2_0 = _mm_hadd_pd( right_k2_0, right_k2_2);
+ right_k3_0 = _mm_hadd_pd( right_k3_0, right_k3_2);
+ right_k2_0 = _mm_hadd_pd( right_k2_0, right_k3_0);
+
+ __m128d x1px2_k0 = _mm_mul_pd( left_k0_0, right_k0_0 );
+ __m128d x1px2_k2 = _mm_mul_pd( left_k2_0, right_k2_0 );
+
+ __m128d EV_t_l0_k0 = EVV[0];
+ __m128d EV_t_l0_k2 = EVV[1];
+ __m128d EV_t_l1_k0 = EVV[2];
+ __m128d EV_t_l1_k2 = EVV[3];
+ __m128d EV_t_l2_k0 = EVV[4];
+ __m128d EV_t_l2_k2 = EVV[5];
+ __m128d EV_t_l3_k0 = EVV[6];
+ __m128d EV_t_l3_k2 = EVV[7];
+
+
+ EV_t_l0_k0 = _mm_mul_pd( x1px2_k0, EV_t_l0_k0 );
+ EV_t_l0_k2 = _mm_mul_pd( x1px2_k2, EV_t_l0_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l0_k2 );
+
+ EV_t_l1_k0 = _mm_mul_pd( x1px2_k0, EV_t_l1_k0 );
+ EV_t_l1_k2 = _mm_mul_pd( x1px2_k2, EV_t_l1_k2 );
+
+ EV_t_l1_k0 = _mm_hadd_pd( EV_t_l1_k0, EV_t_l1_k2 );
+ EV_t_l0_k0 = _mm_hadd_pd( EV_t_l0_k0, EV_t_l1_k0 );
+
+ EV_t_l2_k0 = _mm_mul_pd( x1px2_k0, EV_t_l2_k0 );
+ EV_t_l2_k2 = _mm_mul_pd( x1px2_k2, EV_t_l2_k2 );
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l2_k2 );
+
+ EV_t_l3_k0 = _mm_mul_pd( x1px2_k0, EV_t_l3_k0 );
+ EV_t_l3_k2 = _mm_mul_pd( x1px2_k2, EV_t_l3_k2 );
+ EV_t_l3_k0 = _mm_hadd_pd( EV_t_l3_k0, EV_t_l3_k2 );
+
+ EV_t_l2_k0 = _mm_hadd_pd( EV_t_l2_k0, EV_t_l3_k0 );
+
+ scale = 1;
+
+ __m128d v1 = _mm_and_pd(EV_t_l0_k0, absMask.m);
+ v1 = _mm_cmplt_pd(v1, minlikelihood_sse);
+ if(_mm_movemask_pd( v1 ) != 3)
+ scale = 0;
+ else
+ {
+ v1 = _mm_and_pd(EV_t_l2_k0, absMask.m);
+ v1 = _mm_cmplt_pd(v1, minlikelihood_sse);
+ if(_mm_movemask_pd( v1 ) != 3)
+ scale = 0;
+ }
+
+ if(scale)
+ {
+ _mm_store_pd(&x3[0], _mm_mul_pd(EV_t_l0_k0, sc));
+ _mm_store_pd(&x3[2], _mm_mul_pd(EV_t_l2_k0, sc));
+
+ addScale += wgt[i];
+ }
+ else
+ {
+ _mm_store_pd(x3, EV_t_l0_k0);
+ _mm_store_pd(&x3[2], EV_t_l2_k0);
+ }
+
+ x3_ptr += 4;
+ }
+ }
+ break;
+ default:
+ assert(0);
+ }
+
+
+ *scalerIncrement = addScale;
+}
+
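+/* Protein data (20 states) under GAMMA with 4 rate categories, i.e. 80 doubles per site.
+   Memory-saving variant: sites flagged in the x*_gap bit vectors are not stored individually;
+   they share the single precomputed x*_gapColumn vector instead. */
+static void newviewGTRGAMMAPROT_GAPPED_SAVE(int tipCase,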
+ double *x1, double *x2, double *x3, double *extEV, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement,
+ unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap,
+ double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn
+ )
+{
+ double *uX1, *uX2, *v;
+ double x1px2;
+ int i, j, l, k, scale, addScale = 0,
+ gapScaling = 0;
+ double
+ *vl, *vr, *x1v, *x2v,
+ *x1_ptr = x1,
+ *x2_ptr = x2,
+ *x3_ptr = x3;
+
+
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ {
+ double umpX1[1840], umpX2[1840];
+
+ for(i = 0; i < 23; i++)
+ {
+ v = &(tipVector[20 * i]);
+
+ for(k = 0; k < 80; k++)
+ {
+ double *ll = &left[k * 20];
+ double *rr = &right[k * 20];
+
+ __m128d umpX1v = _mm_setzero_pd();
+ __m128d umpX2v = _mm_setzero_pd();
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d vv = _mm_load_pd(&v[l]);
+ umpX1v = _mm_add_pd(umpX1v, _mm_mul_pd(vv, _mm_load_pd(&ll[l])));
+ umpX2v = _mm_add_pd(umpX2v, _mm_mul_pd(vv, _mm_load_pd(&rr[l])));
+ }
+
+ umpX1v = _mm_hadd_pd(umpX1v, umpX1v);
+ umpX2v = _mm_hadd_pd(umpX2v, umpX2v);
+
+ _mm_storel_pd(&umpX1[80 * i + k], umpX1v);
+ _mm_storel_pd(&umpX2[80 * i + k], umpX2v);
+ }
+ }
+
+ {
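+ /* 1760 = 22 * 80: precomputed products for tip state 22, the last of the 23 tip states, used here to build the shared gap column */
+ uX1 = &umpX1[1760];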
+ uX2 = &umpX2[1760];
+
+ for(j = 0; j < 4; j++)
+ {
+ v = &x3_gapColumn[j * 20];
+
+ __m128d zero = _mm_setzero_pd();
+ for(k = 0; k < 20; k+=2)
+ _mm_store_pd(&v[k], zero);
+
+ for(k = 0; k < 20; k++)
+ {
+ double *eev = &extEV[k * 20];
+ x1px2 = uX1[j * 20 + k] * uX2[j * 20 + k];
+ __m128d x1px2v = _mm_set1_pd(x1px2);
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d vv = _mm_load_pd(&v[l]);
+ __m128d ee = _mm_load_pd(&eev[l]);
+
+ vv = _mm_add_pd(vv, _mm_mul_pd(x1px2v,ee));
+
+ _mm_store_pd(&v[l], vv);
+ }
+ }
+ }
+ }
+
+ for(i = 0; i < n; i++)
+ {
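+ /* bit i of x3_gap is set when site i is a gap column at this node; such sites are skipped and take no space in x3 */
+ if(!(x3_gap[i / 32] & mask32[i % 32]))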
+ {
+ uX1 = &umpX1[80 * tipX1[i]];
+ uX2 = &umpX2[80 * tipX2[i]];
+
+ for(j = 0; j < 4; j++)
+ {
+ v = &x3_ptr[j * 20];
+
+
+ __m128d zero = _mm_setzero_pd();
+ for(k = 0; k < 20; k+=2)
+ _mm_store_pd(&v[k], zero);
+
+ for(k = 0; k < 20; k++)
+ {
+ double *eev = &extEV[k * 20];
+ x1px2 = uX1[j * 20 + k] * uX2[j * 20 + k];
+ __m128d x1px2v = _mm_set1_pd(x1px2);
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d vv = _mm_load_pd(&v[l]);
+ __m128d ee = _mm_load_pd(&eev[l]);
+
+ vv = _mm_add_pd(vv, _mm_mul_pd(x1px2v,ee));
+
+ _mm_store_pd(&v[l], vv);
+ }
+ }
+ }
+ x3_ptr += 80;
+ }
+ }
+ }
+ break;
+ case TIP_INNER:
+ {
+ double umpX1[1840], ump_x2[20];
+
+
+ for(i = 0; i < 23; i++)
+ {
+ v = &(tipVector[20 * i]);
+
+ for(k = 0; k < 80; k++)
+ {
+ double *ll = &left[k * 20];
+
+ __m128d umpX1v = _mm_setzero_pd();
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d vv = _mm_load_pd(&v[l]);
+ umpX1v = _mm_add_pd(umpX1v, _mm_mul_pd(vv, _mm_load_pd(&ll[l])));
+ }
+
+ umpX1v = _mm_hadd_pd(umpX1v, umpX1v);
+ _mm_storel_pd(&umpX1[80 * i + k], umpX1v);
+
+ }
+ }
+
+ {
+ uX1 = &umpX1[1760];
+
+ for(k = 0; k < 4; k++)
+ {
+ v = &(x2_gapColumn[k * 20]);
+
+ for(l = 0; l < 20; l++)
+ {
+ double *r = &right[k * 400 + l * 20];
+ __m128d ump_x2v = _mm_setzero_pd();
+
+ for(j = 0; j < 20; j+= 2)
+ {
+ __m128d vv = _mm_load_pd(&v[j]);
+ __m128d rr = _mm_load_pd(&r[j]);
+ ump_x2v = _mm_add_pd(ump_x2v, _mm_mul_pd(vv, rr));
+ }
+
+ ump_x2v = _mm_hadd_pd(ump_x2v, ump_x2v);
+
+ _mm_storel_pd(&ump_x2[l], ump_x2v);
+ }
+
+ v = &(x3_gapColumn[20 * k]);
+
+ __m128d zero = _mm_setzero_pd();
+ for(l = 0; l < 20; l+=2)
+ _mm_store_pd(&v[l], zero);
+
+ for(l = 0; l < 20; l++)
+ {
+ double *eev = &extEV[l * 20];
+ x1px2 = uX1[k * 20 + l] * ump_x2[l];
+ __m128d x1px2v = _mm_set1_pd(x1px2);
+
+ for(j = 0; j < 20; j+=2)
+ {
+ __m128d vv = _mm_load_pd(&v[j]);
+ __m128d ee = _mm_load_pd(&eev[j]);
+
+ vv = _mm_add_pd(vv, _mm_mul_pd(x1px2v,ee));
+
+ _mm_store_pd(&v[j], vv);
+ }
+ }
+
+ }
+
+ {
+ v = x3_gapColumn;
+ __m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+
+ /* rescale only if every one of the 80 entries is below minlikelihood in absolute value */
+ scale = 1;
+ for(l = 0; scale && (l < 80); l += 2)
+ {
+ __m128d vv = _mm_load_pd(&v[l]);
+ __m128d v1 = _mm_and_pd(vv, absMask.m);
+ v1 = _mm_cmplt_pd(v1, minlikelihood_sse);
+ if(_mm_movemask_pd( v1 ) != 3)
+ scale = 0;
+ }
+ }
+
+
+ if (scale)
+ {
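+ /* remember that the gap column was scaled, so each gap site adds wgt[i] to the scaler count in the loop below */
+ gapScaling = 1;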
+ __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+
+ for(l = 0; l < 80; l+=2)
+ {
+ __m128d ex3v = _mm_load_pd(&v[l]);
+ _mm_store_pd(&v[l], _mm_mul_pd(ex3v,twoto));
+ }
+ }
+ }
+
+ for (i = 0; i < n; i++)
+ {
+ if((x3_gap[i / 32] & mask32[i % 32]))
+ {
+ if(gapScaling)
+ {
+ addScale += wgt[i];
+ }
+ }
+ else
+ {
+ uX1 = &umpX1[80 * tipX1[i]];
+
+ if(x2_gap[i / 32] & mask32[i % 32])
+ x2v = x2_gapColumn;
+ else
+ {
+ x2v = x2_ptr;
+ x2_ptr += 80;
+ }
+
+ for(k = 0; k < 4; k++)
+ {
+ v = &(x2v[k * 20]);
+
+ for(l = 0; l < 20; l++)
+ {
+ double *r = &right[k * 400 + l * 20];
+ __m128d ump_x2v = _mm_setzero_pd();
+
+ for(j = 0; j < 20; j+= 2)
+ {
+ __m128d vv = _mm_load_pd(&v[j]);
+ __m128d rr = _mm_load_pd(&r[j]);
+ ump_x2v = _mm_add_pd(ump_x2v, _mm_mul_pd(vv, rr));
+ }
+
+ ump_x2v = _mm_hadd_pd(ump_x2v, ump_x2v);
+
+ _mm_storel_pd(&ump_x2[l], ump_x2v);
+ }
+
+ v = &x3_ptr[20 * k];
+
+ __m128d zero = _mm_setzero_pd();
+ for(l = 0; l < 20; l+=2)
+ _mm_store_pd(&v[l], zero);
+
+ for(l = 0; l < 20; l++)
+ {
+ double *eev = &extEV[l * 20];
+ x1px2 = uX1[k * 20 + l] * ump_x2[l];
+ __m128d x1px2v = _mm_set1_pd(x1px2);
+
+ for(j = 0; j < 20; j+=2)
+ {
+ __m128d vv = _mm_load_pd(&v[j]);
+ __m128d ee = _mm_load_pd(&eev[j]);
+
+ vv = _mm_add_pd(vv, _mm_mul_pd(x1px2v,ee));
+
+ _mm_store_pd(&v[j], vv);
+ }
+ }
+
+ }
+
+
+ {
+ v = x3_ptr;
+ __m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+
+ scale = 1;
+ for(l = 0; scale && (l < 80); l += 2)
+ {
+ __m128d vv = _mm_load_pd(&v[l]);
+ __m128d v1 = _mm_and_pd(vv, absMask.m);
+ v1 = _mm_cmplt_pd(v1, minlikelihood_sse);
+ if(_mm_movemask_pd( v1 ) != 3)
+ scale = 0;
+ }
+ }
+
+
+ if (scale)
+ {
+ __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+
+ for(l = 0; l < 80; l+=2)
+ {
+ __m128d ex3v = _mm_load_pd(&v[l]);
+ _mm_store_pd(&v[l], _mm_mul_pd(ex3v,twoto));
+ }
+
+ addScale += wgt[i];
+ }
+
+ x3_ptr += 80;
+ }
+ }
+ }
+ break;
+ case INNER_INNER:
+ {
+ for(k = 0; k < 4; k++)
+ {
+ vl = &(x1_gapColumn[20 * k]);
+ vr = &(x2_gapColumn[20 * k]);
+ v = &(x3_gapColumn[20 * k]);
+
+ __m128d zero = _mm_setzero_pd();
+ for(l = 0; l < 20; l+=2)
+ _mm_store_pd(&v[l], zero);
+
+ for(l = 0; l < 20; l++)
+ {
+ {
+ __m128d al = _mm_setzero_pd();
+ __m128d ar = _mm_setzero_pd();
+
+ double *ll = &left[k * 400 + l * 20];
+ double *rr = &right[k * 400 + l * 20];
+ double *EVEV = &extEV[20 * l];
+
+ for(j = 0; j < 20; j+=2)
+ {
+ __m128d lv = _mm_load_pd(&ll[j]);
+ __m128d rv = _mm_load_pd(&rr[j]);
+ __m128d vll = _mm_load_pd(&vl[j]);
+ __m128d vrr = _mm_load_pd(&vr[j]);
+
+ al = _mm_add_pd(al, _mm_mul_pd(vll, lv));
+ ar = _mm_add_pd(ar, _mm_mul_pd(vrr, rv));
+ }
+
+ al = _mm_hadd_pd(al, al);
+ ar = _mm_hadd_pd(ar, ar);
+
+ al = _mm_mul_pd(al, ar);
+
+ for(j = 0; j < 20; j+=2)
+ {
+ __m128d vv = _mm_load_pd(&v[j]);
+ __m128d EVV = _mm_load_pd(&EVEV[j]);
+
+ vv = _mm_add_pd(vv, _mm_mul_pd(al, EVV));
+
+ _mm_store_pd(&v[j], vv);
+ }
+ }
+
+ }
+ }
+
+
+ {
+ v = x3_gapColumn;
+ __m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+
+ scale = 1;
+ for(l = 0; scale && (l < 80); l += 2)
+ {
+ __m128d vv = _mm_load_pd(&v[l]);
+ __m128d v1 = _mm_and_pd(vv, absMask.m);
+ v1 = _mm_cmplt_pd(v1, minlikelihood_sse);
+ if(_mm_movemask_pd( v1 ) != 3)
+ scale = 0;
+ }
+ }
+
+ if (scale)
+ {
+ gapScaling = 1;
+ __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+
+ for(l = 0; l < 80; l+=2)
+ {
+ __m128d ex3v = _mm_load_pd(&v[l]);
+ _mm_store_pd(&v[l], _mm_mul_pd(ex3v,twoto));
+ }
+
+
+ }
+ }
+
+ for (i = 0; i < n; i++)
+ {
+ if(x3_gap[i / 32] & mask32[i % 32])
+ {
+ if(gapScaling)
+ {
+ addScale += wgt[i];
+ }
+ }
+ else
+ {
+ if(x1_gap[i / 32] & mask32[i % 32])
+ x1v = x1_gapColumn;
+ else
+ {
+ x1v = x1_ptr;
+ x1_ptr += 80;
+ }
+
+ if(x2_gap[i / 32] & mask32[i % 32])
+ x2v = x2_gapColumn;
+ else
+ {
+ x2v = x2_ptr;
+ x2_ptr += 80;
+ }
+
+ for(k = 0; k < 4; k++)
+ {
+ vl = &(x1v[20 * k]);
+ vr = &(x2v[20 * k]);
+ v = &x3_ptr[20 * k];
+
+ __m128d zero = _mm_setzero_pd();
+ for(l = 0; l < 20; l+=2)
+ _mm_store_pd(&v[l], zero);
+
+ for(l = 0; l < 20; l++)
+ {
+ {
+ __m128d al = _mm_setzero_pd();
+ __m128d ar = _mm_setzero_pd();
+
+ double *ll = &left[k * 400 + l * 20];
+ double *rr = &right[k * 400 + l * 20];
+ double *EVEV = &extEV[20 * l];
+
+ for(j = 0; j < 20; j+=2)
+ {
+ __m128d lv = _mm_load_pd(&ll[j]);
+ __m128d rv = _mm_load_pd(&rr[j]);
+ __m128d vll = _mm_load_pd(&vl[j]);
+ __m128d vrr = _mm_load_pd(&vr[j]);
+
+ al = _mm_add_pd(al, _mm_mul_pd(vll, lv));
+ ar = _mm_add_pd(ar, _mm_mul_pd(vrr, rv));
+ }
+
+ al = _mm_hadd_pd(al, al);
+ ar = _mm_hadd_pd(ar, ar);
+
+ al = _mm_mul_pd(al, ar);
+
+ for(j = 0; j < 20; j+=2)
+ {
+ __m128d vv = _mm_load_pd(&v[j]);
+ __m128d EVV = _mm_load_pd(&EVEV[j]);
+
+ vv = _mm_add_pd(vv, _mm_mul_pd(al, EVV));
+
+ _mm_store_pd(&v[j], vv);
+ }
+ }
+
+ }
+ }
+
+
+
+ {
+ v = x3_ptr;
+ __m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+
+ scale = 1;
+ for(l = 0; scale && (l < 80); l += 2)
+ {
+ __m128d vv = _mm_load_pd(&v[l]);
+ __m128d v1 = _mm_and_pd(vv, absMask.m);
+ v1 = _mm_cmplt_pd(v1, minlikelihood_sse);
+ if(_mm_movemask_pd( v1 ) != 3)
+ scale = 0;
+ }
+ }
+
+
+ if (scale)
+ {
+ __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+
+ for(l = 0; l < 80; l+=2)
+ {
+ __m128d ex3v = _mm_load_pd(&v[l]);
+ _mm_store_pd(&v[l], _mm_mul_pd(ex3v,twoto));
+ }
+
+ addScale += wgt[i];
+ }
+ x3_ptr += 80;
+ }
+ }
+ break;
+ default:
+ assert(0);
+ }
+
+
+ *scalerIncrement = addScale;
+}
+
+
+
+static void newviewGTRGAMMAPROT(int tipCase,
+ double *x1, double *x2, double *x3, double *extEV, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement)
+{
+ double *uX1, *uX2, *v;
+ double x1px2;
+ int i, j, l, k, scale, addScale = 0;
+ double *vl, *vr;
+
+
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ {
+ double umpX1[1840], umpX2[1840];
+
+ for(i = 0; i < 23; i++)
+ {
+ v = &(tipVector[20 * i]);
+
+ for(k = 0; k < 80; k++)
+ {
+ double *ll = &left[k * 20];
+ double *rr = &right[k * 20];
+
+ __m128d umpX1v = _mm_setzero_pd();
+ __m128d umpX2v = _mm_setzero_pd();
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d vv = _mm_load_pd(&v[l]);
+ umpX1v = _mm_add_pd(umpX1v, _mm_mul_pd(vv, _mm_load_pd(&ll[l])));
+ umpX2v = _mm_add_pd(umpX2v, _mm_mul_pd(vv, _mm_load_pd(&rr[l])));
+ }
+
+ umpX1v = _mm_hadd_pd(umpX1v, umpX1v);
+ umpX2v = _mm_hadd_pd(umpX2v, umpX2v);
+
+ _mm_storel_pd(&umpX1[80 * i + k], umpX1v);
+ _mm_storel_pd(&umpX2[80 * i + k], umpX2v);
+
+ }
+ }
+
+ for(i = 0; i < n; i++)
+ {
+ uX1 = &umpX1[80 * tipX1[i]];
+ uX2 = &umpX2[80 * tipX2[i]];
+
+ for(j = 0; j < 4; j++)
+ {
+ v = &x3[i * 80 + j * 20];
+
+
+ __m128d zero = _mm_setzero_pd();
+ for(k = 0; k < 20; k+=2)
+ _mm_store_pd(&v[k], zero);
+
+ for(k = 0; k < 20; k++)
+ {
+ double *eev = &extEV[k * 20];
+ x1px2 = uX1[j * 20 + k] * uX2[j * 20 + k];
+ __m128d x1px2v = _mm_set1_pd(x1px2);
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d vv = _mm_load_pd(&v[l]);
+ __m128d ee = _mm_load_pd(&eev[l]);
+
+ vv = _mm_add_pd(vv, _mm_mul_pd(x1px2v,ee));
+
+ _mm_store_pd(&v[l], vv);
+ }
+ }
+
+
+ }
+ }
+ }
+ break;
+ case TIP_INNER:
+ {
+ double umpX1[1840], ump_x2[20];
+
+
+ for(i = 0; i < 23; i++)
+ {
+ v = &(tipVector[20 * i]);
+
+ for(k = 0; k < 80; k++)
+ {
+ double *ll = &left[k * 20];
+
+ __m128d umpX1v = _mm_setzero_pd();
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d vv = _mm_load_pd(&v[l]);
+ umpX1v = _mm_add_pd(umpX1v, _mm_mul_pd(vv, _mm_load_pd(&ll[l])));
+ }
+
+ umpX1v = _mm_hadd_pd(umpX1v, umpX1v);
+ _mm_storel_pd(&umpX1[80 * i + k], umpX1v);
+
+
+ }
+ }
+
+ for (i = 0; i < n; i++)
+ {
+ uX1 = &umpX1[80 * tipX1[i]];
+
+ for(k = 0; k < 4; k++)
+ {
+ v = &(x2[80 * i + k * 20]);
+
+ for(l = 0; l < 20; l++)
+ {
+ double *r = &right[k * 400 + l * 20];
+ __m128d ump_x2v = _mm_setzero_pd();
+
+ for(j = 0; j < 20; j+= 2)
+ {
+ __m128d vv = _mm_load_pd(&v[j]);
+ __m128d rr = _mm_load_pd(&r[j]);
+ ump_x2v = _mm_add_pd(ump_x2v, _mm_mul_pd(vv, rr));
+ }
+
+ ump_x2v = _mm_hadd_pd(ump_x2v, ump_x2v);
+
+ _mm_storel_pd(&ump_x2[l], ump_x2v);
+ }
+
+ v = &(x3[80 * i + 20 * k]);
+
+ __m128d zero = _mm_setzero_pd();
+ for(l = 0; l < 20; l+=2)
+ _mm_store_pd(&v[l], zero);
+
+ for(l = 0; l < 20; l++)
+ {
+ double *eev = &extEV[l * 20];
+ x1px2 = uX1[k * 20 + l] * ump_x2[l];
+ __m128d x1px2v = _mm_set1_pd(x1px2);
+
+ for(j = 0; j < 20; j+=2)
+ {
+ __m128d vv = _mm_load_pd(&v[j]);
+ __m128d ee = _mm_load_pd(&eev[j]);
+
+ vv = _mm_add_pd(vv, _mm_mul_pd(x1px2v,ee));
+
+ _mm_store_pd(&v[j], vv);
+ }
+ }
+
+ }
+
+
+ {
+ v = &(x3[80 * i]);
+ __m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+
+ scale = 1;
+ for(l = 0; scale && (l < 80); l += 2)
+ {
+ __m128d vv = _mm_load_pd(&v[l]);
+ __m128d v1 = _mm_and_pd(vv, absMask.m);
+ v1 = _mm_cmplt_pd(v1, minlikelihood_sse);
+ if(_mm_movemask_pd( v1 ) != 3)
+ scale = 0;
+ }
+ }
+
+
+ if (scale)
+ {
+
+ __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+
+ for(l = 0; l < 80; l+=2)
+ {
+ __m128d ex3v = _mm_load_pd(&v[l]);
+ _mm_store_pd(&v[l], _mm_mul_pd(ex3v,twoto));
+ }
+
+
+
+ addScale += wgt[i];
+
+ }
+ }
+ }
+ break;
+ case INNER_INNER:
+ for (i = 0; i < n; i++)
+ {
+ for(k = 0; k < 4; k++)
+ {
+ vl = &(x1[80 * i + 20 * k]);
+ vr = &(x2[80 * i + 20 * k]);
+ v = &(x3[80 * i + 20 * k]);
+
+
+ __m128d zero = _mm_setzero_pd();
+ for(l = 0; l < 20; l+=2)
+ _mm_store_pd(&v[l], zero);
+
+
+ for(l = 0; l < 20; l++)
+ {
+
+ {
+ __m128d al = _mm_setzero_pd();
+ __m128d ar = _mm_setzero_pd();
+
+ double *ll = &left[k * 400 + l * 20];
+ double *rr = &right[k * 400 + l * 20];
+ double *EVEV = &extEV[20 * l];
+
+ for(j = 0; j < 20; j+=2)
+ {
+ __m128d lv = _mm_load_pd(&ll[j]);
+ __m128d rv = _mm_load_pd(&rr[j]);
+ __m128d vll = _mm_load_pd(&vl[j]);
+ __m128d vrr = _mm_load_pd(&vr[j]);
+
+ al = _mm_add_pd(al, _mm_mul_pd(vll, lv));
+ ar = _mm_add_pd(ar, _mm_mul_pd(vrr, rv));
+ }
+
+ al = _mm_hadd_pd(al, al);
+ ar = _mm_hadd_pd(ar, ar);
+
+ al = _mm_mul_pd(al, ar);
+
+ for(j = 0; j < 20; j+=2)
+ {
+ __m128d vv = _mm_load_pd(&v[j]);
+ __m128d EVV = _mm_load_pd(&EVEV[j]);
+
+ vv = _mm_add_pd(vv, _mm_mul_pd(al, EVV));
+
+ _mm_store_pd(&v[j], vv);
+ }
+ }
+
+ }
+ }
+
+
+
+ {
+ v = &(x3[80 * i]);
+ __m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+
+ scale = 1;
+ for(l = 0; scale && (l < 80); l += 2)
+ {
+ __m128d vv = _mm_load_pd(&v[l]);
+ __m128d v1 = _mm_and_pd(vv, absMask.m);
+ v1 = _mm_cmplt_pd(v1, minlikelihood_sse);
+ if(_mm_movemask_pd( v1 ) != 3)
+ scale = 0;
+ }
+ }
+
+
+ if (scale)
+ {
+
+ __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+
+ for(l = 0; l < 80; l+=2)
+ {
+ __m128d ex3v = _mm_load_pd(&v[l]);
+ _mm_store_pd(&v[l], _mm_mul_pd(ex3v,twoto));
+ }
+
+
+
+ addScale += wgt[i];
+
+ }
+ }
+ break;
+ default:
+ assert(0);
+ }
+
+
+ *scalerIncrement = addScale;
+
+}
+
+
+
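+/* Protein data under CAT: one rate category per site (cptr[i]), 20 doubles per site,
+   and one 400-entry (20 x 20) block of left/right per category. */
+static void newviewGTRCATPROT(int tipCase, double *extEV,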
+ int *cptr,
+ double *x1, double *x2, double *x3, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement )
+{
+ double
+ *le, *ri, *v, *vl, *vr;
+
+ int i, l, j, scale, addScale = 0;
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ {
+ for (i = 0; i < n; i++)
+ {
+ le = &left[cptr[i] * 400];
+ ri = &right[cptr[i] * 400];
+
+ vl = &(tipVector[20 * tipX1[i]]);
+ vr = &(tipVector[20 * tipX2[i]]);
+ v = &x3[20 * i];
+
+ for(l = 0; l < 20; l+=2)
+ _mm_store_pd(&v[l], _mm_setzero_pd());
+
+
+ for(l = 0; l < 20; l++)
+ {
+ __m128d x1v = _mm_setzero_pd();
+ __m128d x2v = _mm_setzero_pd();
+ double
+ *ev = &extEV[l * 20],
+ *lv = &le[l * 20],
+ *rv = &ri[l * 20];
+
+ for(j = 0; j < 20; j+=2)
+ {
+ x1v = _mm_add_pd(x1v, _mm_mul_pd(_mm_load_pd(&vl[j]), _mm_load_pd(&lv[j])));
+ x2v = _mm_add_pd(x2v, _mm_mul_pd(_mm_load_pd(&vr[j]), _mm_load_pd(&rv[j])));
+ }
+
+ x1v = _mm_hadd_pd(x1v, x1v);
+ x2v = _mm_hadd_pd(x2v, x2v);
+
+ x1v = _mm_mul_pd(x1v, x2v);
+
+ for(j = 0; j < 20; j+=2)
+ {
+ __m128d vv = _mm_load_pd(&v[j]);
+ vv = _mm_add_pd(vv, _mm_mul_pd(x1v, _mm_load_pd(&ev[j])));
+ _mm_store_pd(&v[j], vv);
+ }
+
+ }
+ }
+ }
+ break;
+ case TIP_INNER:
+ {
+ for (i = 0; i < n; i++)
+ {
+ le = &left[cptr[i] * 400];
+ ri = &right[cptr[i] * 400];
+
+ vl = &(tipVector[20 * tipX1[i]]);
+ vr = &x2[20 * i];
+ v = &x3[20 * i];
+
+ for(l = 0; l < 20; l+=2)
+ _mm_store_pd(&v[l], _mm_setzero_pd());
+
+
+
+ for(l = 0; l < 20; l++)
+ {
+
+ __m128d x1v = _mm_setzero_pd();
+ __m128d x2v = _mm_setzero_pd();
+ double
+ *ev = &extEV[l * 20],
+ *lv = &le[l * 20],
+ *rv = &ri[l * 20];
+
+ for(j = 0; j < 20; j+=2)
+ {
+ x1v = _mm_add_pd(x1v, _mm_mul_pd(_mm_load_pd(&vl[j]), _mm_load_pd(&lv[j])));
+ x2v = _mm_add_pd(x2v, _mm_mul_pd(_mm_load_pd(&vr[j]), _mm_load_pd(&rv[j])));
+ }
+
+ x1v = _mm_hadd_pd(x1v, x1v);
+ x2v = _mm_hadd_pd(x2v, x2v);
+
+ x1v = _mm_mul_pd(x1v, x2v);
+
+ for(j = 0; j < 20; j+=2)
+ {
+ __m128d vv = _mm_load_pd(&v[j]);
+ vv = _mm_add_pd(vv, _mm_mul_pd(x1v, _mm_load_pd(&ev[j])));
+ _mm_store_pd(&v[j], vv);
+ }
+
+ }
+
+ {
+ __m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+
+ scale = 1;
+ for(l = 0; scale && (l < 20); l += 2)
+ {
+ __m128d vv = _mm_load_pd(&v[l]);
+ __m128d v1 = _mm_and_pd(vv, absMask.m);
+ v1 = _mm_cmplt_pd(v1, minlikelihood_sse);
+ if(_mm_movemask_pd( v1 ) != 3)
+ scale = 0;
+ }
+ }
+
+
+ if(scale)
+ {
+
+ __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d ex3v = _mm_load_pd(&v[l]);
+ _mm_store_pd(&v[l], _mm_mul_pd(ex3v,twoto));
+ }
+
+ addScale += wgt[i];
+ }
+ }
+ }
+ break;
+ case INNER_INNER:
+ for(i = 0; i < n; i++)
+ {
+ le = &left[cptr[i] * 400];
+ ri = &right[cptr[i] * 400];
+
+ vl = &x1[20 * i];
+ vr = &x2[20 * i];
+ v = &x3[20 * i];
+
+
+ for(l = 0; l < 20; l+=2)
+ _mm_store_pd(&v[l], _mm_setzero_pd());
+
+
+ for(l = 0; l < 20; l++)
+ {
+
+ __m128d x1v = _mm_setzero_pd();
+ __m128d x2v = _mm_setzero_pd();
+ double
+ *ev = &extEV[l * 20],
+ *lv = &le[l * 20],
+ *rv = &ri[l * 20];
+
+
+ for(j = 0; j < 20; j+=2)
+ {
+ x1v = _mm_add_pd(x1v, _mm_mul_pd(_mm_load_pd(&vl[j]), _mm_load_pd(&lv[j])));
+ x2v = _mm_add_pd(x2v, _mm_mul_pd(_mm_load_pd(&vr[j]), _mm_load_pd(&rv[j])));
+ }
+
+ x1v = _mm_hadd_pd(x1v, x1v);
+ x2v = _mm_hadd_pd(x2v, x2v);
+
+ x1v = _mm_mul_pd(x1v, x2v);
+
+ for(j = 0; j < 20; j+=2)
+ {
+ __m128d vv = _mm_load_pd(&v[j]);
+ vv = _mm_add_pd(vv, _mm_mul_pd(x1v, _mm_load_pd(&ev[j])));
+ _mm_store_pd(&v[j], vv);
+ }
+
+ }
+
+ {
+ __m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+
+ scale = 1;
+ for(l = 0; scale && (l < 20); l += 2)
+ {
+ __m128d vv = _mm_load_pd(&v[l]);
+ __m128d v1 = _mm_and_pd(vv, absMask.m);
+ v1 = _mm_cmplt_pd(v1, minlikelihood_sse);
+ if(_mm_movemask_pd( v1 ) != 3)
+ scale = 0;
+ }
+ }
+
+
+ if(scale)
+ {
+
+ __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d ex3v = _mm_load_pd(&v[l]);
+ _mm_store_pd(&v[l], _mm_mul_pd(ex3v,twoto));
+ }
+
+
+
+ addScale += wgt[i];
+ }
+ }
+ break;
+ default:
+ assert(0);
+ }
+
+
+ *scalerIncrement = addScale;
+
+}
+
+static void newviewGTRCATPROT_SAVE(int tipCase, double *extEV,
+ int *cptr,
+ double *x1, double *x2, double *x3, double *tipVector,
+ unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement,
+ unsigned int *x1_gap, unsigned int *x2_gap, unsigned int *x3_gap,
+ double *x1_gapColumn, double *x2_gapColumn, double *x3_gapColumn, const int maxCats)
+{
+ double
+ *le,
+ *ri,
+ *v,
+ *vl,
+ *vr,
+ *x1_ptr = x1,
+ *x2_ptr = x2,
+ *x3_ptr = x3;
+
+ int
+ i,
+ l,
+ j,
+ scale,
+ scaleGap = 0,
+ addScale = 0;
+
+ {
+ vl = x1_gapColumn;
+ vr = x2_gapColumn;
+ v = x3_gapColumn;
+
+ le = &left[maxCats * 400];
+ ri = &right[maxCats * 400];
+
+ for(l = 0; l < 20; l+=2)
+ _mm_store_pd(&v[l], _mm_setzero_pd());
+
+ for(l = 0; l < 20; l++)
+ {
+ __m128d x1v = _mm_setzero_pd();
+ __m128d x2v = _mm_setzero_pd();
+ double
+ *ev = &extEV[l * 20],
+ *lv = &le[l * 20],
+ *rv = &ri[l * 20];
+
+
+ for(j = 0; j < 20; j+=2)
+ {
+ x1v = _mm_add_pd(x1v, _mm_mul_pd(_mm_load_pd(&vl[j]), _mm_load_pd(&lv[j])));
+ x2v = _mm_add_pd(x2v, _mm_mul_pd(_mm_load_pd(&vr[j]), _mm_load_pd(&rv[j])));
+ }
+
+ x1v = _mm_hadd_pd(x1v, x1v);
+ x2v = _mm_hadd_pd(x2v, x2v);
+
+ x1v = _mm_mul_pd(x1v, x2v);
+
+ for(j = 0; j < 20; j+=2)
+ {
+ __m128d vv = _mm_load_pd(&v[j]);
+ vv = _mm_add_pd(vv, _mm_mul_pd(x1v, _mm_load_pd(&ev[j])));
+ _mm_store_pd(&v[j], vv);
+ }
+ }
+
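+ /* the TIP_TIP case below never rescales its results, so the gap column is only checked for scaling in the other two cases */
+ if(tipCase != TIP_TIP)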
+ {
+ __m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+
+ scale = 1;
+ for(l = 0; scale && (l < 20); l += 2)
+ {
+ __m128d vv = _mm_load_pd(&v[l]);
+ __m128d v1 = _mm_and_pd(vv, absMask.m);
+ v1 = _mm_cmplt_pd(v1, minlikelihood_sse);
+ if(_mm_movemask_pd( v1 ) != 3)
+ scale = 0;
+ }
+
+ if(scale)
+ {
+ __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d ex3v = _mm_load_pd(&v[l]);
+ _mm_store_pd(&v[l], _mm_mul_pd(ex3v,twoto));
+ }
+
+ scaleGap = TRUE;
+ }
+ }
+ }
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ {
+ for (i = 0; i < n; i++)
+ {
+ if(noGap(x3_gap, i))
+ {
+ vl = &(tipVector[20 * tipX1[i]]);
+ vr = &(tipVector[20 * tipX2[i]]);
+ v = x3_ptr;
+
+ if(isGap(x1_gap, i))
+ le = &left[maxCats * 400];
+ else
+ le = &left[cptr[i] * 400];
+
+ if(isGap(x2_gap, i))
+ ri = &right[maxCats * 400];
+ else
+ ri = &right[cptr[i] * 400];
+
+ for(l = 0; l < 20; l+=2)
+ _mm_store_pd(&v[l], _mm_setzero_pd());
+
+ for(l = 0; l < 20; l++)
+ {
+ __m128d x1v = _mm_setzero_pd();
+ __m128d x2v = _mm_setzero_pd();
+ double
+ *ev = &extEV[l * 20],
+ *lv = &le[l * 20],
+ *rv = &ri[l * 20];
+
+ for(j = 0; j < 20; j+=2)
+ {
+ x1v = _mm_add_pd(x1v, _mm_mul_pd(_mm_load_pd(&vl[j]), _mm_load_pd(&lv[j])));
+ x2v = _mm_add_pd(x2v, _mm_mul_pd(_mm_load_pd(&vr[j]), _mm_load_pd(&rv[j])));
+ }
+
+ x1v = _mm_hadd_pd(x1v, x1v);
+ x2v = _mm_hadd_pd(x2v, x2v);
+
+ x1v = _mm_mul_pd(x1v, x2v);
+
+ for(j = 0; j < 20; j+=2)
+ {
+ __m128d vv = _mm_load_pd(&v[j]);
+ vv = _mm_add_pd(vv, _mm_mul_pd(x1v, _mm_load_pd(&ev[j])));
+ _mm_store_pd(&v[j], vv);
+ }
+ }
+
+ x3_ptr += 20;
+
+ }
+ }
+ }
+ break;
+ case TIP_INNER:
+ {
+ for (i = 0; i < n; i++)
+ {
+ if(isGap(x3_gap, i))
+ {
+ if(scaleGap)
+ addScale += wgt[i];
+ }
+ else
+ {
+ vl = &(tipVector[20 * tipX1[i]]);
+
+ vr = x2_ptr;
+ v = x3_ptr;
+
+ if(isGap(x1_gap, i))
+ le = &left[maxCats * 400];
+ else
+ le = &left[cptr[i] * 400];
+
+ if(isGap(x2_gap, i))
+ {
+ ri = &right[maxCats * 400];
+ vr = x2_gapColumn;
+ }
+ else
+ {
+ ri = &right[cptr[i] * 400];
+ vr = x2_ptr;
+ x2_ptr += 20;
+ }
+
+ for(l = 0; l < 20; l+=2)
+ _mm_store_pd(&v[l], _mm_setzero_pd());
+
+ for(l = 0; l < 20; l++)
+ {
+ __m128d x1v = _mm_setzero_pd();
+ __m128d x2v = _mm_setzero_pd();
+ double
+ *ev = &extEV[l * 20],
+ *lv = &le[l * 20],
+ *rv = &ri[l * 20];
+
+ for(j = 0; j < 20; j+=2)
+ {
+ x1v = _mm_add_pd(x1v, _mm_mul_pd(_mm_load_pd(&vl[j]), _mm_load_pd(&lv[j])));
+ x2v = _mm_add_pd(x2v, _mm_mul_pd(_mm_load_pd(&vr[j]), _mm_load_pd(&rv[j])));
+ }
+
+ x1v = _mm_hadd_pd(x1v, x1v);
+ x2v = _mm_hadd_pd(x2v, x2v);
+
+ x1v = _mm_mul_pd(x1v, x2v);
+
+ for(j = 0; j < 20; j+=2)
+ {
+ __m128d vv = _mm_load_pd(&v[j]);
+ vv = _mm_add_pd(vv, _mm_mul_pd(x1v, _mm_load_pd(&ev[j])));
+ _mm_store_pd(&v[j], vv);
+ }
+ }
+
+ {
+ __m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+
+ scale = 1;
+ for(l = 0; scale && (l < 20); l += 2)
+ {
+ __m128d vv = _mm_load_pd(&v[l]);
+ __m128d v1 = _mm_and_pd(vv, absMask.m);
+ v1 = _mm_cmplt_pd(v1, minlikelihood_sse);
+ if(_mm_movemask_pd( v1 ) != 3)
+ scale = 0;
+ }
+ }
+
+
+ if(scale)
+ {
+ __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d ex3v = _mm_load_pd(&v[l]);
+ _mm_store_pd(&v[l], _mm_mul_pd(ex3v,twoto));
+ }
+
+ addScale += wgt[i];
+ }
+ x3_ptr += 20;
+ }
+ }
+ }
+ break;
+ case INNER_INNER:
+ for(i = 0; i < n; i++)
+ {
+ if(isGap(x3_gap, i))
+ {
+ if(scaleGap)
+ addScale += wgt[i];
+ }
+ else
+ {
+ v = x3_ptr;
+
+ if(isGap(x1_gap, i))
+ {
+ vl = x1_gapColumn;
+ le = &left[maxCats * 400];
+ }
+ else
+ {
+ le = &left[cptr[i] * 400];
+ vl = x1_ptr;
+ x1_ptr += 20;
+ }
+
+ if(isGap(x2_gap, i))
+ {
+ vr = x2_gapColumn;
+ ri = &right[maxCats * 400];
+ }
+ else
+ {
+ ri = &right[cptr[i] * 400];
+ vr = x2_ptr;
+ x2_ptr += 20;
+ }
+
+ for(l = 0; l < 20; l+=2)
+ _mm_store_pd(&v[l], _mm_setzero_pd());
+
+ for(l = 0; l < 20; l++)
+ {
+ __m128d x1v = _mm_setzero_pd();
+ __m128d x2v = _mm_setzero_pd();
+ double
+ *ev = &extEV[l * 20],
+ *lv = &le[l * 20],
+ *rv = &ri[l * 20];
+
+ for(j = 0; j < 20; j+=2)
+ {
+ x1v = _mm_add_pd(x1v, _mm_mul_pd(_mm_load_pd(&vl[j]), _mm_load_pd(&lv[j])));
+ x2v = _mm_add_pd(x2v, _mm_mul_pd(_mm_load_pd(&vr[j]), _mm_load_pd(&rv[j])));
+ }
+
+ x1v = _mm_hadd_pd(x1v, x1v);
+ x2v = _mm_hadd_pd(x2v, x2v);
+
+ x1v = _mm_mul_pd(x1v, x2v);
+
+ for(j = 0; j < 20; j+=2)
+ {
+ __m128d vv = _mm_load_pd(&v[j]);
+ vv = _mm_add_pd(vv, _mm_mul_pd(x1v, _mm_load_pd(&ev[j])));
+ _mm_store_pd(&v[j], vv);
+ }
+
+ }
+
+ {
+ __m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+
+ scale = 1;
+ for(l = 0; scale && (l < 20); l += 2)
+ {
+ __m128d vv = _mm_load_pd(&v[l]);
+ __m128d v1 = _mm_and_pd(vv, absMask.m);
+ v1 = _mm_cmplt_pd(v1, minlikelihood_sse);
+ if(_mm_movemask_pd( v1 ) != 3)
+ scale = 0;
+ }
+ }
+
+ if(scale)
+ {
+ __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d ex3v = _mm_load_pd(&v[l]);
+ _mm_store_pd(&v[l], _mm_mul_pd(ex3v,twoto));
+ }
+
+ addScale += wgt[i];
+ }
+ x3_ptr += 20;
+ }
+ }
+ break;
+ default:
+ assert(0);
+ }
+
+
+ *scalerIncrement = addScale;
+
+}
+
+static void newviewGTRGAMMAPROT_LG4(int tipCase,
+ double *x1, double *x2, double *x3, double *extEV[4], double *tipVector[4],
+ int *ex3, unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling)
+{
+ double *uX1, *uX2, *v;
+ double x1px2;
+ int i, j, l, k, scale, addScale = 0;
+ double *vl, *vr;
+#ifndef __SIM_SSE3
+ double al, ar;
+#endif
+
+
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ {
+ double umpX1[1840], umpX2[1840];
+
+ for(i = 0; i < 23; i++)
+ {
+
+
+ for(k = 0; k < 80; k++)
+ {
+
+ v = &(tipVector[k / 20][20 * i]);
+#ifdef __SIM_SSE3
+ double *ll = &left[k * 20];
+ double *rr = &right[k * 20];
+
+ __m128d umpX1v = _mm_setzero_pd();
+ __m128d umpX2v = _mm_setzero_pd();
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d vv = _mm_load_pd(&v[l]);
+ umpX1v = _mm_add_pd(umpX1v, _mm_mul_pd(vv, _mm_load_pd(&ll[l])));
+ umpX2v = _mm_add_pd(umpX2v, _mm_mul_pd(vv, _mm_load_pd(&rr[l])));
+ }
+
+ umpX1v = _mm_hadd_pd(umpX1v, umpX1v);
+ umpX2v = _mm_hadd_pd(umpX2v, umpX2v);
+
+ _mm_storel_pd(&umpX1[80 * i + k], umpX1v);
+ _mm_storel_pd(&umpX2[80 * i + k], umpX2v);
+#else
+ umpX1[80 * i + k] = 0.0;
+ umpX2[80 * i + k] = 0.0;
+
+ for(l = 0; l < 20; l++)
+ {
+ umpX1[80 * i + k] += v[l] * left[k * 20 + l];
+ umpX2[80 * i + k] += v[l] * right[k * 20 + l];
+ }
+#endif
+ }
+ }
+
+ for(i = 0; i < n; i++)
+ {
+ uX1 = &umpX1[80 * tipX1[i]];
+ uX2 = &umpX2[80 * tipX2[i]];
+
+ for(j = 0; j < 4; j++)
+ {
+ v = &x3[i * 80 + j * 20];
+
+#ifdef __SIM_SSE3
+ __m128d zero = _mm_setzero_pd();
+ for(k = 0; k < 20; k+=2)
+ _mm_store_pd(&v[k], zero);
+
+ for(k = 0; k < 20; k++)
+ {
+ double *eev = &extEV[j][k * 20];
+ x1px2 = uX1[j * 20 + k] * uX2[j * 20 + k];
+ __m128d x1px2v = _mm_set1_pd(x1px2);
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d vv = _mm_load_pd(&v[l]);
+ __m128d ee = _mm_load_pd(&eev[l]);
+
+ vv = _mm_add_pd(vv, _mm_mul_pd(x1px2v,ee));
+
+ _mm_store_pd(&v[l], vv);
+ }
+ }
+
+#else
+
+ for(k = 0; k < 20; k++)
+ v[k] = 0.0;
+
+ for(k = 0; k < 20; k++)
+ {
+ x1px2 = uX1[j * 20 + k] * uX2[j * 20 + k];
+
+ for(l = 0; l < 20; l++)
+ v[l] += x1px2 * extEV[j][20 * k + l];
+ }
+#endif
+ }
+ }
+ }
+ break;
+ case TIP_INNER:
+ {
+ double umpX1[1840], ump_x2[20];
+
+
+ for(i = 0; i < 23; i++)
+ {
+
+
+ for(k = 0; k < 80; k++)
+ {
+ v = &(tipVector[k / 20][20 * i]);
+#ifdef __SIM_SSE3
+ double *ll = &left[k * 20];
+
+ __m128d umpX1v = _mm_setzero_pd();
+
+ for(l = 0; l < 20; l+=2)
+ {
+ __m128d vv = _mm_load_pd(&v[l]);
+ umpX1v = _mm_add_pd(umpX1v, _mm_mul_pd(vv, _mm_load_pd(&ll[l])));
+ }
+
+ umpX1v = _mm_hadd_pd(umpX1v, umpX1v);
+ _mm_storel_pd(&umpX1[80 * i + k], umpX1v);
+#else
+ umpX1[80 * i + k] = 0.0;
+
+ for(l = 0; l < 20; l++)
+ umpX1[80 * i + k] += v[l] * left[k * 20 + l];
+#endif
+
+ }
+ }
+
+ for (i = 0; i < n; i++)
+ {
+ uX1 = &umpX1[80 * tipX1[i]];
+
+ for(k = 0; k < 4; k++)
+ {
+ v = &(x2[80 * i + k * 20]);
+#ifdef __SIM_SSE3
+ for(l = 0; l < 20; l++)
+ {
+ double *r = &right[k * 400 + l * 20];
+ __m128d ump_x2v = _mm_setzero_pd();
+
+ for(j = 0; j < 20; j+= 2)
+ {
+ __m128d vv = _mm_load_pd(&v[j]);
+ __m128d rr = _mm_load_pd(&r[j]);
+ ump_x2v = _mm_add_pd(ump_x2v, _mm_mul_pd(vv, rr));
+ }
+
+ ump_x2v = _mm_hadd_pd(ump_x2v, ump_x2v);
+
+ _mm_storel_pd(&ump_x2[l], ump_x2v);
+ }
+
+ v = &(x3[80 * i + 20 * k]);
+
+ __m128d zero = _mm_setzero_pd();
+ for(l = 0; l < 20; l+=2)
+ _mm_store_pd(&v[l], zero);
+
+ for(l = 0; l < 20; l++)
+ {
+ double *eev = &extEV[k][l * 20];
+ x1px2 = uX1[k * 20 + l] * ump_x2[l];
+ __m128d x1px2v = _mm_set1_pd(x1px2);
+
+ for(j = 0; j < 20; j+=2)
+ {
+ __m128d vv = _mm_load_pd(&v[j]);
+ __m128d ee = _mm_load_pd(&eev[j]);
+
+ vv = _mm_add_pd(vv, _mm_mul_pd(x1px2v,ee));
+
+ _mm_store_pd(&v[j], vv);
+ }
+ }
+#else
+ for(l = 0; l < 20; l++)
+ {
+ ump_x2[l] = 0.0;
+
+ for(j = 0; j < 20; j++)
+ ump_x2[l] += v[j] * right[k * 400 + l * 20 + j];
+ }
+
+ v = &(x3[80 * i + 20 * k]);
+
+ for(l = 0; l < 20; l++)
+ v[l] = 0;
+
+ for(l = 0; l < 20; l++)
+ {
+ x1px2 = uX1[k * 20 + l] * ump_x2[l];
+ for(j = 0; j < 20; j++)
+ v[j] += x1px2 * extEV[k][l * 20 + j];
+ }
+#endif
+ }
+
+#ifdef __SIM_SSE3
+ {
+ v = &(x3[80 * i]);
+ __m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+
+ scale = 1;
+ for(l = 0; scale && (l < 80); l += 2)
+ {
+ __m128d vv = _mm_load_pd(&v[l]);
+ __m128d v1 = _mm_and_pd(vv, absMask.m);
+ v1 = _mm_cmplt_pd(v1, minlikelihood_sse);
+ if(_mm_movemask_pd( v1 ) != 3)
+ scale = 0;
+ }
+ }
+#else
+ v = &x3[80 * i];
+ scale = 1;
+ for(l = 0; scale && (l < 80); l++)
+ scale = (ABS(v[l]) < minlikelihood);
+#endif
+
+ if (scale)
+ {
+#ifdef __SIM_SSE3
+ __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+
+ for(l = 0; l < 80; l+=2)
+ {
+ __m128d ex3v = _mm_load_pd(&v[l]);
+ _mm_store_pd(&v[l], _mm_mul_pd(ex3v,twoto));
+ }
+#else
+ for(l = 0; l < 80; l++)
+ v[l] *= twotothe256;
+#endif
+
+ if(useFastScaling)
+ addScale += wgt[i];
+ else
+ ex3[i] += 1;
+ }
+ }
+ }
+ break;
+ case INNER_INNER:
+ for (i = 0; i < n; i++)
+ {
+ for(k = 0; k < 4; k++)
+ {
+ vl = &(x1[80 * i + 20 * k]);
+ vr = &(x2[80 * i + 20 * k]);
+ v = &(x3[80 * i + 20 * k]);
+
+#ifdef __SIM_SSE3
+ __m128d zero = _mm_setzero_pd();
+ for(l = 0; l < 20; l+=2)
+ _mm_store_pd(&v[l], zero);
+#else
+ for(l = 0; l < 20; l++)
+ v[l] = 0;
+#endif
+
+ for(l = 0; l < 20; l++)
+ {
+#ifdef __SIM_SSE3
+ {
+ __m128d al = _mm_setzero_pd();
+ __m128d ar = _mm_setzero_pd();
+
+ double *ll = &left[k * 400 + l * 20];
+ double *rr = &right[k * 400 + l * 20];
+ double *EVEV = &extEV[k][20 * l];
+
+ for(j = 0; j < 20; j+=2)
+ {
+ __m128d lv = _mm_load_pd(&ll[j]);
+ __m128d rv = _mm_load_pd(&rr[j]);
+ __m128d vll = _mm_load_pd(&vl[j]);
+ __m128d vrr = _mm_load_pd(&vr[j]);
+
+ al = _mm_add_pd(al, _mm_mul_pd(vll, lv));
+ ar = _mm_add_pd(ar, _mm_mul_pd(vrr, rv));
+ }
+
+ al = _mm_hadd_pd(al, al);
+ ar = _mm_hadd_pd(ar, ar);
+
+ al = _mm_mul_pd(al, ar);
+
+ for(j = 0; j < 20; j+=2)
+ {
+ __m128d vv = _mm_load_pd(&v[j]);
+ __m128d EVV = _mm_load_pd(&EVEV[j]);
+
+ vv = _mm_add_pd(vv, _mm_mul_pd(al, EVV));
+
+ _mm_store_pd(&v[j], vv);
+ }
+ }
+#else
+ al = 0.0;
+ ar = 0.0;
+
+ for(j = 0; j < 20; j++)
+ {
+ al += vl[j] * left[k * 400 + l * 20 + j];
+ ar += vr[j] * right[k * 400 + l * 20 + j];
+ }
+
+ x1px2 = al * ar;
+
+ for(j = 0; j < 20; j++)
+ v[j] += x1px2 * extEV[k][20 * l + j];
+#endif
+ }
+ }
+
+
+#ifdef __SIM_SSE3
+ {
+ v = &(x3[80 * i]);
+ __m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+
+ scale = 1;
+ for(l = 0; scale && (l < 80); l += 2)
+ {
+ __m128d vv = _mm_load_pd(&v[l]);
+ __m128d v1 = _mm_and_pd(vv, absMask.m);
+ v1 = _mm_cmplt_pd(v1, minlikelihood_sse);
+ if(_mm_movemask_pd( v1 ) != 3)
+ scale = 0;
+ }
+ }
+#else
+ v = &(x3[80 * i]);
+ scale = 1;
+ for(l = 0; scale && (l < 80); l++)
+ scale = ((ABS(v[l]) < minlikelihood));
+#endif
+
+ if (scale)
+ {
+#ifdef __SIM_SSE3
+ __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+
+ for(l = 0; l < 80; l+=2)
+ {
+ __m128d ex3v = _mm_load_pd(&v[l]);
+ _mm_store_pd(&v[l], _mm_mul_pd(ex3v,twoto));
+ }
+#else
+ for(l = 0; l < 80; l++)
+ v[l] *= twotothe256;
+#endif
+
+ if(useFastScaling)
+ addScale += wgt[i];
+ else
+ ex3[i] += 1;
+ }
+ }
+ break;
+ default:
+ assert(0);
+ }
+
+ if(useFastScaling)
+ *scalerIncrement = addScale;
+
+}
+
+#endif
+
+#ifdef _OPTIMIZED_FUNCTIONS
+
+/*** BINARY DATA functions *****/
+
+static void newviewGTRCAT_BINARY( int tipCase, double *EV, int *cptr,
+ double *x1_start, double *x2_start, double *x3_start, double *tipVector,
+ int *ex3, unsigned char *tipX1, unsigned char *tipX2,
+ int n, double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling)
+{
+ double
+ *le,
+ *ri,
+ *x1, *x2, *x3;
+ int i, l, scale, addScale = 0;
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ {
+ for(i = 0; i < n; i++)
+ {
+ x1 = &(tipVector[2 * tipX1[i]]);
+ x2 = &(tipVector[2 * tipX2[i]]);
+ x3 = &x3_start[2 * i];
+
+ le = &left[cptr[i] * 4];
+ ri = &right[cptr[i] * 4];
+
+ _mm_store_pd(x3, _mm_setzero_pd());
+
+ for(l = 0; l < 2; l++)
+ {
+ __m128d al = _mm_mul_pd(_mm_load_pd(x1), _mm_load_pd(&le[l * 2]));
+ __m128d ar = _mm_mul_pd(_mm_load_pd(x2), _mm_load_pd(&ri[l * 2]));
+
+ al = _mm_hadd_pd(al, al);
+ ar = _mm_hadd_pd(ar, ar);
+
+ al = _mm_mul_pd(al, ar);
+
+ __m128d vv = _mm_load_pd(x3);
+ __m128d EVV = _mm_load_pd(&EV[2 * l]);
+
+ vv = _mm_add_pd(vv, _mm_mul_pd(al, EVV));
+
+ _mm_store_pd(x3, vv);
+ }
+ }
+ }
+ break;
+ case TIP_INNER:
+ {
+ for (i = 0; i < n; i++)
+ {
+ x1 = &(tipVector[2 * tipX1[i]]);
+ x2 = &x2_start[2 * i];
+ x3 = &x3_start[2 * i];
+
+ le = &left[cptr[i] * 4];
+ ri = &right[cptr[i] * 4];
+
+ _mm_store_pd(x3, _mm_setzero_pd());
+
+ for(l = 0; l < 2; l++)
+ {
+ __m128d al = _mm_mul_pd(_mm_load_pd(x1), _mm_load_pd(&le[l * 2]));
+ __m128d ar = _mm_mul_pd(_mm_load_pd(x2), _mm_load_pd(&ri[l * 2]));
+
+ al = _mm_hadd_pd(al, al);
+ ar = _mm_hadd_pd(ar, ar);
+
+ al = _mm_mul_pd(al, ar);
+
+ __m128d vv = _mm_load_pd(x3);
+ __m128d EVV = _mm_load_pd(&EV[2 * l]);
+
+ vv = _mm_add_pd(vv, _mm_mul_pd(al, EVV));
+
+ _mm_store_pd(x3, vv);
+ }
+
+ __m128d minlikelihood_sse = _mm_set1_pd(minlikelihood);
+
+ scale = 1;
+
+ __m128d v1 = _mm_and_pd(_mm_load_pd(x3), absMask.m);
+ v1 = _mm_cmplt_pd(v1, minlikelihood_sse);
+ if(_mm_movemask_pd( v1 ) != 3)
+ scale = 0;
+
+ if(scale)
+ {
+ __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+
+ __m128d ex3v = _mm_load_pd(x3);
+ _mm_store_pd(x3, _mm_mul_pd(ex3v,twoto));
+
+ if(useFastScaling)
+ addScale += wgt[i];
+ else
+ ex3[i] += 1;
+ }
+ }
+ }
+ break;
+ case INNER_INNER:
+ for (i = 0; i < n; i++)
+ {
+ x1 = &x1_start[2 * i];
+ x2 = &x2_start[2 * i];
+ x3 = &x3_start[2 * i];
+
+ le = &left[cptr[i] * 4];
+ ri = &right[cptr[i] * 4];
+
+ _mm_store_pd(x3, _mm_setzero_pd());
+
+ for(l = 0; l < 2; l++)
+ {
+ __m128d al = _mm_mul_pd(_mm_load_pd(x1), _mm_load_pd(&le[l * 2]));
+ __m128d ar = _mm_mul_pd(_mm_load_pd(x2), _mm_load_pd(&ri[l * 2]));
+
+ al = _mm_hadd_pd(al, al);
+ ar = _mm_hadd_pd(ar, ar);
+
+ al = _mm_mul_pd(al, ar);
+
+ __m128d vv = _mm_load_pd(x3);
+ __m128d EVV = _mm_load_pd(&EV[2 * l]);
+
+ vv = _mm_add_pd(vv, _mm_mul_pd(al, EVV));
+
+ _mm_store_pd(x3, vv);
+ }
+
+ __m128d minlikelihood_sse = _mm_set1_pd(minlikelihood);
+
+ scale = 1;
+
+ __m128d v1 = _mm_and_pd(_mm_load_pd(x3), absMask.m);
+ v1 = _mm_cmplt_pd(v1, minlikelihood_sse);
+ if(_mm_movemask_pd( v1 ) != 3)
+ scale = 0;
+
+ if(scale)
+ {
+ __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+
+ __m128d ex3v = _mm_load_pd(x3);
+ _mm_store_pd(x3, _mm_mul_pd(ex3v,twoto));
+
+ if(useFastScaling)
+ addScale += wgt[i];
+ else
+ ex3[i] += 1;
+ }
+ }
+ break;
+ default:
+ assert(0);
+ }
+
+ if(useFastScaling)
+ *scalerIncrement = addScale;
+
+}
+
+static void newviewGTRGAMMA_BINARY(int tipCase,
+ double *x1_start, double *x2_start, double *x3_start,
+ double *EV, double *tipVector,
+ int *ex3, unsigned char *tipX1, unsigned char *tipX2,
+ const int n, double *left, double *right, int *wgt, int *scalerIncrement, const boolean useFastScaling
+ )
+{
+ double
+ *x1, *x2, *x3;
+
+ int i, k, l, scale, addScale = 0;
+
+ switch(tipCase)
+ {
+ case TIP_TIP:
+ for (i = 0; i < n; i++)
+ {
+ x1 = &(tipVector[2 * tipX1[i]]);
+ x2 = &(tipVector[2 * tipX2[i]]);
+
+ for(k = 0; k < 4; k++)
+ {
+ x3 = &(x3_start[8 * i + 2 * k]);
+
+ _mm_store_pd(x3, _mm_setzero_pd());
+
+ for(l = 0; l < 2; l++)
+ {
+ __m128d al = _mm_mul_pd(_mm_load_pd(x1), _mm_load_pd(&left[k * 4 + l * 2]));
+ __m128d ar = _mm_mul_pd(_mm_load_pd(x2), _mm_load_pd(&right[k * 4 + l * 2]));
+
+ al = _mm_hadd_pd(al, al);
+ ar = _mm_hadd_pd(ar, ar);
+
+ al = _mm_mul_pd(al, ar);
+
+ __m128d vv = _mm_load_pd(x3);
+ __m128d EVV = _mm_load_pd(&EV[2 * l]);
+
+ vv = _mm_add_pd(vv, _mm_mul_pd(al, EVV));
+
+ _mm_store_pd(x3, vv);
+ }
+ }
+ }
+ break;
+ case TIP_INNER:
+ for (i = 0; i < n; i++)
+ {
+ x1 = &(tipVector[2 * tipX1[i]]);
+
+ for(k = 0; k < 4; k++)
+ {
+ x2 = &(x2_start[8 * i + 2 * k]);
+ x3 = &(x3_start[8 * i + 2 * k]);
+
+ _mm_store_pd(x3, _mm_setzero_pd());
+
+ for(l = 0; l < 2; l++)
+ {
+ __m128d al = _mm_mul_pd(_mm_load_pd(x1), _mm_load_pd(&left[k * 4 + l * 2]));
+ __m128d ar = _mm_mul_pd(_mm_load_pd(x2), _mm_load_pd(&right[k * 4 + l * 2]));
+
+ al = _mm_hadd_pd(al, al);
+ ar = _mm_hadd_pd(ar, ar);
+
+ al = _mm_mul_pd(al, ar);
+
+ __m128d vv = _mm_load_pd(x3);
+ __m128d EVV = _mm_load_pd(&EV[2 * l]);
+
+ vv = _mm_add_pd(vv, _mm_mul_pd(al, EVV));
+
+ _mm_store_pd(x3, vv);
+ }
+ }
+
+ x3 = &(x3_start[8 * i]);
+ __m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+
+ scale = 1;
+ for(l = 0; scale && (l < 8); l += 2)
+ {
+ __m128d vv = _mm_load_pd(&x3[l]);
+ __m128d v1 = _mm_and_pd(vv, absMask.m);
+ v1 = _mm_cmplt_pd(v1, minlikelihood_sse);
+ if(_mm_movemask_pd( v1 ) != 3)
+ scale = 0;
+ }
+
+ if(scale)
+ {
+ __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+
+ for(l = 0; l < 8; l+=2)
+ {
+ __m128d ex3v = _mm_load_pd(&x3[l]);
+ _mm_store_pd(&x3[l], _mm_mul_pd(ex3v,twoto));
+ }
+
+ if(useFastScaling)
+ addScale += wgt[i];
+ else
+ ex3[i] += 1;
+ }
+ }
+ break;
+ case INNER_INNER:
+ for (i = 0; i < n; i++)
+ {
+ for(k = 0; k < 4; k++)
+ {
+ x1 = &(x1_start[8 * i + 2 * k]);
+ x2 = &(x2_start[8 * i + 2 * k]);
+ x3 = &(x3_start[8 * i + 2 * k]);
+
+ _mm_store_pd(x3, _mm_setzero_pd());
+
+ for(l = 0; l < 2; l++)
+ {
+ __m128d al = _mm_mul_pd(_mm_load_pd(x1), _mm_load_pd(&left[k * 4 + l * 2]));
+ __m128d ar = _mm_mul_pd(_mm_load_pd(x2), _mm_load_pd(&right[k * 4 + l * 2]));
+
+ al = _mm_hadd_pd(al, al);
+ ar = _mm_hadd_pd(ar, ar);
+
+ al = _mm_mul_pd(al, ar);
+
+ __m128d vv = _mm_load_pd(x3);
+ __m128d EVV = _mm_load_pd(&EV[2 * l]);
+
+ vv = _mm_add_pd(vv, _mm_mul_pd(al, EVV));
+
+ _mm_store_pd(x3, vv);
+ }
+ }
+
+ x3 = &(x3_start[8 * i]);
+ __m128d minlikelihood_sse = _mm_set1_pd( minlikelihood );
+
+ scale = 1;
+ for(l = 0; scale && (l < 8); l += 2)
+ {
+ __m128d vv = _mm_load_pd(&x3[l]);
+ __m128d v1 = _mm_and_pd(vv, absMask.m);
+ v1 = _mm_cmplt_pd(v1, minlikelihood_sse);
+ if(_mm_movemask_pd( v1 ) != 3)
+ scale = 0;
+ }
+
+ if(scale)
+ {
+ __m128d twoto = _mm_set_pd(twotothe256, twotothe256);
+
+ for(l = 0; l < 8; l+=2)
+ {
+ __m128d ex3v = _mm_load_pd(&x3[l]);
+ _mm_store_pd(&x3[l], _mm_mul_pd(ex3v,twoto));
+ }
+
+ if(useFastScaling)
+ addScale += wgt[i];
+ else
+ ex3[i] += 1;
+ }
+ }
+ break;
+
+ default:
+ assert(0);
+ }
+
+ if(useFastScaling)
+ *scalerIncrement = addScale;
+
+}
+
+
+/**** BINARY DATA functions end ****/
+
+
+
+#endif
+
+
diff --git a/examl/optimizeModel.c b/examl/optimizeModel.c
new file mode 100644
index 0000000..a9b5ebb
--- /dev/null
+++ b/examl/optimizeModel.c
@@ -0,0 +1,3134 @@
+/* RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees
+ * Copyright August 2006 by Alexandros Stamatakis
+ *
+ * Partially derived from
+ * fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *
+ * and
+ *
+ * Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ * This program is free software; you may redistribute it and/or modify its
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ *
+ * For any other enquiries send an Email to Alexandros Stamatakis
+ * Alexandros.Stamatakis at epfl.ch
+ *
+ * When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ * Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands
+ * of taxa and mixed models".
+ * Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#ifndef WIN32
+#include <unistd.h>
+#endif
+
+#include <math.h>
+#include <time.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+#include "axml.h"
+
+
+static const double MNBRAK_GOLD = 1.618034;
+static const double MNBRAK_TINY = 1.e-20;
+static const double MNBRAK_GLIMIT = 100.0;
+static const double BRENT_ZEPS = 1.e-5;
+static const double BRENT_CGOLD = 0.3819660;
+
+extern int optimizeRatesInvocations;
+extern int optimizeRateCategoryInvocations;
+extern int optimizeAlphaInvocations;
+extern int optimizeInvarInvocations;
+extern double masterTime;
+extern char ratesFileName[1024];
+extern char workdir[1024];
+extern char run_id[128];
+extern char lengthFileName[1024];
+extern char lengthFileNameModel[1024];
+extern char *protModels[NUM_PROT_MODELS];
+
+extern checkPointState ckp;
+
+extern int processes;
+extern int processID;
+
+static void optParamGeneric(tree *tr, double modelEpsilon, linkageList *ll, int numberOfModels, int rateNumber, double lim_inf, double lim_sup, int whichParameterType);
+
+// FLAG for easier debugging of model parameter optimization routines
+
+//#define _DEBUG_MOD_OPT
+
+
+/*********************FUNCTIONS FOR EXACT MODEL OPTIMIZATION UNDER GTRGAMMA ***************************************/
+
+
+static void setRateModel(tree *tr, int model, double rate, int position)
+{
+ int
+ states = tr->partitionData[model].states,
+ numRates = (states * states - states) / 2;
+
+ if(tr->partitionData[model].dataType == DNA_DATA)
+ assert(position >= 0 && position < (numRates - 1));
+ else
+ assert(position >= 0 && position < numRates);
+
+ assert(tr->partitionData[model].dataType != BINARY_DATA);
+
+ assert(rate >= RATE_MIN && rate <= RATE_MAX);
+
+ if(tr->partitionData[model].nonGTR)
+ {
+ int
+ i,
+ index = tr->partitionData[model].symmetryVector[position],
+ lastRate = tr->partitionData[model].symmetryVector[numRates - 1];
+
+
+
+ for(i = 0; i < numRates; i++)
+ {
+ if(tr->partitionData[model].symmetryVector[i] == index)
+ {
+ if(index == lastRate)
+ tr->partitionData[model].substRates[i] = 1.0;
+ else
+ tr->partitionData[model].substRates[i] = rate;
+ }
+
+ //printf("%f ", tr->partitionData[model].substRates[i]);
+ }
+ //printf("\n");
+ }
+ else
+ tr->partitionData[model].substRates[position] = rate;
+}
+
+
+//LIBRARY: the only thing that we will need to do here is to
+//replace linkList by a string and also add some error correction
+//code
+
+
+static linkageList* initLinkageList(int *linkList, tree *tr)
+{
+ int
+ k,
+ partitions,
+ numberOfModels = 0,
+ i,
+ pos;
+
+ linkageList
+ *ll = (linkageList*)malloc(sizeof(linkageList));
+
+ for(i = 0; i < tr->NumberOfModels; i++)
+ {
+ assert(linkList[i] >= 0 && linkList[i] < tr->NumberOfModels);
+
+ if(linkList[i] > numberOfModels)
+ numberOfModels = linkList[i];
+ }
+
+ numberOfModels++;
+
+ ll->entries = numberOfModels;
+ ll->ld = (linkageData*)malloc(sizeof(linkageData) * numberOfModels);
+
+ for(i = 0; i < numberOfModels; i++)
+ {
+ ll->ld[i].valid = TRUE;
+
+ partitions = 0;
+
+ for(k = 0; k < tr->NumberOfModels; k++)
+ if(linkList[k] == i)
+ partitions++;
+
+ ll->ld[i].partitions = partitions;
+ ll->ld[i].partitionList = (int*)malloc(sizeof(int) * partitions);
+
+ for(k = 0, pos = 0; k < tr->NumberOfModels; k++)
+ if(linkList[k] == i)
+ ll->ld[i].partitionList[pos++] = k;
+ }
+
+ return ll;
+}
+
+static linkageList* initLinkageListString(char *linkageString, tree *tr)
+{
+ int
+ *list = (int*)malloc(sizeof(int) * tr->NumberOfModels),
+ j;
+
+ linkageList
+ *l;
+
+ char
+ *str1,
+ *saveptr,
+ *ch = (char *)calloc(strlen(linkageString) + 1, sizeof(char)), /* +1 keeps the NUL terminator that strtok_r() relies on */
+ *token;
+ strncpy(ch, linkageString, strlen(linkageString));
+
+ for(j = 0, str1 = ch; ;j++, str1 = (char *)NULL)
+ {
+ token = strtok_r(str1, ",", &saveptr);
+ if(token == (char *)NULL)
+ break;
+ assert(j < tr->NumberOfModels);
+ list[j] = atoi(token);
+ //printf("%d: %s\n", j, token);
+ }
+
+ free(ch);
+
+ l = initLinkageList(list, tr);
+
+ free(list);
+
+ return l;
+}
+
+static void init_Q_MatrixSymmetries(char *linkageString, tree *tr, int model)
+{
+ int
+ states = tr->partitionData[model].states,
+ numberOfRates = ((states * states - states) / 2),
+ *list = (int *)malloc(sizeof(int) * numberOfRates),
+ j,
+ max = -1;
+
+ char
+ *str1,
+ *saveptr,
+ *ch = (char*)calloc(strlen(linkageString) + 1, sizeof(char)), /* +1 keeps the NUL terminator that strtok_r() relies on */
+ *token;
+
+ strncpy(ch, linkageString, strlen(linkageString));
+
+ for(j = 0, str1 = ch; ;j++, str1 = (char *)NULL)
+ {
+ token = strtok_r(str1, ",", &saveptr);
+ if(token == (char *)NULL)
+ break;
+ assert(j < numberOfRates);
+ list[j] = atoi(token);
+ }
+
+ free(ch);
+
+ for(j = 0; j < numberOfRates; j++)
+ {
+ assert(list[j] <= j);
+ assert(list[j] <= max + 1);
+
+ if(list[j] > max)
+ max = list[j];
+ }
+
+ assert(numberOfRates == 6);
+
+ for(j = 0; j < numberOfRates; j++)
+ tr->partitionData[model].symmetryVector[j] = list[j];
+
+ //less than the maximum possible number of rate parameters
+
+ if(max < numberOfRates - 1)
+ tr->partitionData[model].nonGTR = TRUE;
+
+ free(list);
+}
+
+
+
+static linkageList* initLinkageListGTR(tree *tr)
+{
+ int
+ i,
+ *links = (int*)malloc(sizeof(int) * tr->NumberOfModels),
+ firstAA = tr->NumberOfModels + 2,
+ countGTR = 0,
+ countOtherModel = 0;
+ linkageList* ll;
+
+ for(i = 0; i < tr->NumberOfModels; i++)
+ {
+ if(tr->partitionData[i].dataType == AA_DATA)
+ {
+ if(tr->partitionData[i].protModels == GTR)
+ {
+ if(i < firstAA)
+ firstAA = i;
+ countGTR++;
+ }
+ else
+ countOtherModel++;
+ }
+ }
+
+ assert((countGTR > 0 && countOtherModel == 0) || (countGTR == 0 && countOtherModel > 0) || (countGTR == 0 && countOtherModel == 0));
+
+ if(countGTR == 0)
+ {
+ for(i = 0; i < tr->NumberOfModels; i++)
+ links[i] = i;
+ }
+ else
+ {
+ for(i = 0; i < tr->NumberOfModels; i++)
+ {
+ switch(tr->partitionData[i].dataType)
+ {
+ case DNA_DATA:
+ case BINARY_DATA:
+ case GENERIC_32:
+ case GENERIC_64:
+ case SECONDARY_DATA:
+ case SECONDARY_DATA_6:
+ case SECONDARY_DATA_7:
+ links[i] = i;
+ break;
+ case AA_DATA:
+ links[i] = firstAA;
+ break;
+ default:
+ assert(0);
+ }
+ }
+ }
+
+
+ ll = initLinkageList(links, tr);
+
+ free(links);
+
+ return ll;
+}
+
+
+
+static void freeLinkageList( linkageList* ll)
+{
+ int i;
+
+ for(i = 0; i < ll->entries; i++)
+ free(ll->ld[i].partitionList);
+
+ free(ll->ld);
+ free(ll);
+}
+
+#define ALPHA_F 0
+#define RATE_F 1
+#define FREQ_F 2
+#define LXRATE_F 3
+#define LXWEIGHT_F 4
+
+void scaleLG4X_EIGN(tree *tr, int model)
+{
+ double
+ acc = 0.0;
+
+ int
+ i,
+ l;
+
+ for(i = 0; i < 4; i++)
+ acc += tr->partitionData[model].weights[i] * tr->partitionData[model].gammaRates[i];
+
+ acc = 1.0 / acc;
+
+ /*
+ printf("update %f %f %f %f %f\n", acc, tr->partitionData[model].gammaRates[0], tr->partitionData[model].gammaRates[1], tr->partitionData[model].gammaRates[2],
+ tr->partitionData[model].gammaRates[3]);
+
+ printf("weights: %f %f %f %f\n", tr->partitionData[model].weights[0], tr->partitionData[model].weights[1], tr->partitionData[model].weights[2],
+ tr->partitionData[model].weights[3]);
+ */
+
+ for(i = 0; i < 4; i++)
+ for(l = 0; l < 20; l++)
+ tr->partitionData[model].EIGN_LG4[i][l] = tr->partitionData[model].rawEIGN_LG4[i][l] * acc;
+}
+
+
+static void updateWeights(tree *tr, int model, int rate, double value)
+{
+ int
+ j;
+
+ double
+ w = 0.0;
+
+ assert(rate >= 0 && rate < 4);
+
+ tr->partitionData[model].weightExponents[rate] = value;
+
+ for(j = 0; j < 4; j++)
+ w += exp(tr->partitionData[model].weightExponents[j]);
+
+ for(j = 0; j < 4; j++)
+ tr->partitionData[model].weights[j] = exp(tr->partitionData[model].weightExponents[j]) / w;
+}
+
+static void optimizeWeights(tree *tr, double modelEpsilon, linkageList *ll, int numberOfModels)
+{
+ int
+ i;
+
+ double
+ initialLH = 0.0,
+ finalLH = 0.0;
+
+ evaluateGeneric(tr, tr->start, TRUE);
+
+ initialLH = tr->likelihood;
+ //printf("W: %f %f [%f] ->", tr->perPartitionLH[0], tr->perPartitionLH[1], initialLH);
+
+ for(i = 0; i < 4; i++)
+ optParamGeneric(tr, modelEpsilon, ll, numberOfModels, i, -1000000.0, 200.0, LXWEIGHT_F);
+ //optLG4X_Weights(tr, ll, numberOfModels, i, modelEpsilon);
+
+ evaluateGeneric(tr, tr->start, TRUE);
+
+ finalLH = tr->likelihood;
+
+ if(finalLH < initialLH)
+ printf("Final: %f initial: %f\n", finalLH, initialLH);
+ assert(finalLH >= initialLH);
+
+ //printf("%f %f [%f]\n", tr->perPartitionLH[0], tr->perPartitionLH[1], finalLH);
+}
+
+
+static void changeModelParameters(int index, int rateNumber, double value, int whichParameterType, tree *tr)
+{
+ switch(whichParameterType)
+ {
+ case RATE_F:
+ setRateModel(tr, index, value, rateNumber);
+ initReversibleGTR(tr, index);
+ break;
+ case ALPHA_F:
+ tr->partitionData[index].alpha = value;
+ makeGammaCats(tr->partitionData[index].alpha, tr->partitionData[index].gammaRates, 4, tr->useMedian);
+ break;
+ case FREQ_F:
+ {
+ int
+ states = tr->partitionData[index].states,
+ j;
+
+ double
+ w = 0.0;
+
+ tr->partitionData[index].freqExponents[rateNumber] = value;
+
+ for(j = 0; j < states; j++)
+ w += exp(tr->partitionData[index].freqExponents[j]);
+
+ for(j = 0; j < states; j++)
+ tr->partitionData[index].frequencies[j] = exp(tr->partitionData[index].freqExponents[j]) / w;
+
+ initReversibleGTR(tr, index);
+ }
+ break;
+ case LXRATE_F:
+ tr->partitionData[index].gammaRates[rateNumber] = value;
+ scaleLG4X_EIGN(tr, index);
+ break;
+ case LXWEIGHT_F:
+ updateWeights(tr, index, rateNumber, value);
+ scaleLG4X_EIGN(tr, index);
+ break;
+ default:
+ assert(0);
+ }
+}
+
+static void evaluateChange(tree *tr, int rateNumber, double *value, double *result, boolean* converged, int whichFunction, int numberOfModels, linkageList *ll, double modelEpsilon)
+{
+ int
+ i,
+ k,
+ pos;
+
+ for(i = 0, pos = 0; i < ll->entries; i++)
+ {
+ if(ll->ld[i].valid)
+ {
+ if(converged[pos])
+ {
+ //if parameter optimizations for this specific model have converged
+ //set executeModel to FALSE
+
+ for(k = 0; k < ll->ld[i].partitions; k++)
+ tr->executeModel[ll->ld[i].partitionList[k]] = FALSE;
+ }
+ else
+ {
+ for(k = 0; k < ll->ld[i].partitions; k++)
+ {
+ int
+ index = ll->ld[i].partitionList[k];
+
+ changeModelParameters(index, rateNumber, value[pos], whichFunction, tr);
+ }
+ }
+ pos++;
+ }
+ else
+ {
+ // if this partition is not being optimized anyway (e.g., we may be optimizing GTR rates for all DNA partitions,
+ // but there are also a couple of Protein partitions with fixed models like WAG, JTT, etc.) set executeModel to FALSE
+
+ for(k = 0; k < ll->ld[i].partitions; k++)
+ tr->executeModel[ll->ld[i].partitionList[k]] = FALSE;
+ }
+ }
+
+ assert(pos == numberOfModels);
+
+ //some error checks for individual model parameters
+
+ switch(whichFunction)
+ {
+ case RATE_F:
+ assert(rateNumber != -1);
+ break;
+ case ALPHA_F:
+ break;
+ case LXRATE_F:
+ /* same check as LXWEIGHT_F below: intentional fall-through */
+ case LXWEIGHT_F:
+ assert(rateNumber != -1);
+ break;
+ case FREQ_F:
+ break;
+ default:
+ assert(0);
+ }
+
+ switch(whichFunction)
+ {
+ case RATE_F:
+ case ALPHA_F:
+ case LXRATE_F:
+ case FREQ_F:
+ case LXWEIGHT_F:
+ evaluateGeneric(tr, tr->start, TRUE);
+ break;
+ default:
+ assert(0);
+ }
+
+
+ //LIBRARY: need to switch over parallel regions here either call
+ //the one for the rates or for alpha!
+
+ //commented out evaluate below in the course of the LG4X integration
+ //evaluateGeneric(tr, tr->start, TRUE);
+
+ for(i = 0, pos = 0; i < ll->entries; i++)
+ {
+ if(ll->ld[i].valid)
+ {
+ result[pos] = 0.0;
+
+ for(k = 0; k < ll->ld[i].partitions; k++)
+ {
+ int
+ index = ll->ld[i].partitionList[k];
+
+ assert(tr->perPartitionLH[index] <= 0.0);
+
+ result[pos] -= tr->perPartitionLH[index];
+
+ }
+ pos++;
+ }
+
+ //set execute model for ALL partitions to true again
+ //for consistency
+
+ for(k = 0; k < ll->ld[i].partitions; k++)
+ {
+ int
+ index = ll->ld[i].partitionList[k];
+ tr->executeModel[index] = TRUE;
+ }
+ }
+
+ assert(pos == numberOfModels);
+}
+
+
+
+static void brentGeneric(double *ax, double *bx, double *cx, double *fb, double tol, double *xmin, double *result, int numberOfModels,
+ int whichFunction, int rateNumber, tree *tr, linkageList *ll, double *lim_inf, double *lim_sup)
+{
+ int iter, i;
+ double
+ *a = (double *)malloc(sizeof(double) * numberOfModels),
+ *b = (double *)malloc(sizeof(double) * numberOfModels),
+ *d = (double *)malloc(sizeof(double) * numberOfModels),
+ *etemp = (double *)malloc(sizeof(double) * numberOfModels),
+ *fu = (double *)malloc(sizeof(double) * numberOfModels),
+ *fv = (double *)malloc(sizeof(double) * numberOfModels),
+ *fw = (double *)malloc(sizeof(double) * numberOfModels),
+ *fx = (double *)malloc(sizeof(double) * numberOfModels),
+ *p = (double *)malloc(sizeof(double) * numberOfModels),
+ *q = (double *)malloc(sizeof(double) * numberOfModels),
+ *r = (double *)malloc(sizeof(double) * numberOfModels),
+ *tol1 = (double *)malloc(sizeof(double) * numberOfModels),
+ *tol2 = (double *)malloc(sizeof(double) * numberOfModels),
+ *u = (double *)malloc(sizeof(double) * numberOfModels),
+ *v = (double *)malloc(sizeof(double) * numberOfModels),
+ *w = (double *)malloc(sizeof(double) * numberOfModels),
+ *x = (double *)malloc(sizeof(double) * numberOfModels),
+ *xm = (double *)malloc(sizeof(double) * numberOfModels),
+ *e = (double *)malloc(sizeof(double) * numberOfModels);
+ boolean *converged = (boolean *)malloc(sizeof(boolean) * numberOfModels);
+ boolean allConverged;
+
+ for(i = 0; i < numberOfModels; i++)
+ converged[i] = FALSE;
+
+ for(i = 0; i < numberOfModels; i++)
+ {
+ e[i] = 0.0;
+ d[i] = 0.0;
+ }
+
+ for(i = 0; i < numberOfModels; i++)
+ {
+ a[i]=((ax[i] < cx[i]) ? ax[i] : cx[i]);
+ b[i]=((ax[i] > cx[i]) ? ax[i] : cx[i]);
+ x[i] = w[i] = v[i] = bx[i];
+ fw[i] = fv[i] = fx[i] = fb[i];
+ }
+
+ for(i = 0; i < numberOfModels; i++)
+ {
+ assert(a[i] >= lim_inf[i] && a[i] <= lim_sup[i]);
+ assert(b[i] >= lim_inf[i] && b[i] <= lim_sup[i]);
+ assert(x[i] >= lim_inf[i] && x[i] <= lim_sup[i]);
+ assert(v[i] >= lim_inf[i] && v[i] <= lim_sup[i]);
+ assert(w[i] >= lim_inf[i] && w[i] <= lim_sup[i]);
+ }
+
+
+
+ for(iter = 1; iter <= ITMAX; iter++)
+ {
+ allConverged = TRUE;
+
+ for(i = 0; i < numberOfModels && allConverged; i++)
+ allConverged = allConverged && converged[i];
+
+ if(allConverged)
+ {
+ free(converged);
+ free(a);
+ free(b);
+ free(d);
+ free(etemp);
+ free(fu);
+ free(fv);
+ free(fw);
+ free(fx);
+ free(p);
+ free(q);
+ free(r);
+ free(tol1);
+ free(tol2);
+ free(u);
+ free(v);
+ free(w);
+ free(x);
+ free(xm);
+ free(e);
+ return;
+ }
+
+ for(i = 0; i < numberOfModels; i++)
+ {
+ if(!converged[i])
+ {
+ assert(a[i] >= lim_inf[i] && a[i] <= lim_sup[i]);
+ assert(b[i] >= lim_inf[i] && b[i] <= lim_sup[i]);
+ assert(x[i] >= lim_inf[i] && x[i] <= lim_sup[i]);
+ assert(v[i] >= lim_inf[i] && v[i] <= lim_sup[i]);
+ assert(w[i] >= lim_inf[i] && w[i] <= lim_sup[i]);
+
+ xm[i] = 0.5 * (a[i] + b[i]);
+ tol2[i] = 2.0 * (tol1[i] = tol * fabs(x[i]) + BRENT_ZEPS);
+
+ if(fabs(x[i] - xm[i]) <= (tol2[i] - 0.5 * (b[i] - a[i])))
+ {
+ result[i] = -fx[i];
+ xmin[i] = x[i];
+ converged[i] = TRUE;
+ }
+ else
+ {
+ if(fabs(e[i]) > tol1[i])
+ {
+ r[i] = (x[i] - w[i]) * (fx[i] - fv[i]);
+ q[i] = (x[i] - v[i]) * (fx[i] - fw[i]);
+ p[i] = (x[i] - v[i]) * q[i] - (x[i] - w[i]) * r[i];
+ q[i] = 2.0 * (q[i] - r[i]);
+ if(q[i] > 0.0)
+ p[i] = -p[i];
+ q[i] = fabs(q[i]);
+ etemp[i] = e[i];
+ e[i] = d[i];
+ if((fabs(p[i]) >= fabs(0.5 * q[i] * etemp[i])) || (p[i] <= q[i] * (a[i]-x[i])) || (p[i] >= q[i] * (b[i] - x[i])))
+ d[i] = BRENT_CGOLD * (e[i] = (x[i] >= xm[i] ? a[i] - x[i] : b[i] - x[i]));
+ else
+ {
+ d[i] = p[i] / q[i];
+ u[i] = x[i] + d[i];
+ if( u[i] - a[i] < tol2[i] || b[i] - u[i] < tol2[i])
+ d[i] = SIGN(tol1[i], xm[i] - x[i]);
+ }
+ }
+ else
+ {
+ d[i] = BRENT_CGOLD * (e[i] = (x[i] >= xm[i] ? a[i] - x[i]: b[i] - x[i]));
+ }
+ u[i] = ((fabs(d[i]) >= tol1[i]) ? (x[i] + d[i]): (x[i] +SIGN(tol1[i], d[i])));
+ }
+
+ if(!converged[i])
+ assert(u[i] >= lim_inf[i] && u[i] <= lim_sup[i]);
+ }
+ }
+
+ evaluateChange(tr, rateNumber, u, fu, converged, whichFunction, numberOfModels, ll, tol);
+
+ for(i = 0; i < numberOfModels; i++)
+ {
+ if(!converged[i])
+ {
+ if(fu[i] <= fx[i])
+ {
+ if(u[i] >= x[i])
+ a[i] = x[i];
+ else
+ b[i] = x[i];
+
+ SHFT(v[i],w[i],x[i],u[i]);
+ SHFT(fv[i],fw[i],fx[i],fu[i]);
+ }
+ else
+ {
+ if(u[i] < x[i])
+ a[i] = u[i];
+ else
+ b[i] = u[i];
+
+ if(fu[i] <= fw[i] || w[i] == x[i])
+ {
+ v[i] = w[i];
+ w[i] = u[i];
+ fv[i] = fw[i];
+ fw[i] = fu[i];
+ }
+ else
+ {
+ if(fu[i] <= fv[i] || v[i] == x[i] || v[i] == w[i])
+ {
+ v[i] = u[i];
+ fv[i] = fu[i];
+ }
+ }
+ }
+
+ assert(a[i] >= lim_inf[i] && a[i] <= lim_sup[i]);
+ assert(b[i] >= lim_inf[i] && b[i] <= lim_sup[i]);
+ assert(x[i] >= lim_inf[i] && x[i] <= lim_sup[i]);
+ assert(v[i] >= lim_inf[i] && v[i] <= lim_sup[i]);
+ assert(w[i] >= lim_inf[i] && w[i] <= lim_sup[i]);
+ assert(u[i] >= lim_inf[i] && u[i] <= lim_sup[i]);
+ }
+ }
+ }
+
+ free(converged);
+ free(a);
+ free(b);
+ free(d);
+ free(etemp);
+ free(fu);
+ free(fv);
+ free(fw);
+ free(fx);
+ free(p);
+ free(q);
+ free(r);
+ free(tol1);
+ free(tol2);
+ free(u);
+ free(v);
+ free(w);
+ free(x);
+ free(xm);
+ free(e);
+
+ printf("\n. Too many iterations in BRENT !");
+ assert(0);
+}
+
+
+
+static int brakGeneric(double *param, double *ax, double *bx, double *cx, double *fa, double *fb,
+ double *fc, double *lim_inf, double *lim_sup,
+ int numberOfModels, int rateNumber, int whichFunction, tree *tr, linkageList *ll, double modelEpsilon)
+{
+ double
+ *ulim = (double *)malloc(sizeof(double) * numberOfModels),
+ *u = (double *)malloc(sizeof(double) * numberOfModels),
+ *r = (double *)malloc(sizeof(double) * numberOfModels),
+ *q = (double *)malloc(sizeof(double) * numberOfModels),
+ *fu = (double *)malloc(sizeof(double) * numberOfModels),
+ *dum = (double *)malloc(sizeof(double) * numberOfModels),
+ *temp = (double *)malloc(sizeof(double) * numberOfModels);
+
+ int
+ i,
+ *state = (int *)malloc(sizeof(int) * numberOfModels),
+ *endState = (int *)malloc(sizeof(int) * numberOfModels);
+
+ boolean *converged = (boolean *)malloc(sizeof(boolean) * numberOfModels);
+ boolean allConverged;
+
+ for(i = 0; i < numberOfModels; i++)
+ converged[i] = FALSE;
+
+ for(i = 0; i < numberOfModels; i++)
+ {
+ state[i] = 0;
+ endState[i] = 0;
+
+ u[i] = 0.0;
+
+ param[i] = ax[i];
+
+ if(param[i] > lim_sup[i])
+ param[i] = ax[i] = lim_sup[i];
+
+ if(param[i] < lim_inf[i])
+ param[i] = ax[i] = lim_inf[i];
+
+ assert(param[i] >= lim_inf[i] && param[i] <= lim_sup[i]);
+ }
+
+
+ evaluateChange(tr, rateNumber, param, fa, converged, whichFunction, numberOfModels, ll, modelEpsilon);
+
+
+ for(i = 0; i < numberOfModels; i++)
+ {
+ param[i] = bx[i];
+ if(param[i] > lim_sup[i])
+ param[i] = bx[i] = lim_sup[i];
+ if(param[i] < lim_inf[i])
+ param[i] = bx[i] = lim_inf[i];
+
+ assert(param[i] >= lim_inf[i] && param[i] <= lim_sup[i]);
+ }
+
+ evaluateChange(tr, rateNumber, param, fb, converged, whichFunction, numberOfModels, ll, modelEpsilon);
+
+ for(i = 0; i < numberOfModels; i++)
+ {
+ if (fb[i] > fa[i])
+ {
+ SHFT(dum[i],ax[i],bx[i],dum[i]);
+ SHFT(dum[i],fa[i],fb[i],dum[i]);
+ }
+
+ cx[i] = bx[i] + MNBRAK_GOLD * (bx[i] - ax[i]);
+
+ param[i] = cx[i];
+
+ if(param[i] > lim_sup[i])
+ param[i] = cx[i] = lim_sup[i];
+ if(param[i] < lim_inf[i])
+ param[i] = cx[i] = lim_inf[i];
+
+ assert(param[i] >= lim_inf[i] && param[i] <= lim_sup[i]);
+ }
+
+
+ evaluateChange(tr, rateNumber, param, fc, converged, whichFunction, numberOfModels, ll, modelEpsilon);
+
+ while(1)
+ {
+ allConverged = TRUE;
+
+ for(i = 0; i < numberOfModels && allConverged; i++)
+ allConverged = allConverged && converged[i];
+
+ if(allConverged)
+ {
+ for(i = 0; i < numberOfModels; i++)
+ {
+ if(ax[i] > lim_sup[i])
+ ax[i] = lim_sup[i];
+ if(ax[i] < lim_inf[i])
+ ax[i] = lim_inf[i];
+
+ if(bx[i] > lim_sup[i])
+ bx[i] = lim_sup[i];
+ if(bx[i] < lim_inf[i])
+ bx[i] = lim_inf[i];
+
+ if(cx[i] > lim_sup[i])
+ cx[i] = lim_sup[i];
+ if(cx[i] < lim_inf[i])
+ cx[i] = lim_inf[i];
+ }
+
+ free(converged);
+ free(ulim);
+ free(u);
+ free(r);
+ free(q);
+ free(fu);
+ free(dum);
+ free(temp);
+ free(state);
+ free(endState);
+ return 0;
+
+ }
+
+ for(i = 0; i < numberOfModels; i++)
+ {
+ if(!converged[i])
+ {
+ switch(state[i])
+ {
+ case 0:
+ endState[i] = 0;
+ if(!(fb[i] > fc[i]))
+ converged[i] = TRUE;
+ else
+ {
+
+ if(ax[i] > lim_sup[i])
+ ax[i] = lim_sup[i];
+ if(ax[i] < lim_inf[i])
+ ax[i] = lim_inf[i];
+ if(bx[i] > lim_sup[i])
+ bx[i] = lim_sup[i];
+ if(bx[i] < lim_inf[i])
+ bx[i] = lim_inf[i];
+ if(cx[i] > lim_sup[i])
+ cx[i] = lim_sup[i];
+ if(cx[i] < lim_inf[i])
+ cx[i] = lim_inf[i];
+
+ r[i]=(bx[i]-ax[i])*(fb[i]-fc[i]);
+ q[i]=(bx[i]-cx[i])*(fb[i]-fa[i]);
+ u[i]=(bx[i])-((bx[i]-cx[i])*q[i]-(bx[i]-ax[i])*r[i])/
+ (2.0*SIGN(MAX(fabs(q[i]-r[i]),MNBRAK_TINY),q[i]-r[i]));
+
+ ulim[i]=(bx[i])+MNBRAK_GLIMIT*(cx[i]-bx[i]);
+
+ if(u[i] > lim_sup[i])
+ u[i] = lim_sup[i];
+ if(u[i] < lim_inf[i])
+ u[i] = lim_inf[i];
+ if(ulim[i] > lim_sup[i])
+ ulim[i] = lim_sup[i];
+ if(ulim[i] < lim_inf[i])
+ ulim[i] = lim_inf[i];
+
+ if ((bx[i]-u[i])*(u[i]-cx[i]) > 0.0)
+ {
+ param[i] = u[i];
+ if(param[i] > lim_sup[i])
+ param[i] = u[i] = lim_sup[i];
+ if(param[i] < lim_inf[i])
+ param[i] = u[i] = lim_inf[i];
+ endState[i] = 1;
+ }
+ else
+ {
+ if ((cx[i]-u[i])*(u[i]-ulim[i]) > 0.0)
+ {
+ param[i] = u[i];
+ if(param[i] > lim_sup[i])
+ param[i] = u[i] = lim_sup[i];
+ if(param[i] < lim_inf[i])
+ param[i] = u[i] = lim_inf[i];
+ endState[i] = 2;
+ }
+ else
+ {
+ if ((u[i]-ulim[i])*(ulim[i]-cx[i]) >= 0.0)
+ {
+ u[i] = ulim[i];
+ param[i] = u[i];
+ if(param[i] > lim_sup[i])
+ param[i] = u[i] = ulim[i] = lim_sup[i];
+ if(param[i] < lim_inf[i])
+ param[i] = u[i] = ulim[i] = lim_inf[i];
+ endState[i] = 0;
+ }
+ else
+ {
+ u[i]=(cx[i])+MNBRAK_GOLD*(cx[i]-bx[i]);
+ param[i] = u[i];
+ endState[i] = 0;
+ if(param[i] > lim_sup[i])
+ param[i] = u[i] = lim_sup[i];
+ if(param[i] < lim_inf[i])
+ param[i] = u[i] = lim_inf[i];
+ }
+ }
+ }
+ }
+ break;
+ case 1:
+ endState[i] = 0;
+ break;
+ case 2:
+ endState[i] = 3;
+ break;
+ default:
+ assert(0);
+ }
+ assert(param[i] >= lim_inf[i] && param[i] <= lim_sup[i]);
+ }
+ }
+
+ evaluateChange(tr, rateNumber, param, temp, converged, whichFunction, numberOfModels, ll, modelEpsilon);
+
+ for(i = 0; i < numberOfModels; i++)
+ {
+ if(!converged[i])
+ {
+ switch(endState[i])
+ {
+ case 0:
+ fu[i] = temp[i];
+ SHFT(ax[i],bx[i],cx[i],u[i]);
+ SHFT(fa[i],fb[i],fc[i],fu[i]);
+ state[i] = 0;
+ break;
+ case 1:
+ fu[i] = temp[i];
+ if (fu[i] < fc[i])
+ {
+ ax[i]=(bx[i]);
+ bx[i]=u[i];
+ fa[i]=(fb[i]);
+ fb[i]=fu[i];
+ converged[i] = TRUE;
+ }
+ else
+ {
+ if (fu[i] > fb[i])
+ {
+ assert(u[i] >= lim_inf[i] && u[i] <= lim_sup[i]);
+ cx[i]=u[i];
+ fc[i]=fu[i];
+ converged[i] = TRUE;
+ }
+ else
+ {
+ u[i]=(cx[i])+MNBRAK_GOLD*(cx[i]-bx[i]);
+ param[i] = u[i];
+ if(param[i] > lim_sup[i]) {param[i] = u[i] = lim_sup[i];}
+ if(param[i] < lim_inf[i]) {param[i] = u[i] = lim_inf[i];}
+ state[i] = 1;
+ }
+ }
+ break;
+ case 2:
+ fu[i] = temp[i];
+ if (fu[i] < fc[i])
+ {
+ SHFT(bx[i],cx[i],u[i], cx[i]+MNBRAK_GOLD*(cx[i]-bx[i]));
+ state[i] = 2;
+ }
+ else
+ {
+ state[i] = 0;
+ SHFT(ax[i],bx[i],cx[i],u[i]);
+ SHFT(fa[i],fb[i],fc[i],fu[i]);
+ }
+ break;
+ case 3:
+ SHFT(fb[i],fc[i],fu[i], temp[i]);
+ SHFT(ax[i],bx[i],cx[i],u[i]);
+ SHFT(fa[i],fb[i],fc[i],fu[i]);
+ state[i] = 0;
+ break;
+ default:
+ assert(0);
+ }
+ }
+ }
+ }
+
+
+ assert(0);
+ free(converged);
+ free(ulim);
+ free(u);
+ free(r);
+ free(q);
+ free(fu);
+ free(dum);
+ free(temp);
+ free(state);
+ free(endState);
+
+
+
+ return(0);
+}
+
+
+/*******************************************************************************************************/
+/******** LG4X ***************************************************************************************/
+
+static void optLG4X(tree *tr, double modelEpsilon, linkageList *ll, int numberOfModels)
+{
+ int
+ i;
+
+ for(i = 0; i < 4; i++)
+ {
+ optParamGeneric(tr, modelEpsilon, ll, numberOfModels, i, LG4X_RATE_MIN, LG4X_RATE_MAX, LXRATE_F);
+ optimizeWeights(tr, modelEpsilon, ll, numberOfModels);
+ }
+}
+
+
+/**********************************************************************************************************/
+/* ALPHA PARAM ********************************************************************************************/
+
+
+
+//this function is required for implementing the LG4X model later-on
+
+static void optAlphasGeneric(tree *tr, double modelEpsilon, linkageList *ll)
+{
+ int
+ i,
+ non_LG4X_Partitions = 0,
+ LG4X_Partitions = 0;
+
+ /* assumes homogeneous super-partitions, that either contain DNA or AA partitions !*/
+ /* does not check whether AA are all linked */
+
+ /* first do non-LG4X partitions */
+
+ for(i = 0; i < ll->entries; i++)
+ {
+ switch(tr->partitionData[ll->ld[i].partitionList[0]].dataType)
+ {
+ case DNA_DATA:
+ case BINARY_DATA:
+ case SECONDARY_DATA:
+ case SECONDARY_DATA_6:
+ case SECONDARY_DATA_7:
+ case GENERIC_32:
+ case GENERIC_64:
+ ll->ld[i].valid = TRUE;
+ non_LG4X_Partitions++;
+ break;
+ case AA_DATA:
+ //to be implemented later-on
+ if(tr->partitionData[ll->ld[i].partitionList[0]].protModels == LG4X)
+ {
+ LG4X_Partitions++;
+ ll->ld[i].valid = FALSE;
+ }
+ else
+ {
+ ll->ld[i].valid = TRUE;
+ non_LG4X_Partitions++;
+ }
+ break;
+ default:
+ assert(0);
+ }
+ }
+
+
+
+ if(non_LG4X_Partitions > 0)
+ optParamGeneric(tr, modelEpsilon, ll, non_LG4X_Partitions, -1, ALPHA_MIN, ALPHA_MAX, ALPHA_F);
+
+
+
+
+ /* then LG4x partitions */
+
+ for(i = 0; i < ll->entries; i++)
+ {
+ switch(tr->partitionData[ll->ld[i].partitionList[0]].dataType)
+ {
+ case DNA_DATA:
+ case BINARY_DATA:
+ case SECONDARY_DATA:
+ case SECONDARY_DATA_6:
+ case SECONDARY_DATA_7:
+ case GENERIC_32:
+ case GENERIC_64:
+ ll->ld[i].valid = FALSE;
+ break;
+ case AA_DATA:
+ if(tr->partitionData[ll->ld[i].partitionList[0]].protModels == LG4X)
+ ll->ld[i].valid = TRUE;
+ else
+ ll->ld[i].valid = FALSE;
+ break;
+ default:
+ assert(0);
+ }
+ }
+
+ if(LG4X_Partitions > 0)
+ optLG4X(tr, modelEpsilon, ll, LG4X_Partitions);
+
+ for(i = 0; i < ll->entries; i++)
+ ll->ld[i].valid = TRUE;
+}
+
+
+static double minFreq(int index, int whichFreq, tree *tr, double absoluteMin)
+{
+ double
+ min = 0.0,
+ *w = tr->partitionData[index].freqExponents,
+ c = 0.0;
+
+ int
+ states = tr->partitionData[index].states,
+ i;
+
+ for(i = 0; i < states; i++)
+ if(i != whichFreq)
+ c += exp(w[i]);
+
+ min = log(FREQ_MIN) + log(c) - log (1.0 - FREQ_MIN);
+
+ if(0)
+ {
+ double
+ check = exp(min) / (exp(min) + c);
+
+ printf("check %f\n", check);
+
+ printf("min: %f \n", min);
+ }
+
+ return MAX(min, absoluteMin);
+}
+
+static double maxFreq(int index, int whichFreq, tree *tr, double absoluteMax)
+{
+ double
+ max = 0.0,
+ *w = tr->partitionData[index].freqExponents,
+ c = 0.0;
+
+ int
+ states = tr->partitionData[index].states,
+ i;
+
+ for(i = 0; i < states; i++)
+ if(i != whichFreq)
+ c += exp(w[i]);
+
+ max = log(1.0 - ((double)(states - 1) * FREQ_MIN)) + log(c) - log ((double)(states - 1) * FREQ_MIN);
+
+ if(0)
+ {
+ double
+ check = exp(max) / (exp(max) + c);
+
+ printf("check max %f\n", check);
+
+ printf("max: %f \n", max);
+ }
+
+ return MIN(max, absoluteMax);
+}
+
+
+static void optParamGeneric(tree *tr, double modelEpsilon, linkageList *ll, int numberOfModels, int rateNumber, double _lim_inf, double _lim_sup, int whichParameterType)
+{
+ int
+ l,
+ k,
+ j,
+ pos;
+
+ double
+ *startRates = (double *)malloc(sizeof(double) * numberOfModels * 4),
+ *startWeights = (double *)malloc(sizeof(double) * numberOfModels * 4),
+ *startExponents = (double *)malloc(sizeof(double) * numberOfModels * 4),
+ *startValues = (double *)malloc(sizeof(double) * numberOfModels),
+ *startLH = (double *)malloc(sizeof(double) * numberOfModels),
+ *endLH = (double *)malloc(sizeof(double) * numberOfModels),
+ *_a = (double *)malloc(sizeof(double) * numberOfModels),
+ *_b = (double *)malloc(sizeof(double) * numberOfModels),
+ *_c = (double *)malloc(sizeof(double) * numberOfModels),
+ *_fa = (double *)malloc(sizeof(double) * numberOfModels),
+ *_fb = (double *)malloc(sizeof(double) * numberOfModels),
+ *_fc = (double *)malloc(sizeof(double) * numberOfModels),
+ *_param = (double *)malloc(sizeof(double) * numberOfModels),
+ *_x = (double *)malloc(sizeof(double) * numberOfModels),
+ *lim_inf = (double *)malloc(sizeof(double) * numberOfModels),
+ *lim_sup = (double *)malloc(sizeof(double) * numberOfModels);
+
+
+ evaluateGeneric(tr, tr->start, TRUE);
+
+
+
+#ifdef _DEBUG_MOD_OPT
+ double
+ initialLH = tr->likelihood;
+#endif
+
+ /*
+ at this point here every worker has the traversal data it needs for the
+ search
+ */
+
+ for(l = 0, pos = 0; l < ll->entries; l++)
+ {
+ if(ll->ld[l].valid)
+ {
+ endLH[pos] = unlikely;
+ startLH[pos] = 0.0;
+
+ for(j = 0; j < ll->ld[l].partitions; j++)
+ {
+ int
+ index = ll->ld[l].partitionList[j];
+
+ startLH[pos] += tr->perPartitionLH[index];
+
+ switch(whichParameterType)
+ {
+ case ALPHA_F:
+ lim_inf[pos] = _lim_inf;
+ lim_sup[pos] = _lim_sup;
+ startValues[pos] = tr->partitionData[index].alpha;
+ break;
+ case RATE_F:
+ lim_inf[pos] = _lim_inf;
+ lim_sup[pos] = _lim_sup;
+ startValues[pos] = tr->partitionData[index].substRates[rateNumber];
+ break;
+ case FREQ_F:
+ lim_inf[pos] = minFreq(index, rateNumber, tr, _lim_inf);
+ lim_sup[pos] = maxFreq(index, rateNumber, tr, _lim_sup);
+ startValues[pos] = tr->partitionData[index].freqExponents[rateNumber];
+ break;
+ case LXRATE_F:
+ lim_inf[pos] = _lim_inf;
+ lim_sup[pos] = _lim_sup;
+ assert(rateNumber >= 0 && rateNumber < 4);
+ startValues[pos] = tr->partitionData[index].gammaRates[rateNumber];
+ memcpy(&startRates[pos * 4], tr->partitionData[index].gammaRates, 4 * sizeof(double));
+ memcpy(&startExponents[pos * 4], tr->partitionData[index].weightExponents, 4 * sizeof(double));
+ memcpy(&startWeights[pos * 4], tr->partitionData[index].weights, 4 * sizeof(double));
+ break;
+ case LXWEIGHT_F:
+ lim_inf[pos] = _lim_inf;
+ lim_sup[pos] = _lim_sup;
+ assert(rateNumber >= 0 && rateNumber < 4);
+ startValues[pos] = tr->partitionData[index].weightExponents[rateNumber];
+ break;
+ default:
+ assert(0);
+ }
+
+ }
+ pos++;
+ }
+ }
+
+ assert(pos == numberOfModels);
+
+ for(k = 0, pos = 0; k < ll->entries; k++)
+ {
+ if(ll->ld[k].valid)
+ {
+ _a[pos] = startValues[pos] + 0.1;
+ _b[pos] = startValues[pos] - 0.1;
+
+ if(_a[pos] < lim_inf[pos])
+ _a[pos] = lim_inf[pos];
+
+ if(_a[pos] > lim_sup[pos])
+ _a[pos] = lim_sup[pos];
+
+ if(_b[pos] < lim_inf[pos])
+ _b[pos] = lim_inf[pos];
+
+ if(_b[pos] > lim_sup[pos])
+ _b[pos] = lim_sup[pos];
+
+ pos++;
+ }
+ }
+
+ assert(pos == numberOfModels);
+
+ brakGeneric(_param, _a, _b, _c, _fa, _fb, _fc, lim_inf, lim_sup, numberOfModels, rateNumber, whichParameterType, tr, ll, modelEpsilon);
+
+ for(k = 0; k < numberOfModels; k++)
+ {
+ assert(_a[k] >= lim_inf[k] && _a[k] <= lim_sup[k]);
+ assert(_b[k] >= lim_inf[k] && _b[k] <= lim_sup[k]);
+ assert(_c[k] >= lim_inf[k] && _c[k] <= lim_sup[k]);
+ }
+
+ brentGeneric(_a, _b, _c, _fb, modelEpsilon, _x, endLH, numberOfModels, whichParameterType, rateNumber, tr, ll, lim_inf, lim_sup);
+
+ for(k = 0, pos = 0; k < ll->entries; k++)
+ {
+ if(ll->ld[k].valid)
+ {
+ if(startLH[pos] > endLH[pos])
+ {
+ //if the initial likelihood was better than the likelihood after optimization, we set the values back
+ //to their original values
+
+ for(j = 0; j < ll->ld[k].partitions; j++)
+ {
+ int
+ index = ll->ld[k].partitionList[j];
+
+ changeModelParameters(index, rateNumber, startValues[pos], whichParameterType, tr);
+ }
+ }
+ else
+ {
+ //otherwise we set the value to the optimized value
+ //this used to be a bug in standard RAxML: before I fixed it,
+ //I was not using _x[pos] as the value that needs to be set
+
+ for(j = 0; j < ll->ld[k].partitions; j++)
+ {
+ int
+ index = ll->ld[k].partitionList[j];
+
+ changeModelParameters(index, rateNumber, _x[pos], whichParameterType, tr);
+ }
+ }
+ pos++;
+ }
+ }
+
+
+ //LIBRARY: call the barrier here to update model params at all threads/processes!
+
+ assert(pos == numberOfModels);
+
+ free(startLH);
+ free(endLH);
+ free(_a);
+ free(_b);
+ free(_c);
+ free(_fa);
+ free(_fb);
+ free(_fc);
+ free(_param);
+ free(_x);
+ free(startValues);
+ free(startRates);
+ free(startWeights);
+ free(startExponents);
+ free(lim_inf);
+ free(lim_sup);
+
+#ifdef _DEBUG_MOD_OPT
+ evaluateGeneric(tr, tr->start, TRUE);
+
+ if(tr->likelihood < initialLH)
+ printf("%f %f\n", tr->likelihood, initialLH);
+ assert(tr->likelihood >= initialLH);
+#endif
+
+}
+
+
+
+//******************** rate optimization functions ***************************************************/
+
+static void optFreqs(tree *tr, double modelEpsilon, linkageList *ll, int numberOfModels, int states)
+{
+ int
+ rateNumber;
+
+ double
+ freqMin = -1000000.0,
+ freqMax = 200.0;
+
+ for(rateNumber = 0; rateNumber < states; rateNumber++)
+ optParamGeneric(tr, modelEpsilon, ll, numberOfModels, rateNumber, freqMin, freqMax, FREQ_F);
+}
+
+static void optBaseFreqs(tree *tr, double modelEpsilon, linkageList *ll)
+{
+ int
+ i,
+ states,
+ dnaPartitions = 0,
+ aaPartitions = 0,
+ binaryPartitions = 0;
+
+ /* first do DNA */
+
+ for(i = 0; i < ll->entries; i++)
+ {
+ switch(tr->partitionData[ll->ld[i].partitionList[0]].dataType)
+ {
+ case DNA_DATA:
+ states = tr->partitionData[ll->ld[i].partitionList[0]].states;
+ if(tr->partitionData[ll->ld[i].partitionList[0]].optimizeBaseFrequencies)
+ {
+ ll->ld[i].valid = TRUE;
+ dnaPartitions++;
+ }
+ else
+ ll->ld[i].valid = FALSE;
+ break;
+ case AA_DATA:
+ case BINARY_DATA:
+ ll->ld[i].valid = FALSE;
+ break;
+ default:
+ assert(0);
+ }
+ }
+
+ if(dnaPartitions > 0)
+ optFreqs(tr, modelEpsilon, ll, dnaPartitions, states);
+
+ /* then AA */
+
+
+ for(i = 0; i < ll->entries; i++)
+ {
+ switch(tr->partitionData[ll->ld[i].partitionList[0]].dataType)
+ {
+ case AA_DATA:
+ states = tr->partitionData[ll->ld[i].partitionList[0]].states;
+ if(tr->partitionData[ll->ld[i].partitionList[0]].optimizeBaseFrequencies)
+ {
+ ll->ld[i].valid = TRUE;
+ aaPartitions++;
+ }
+ else
+ ll->ld[i].valid = FALSE;
+ break;
+ case DNA_DATA:
+ case BINARY_DATA:
+ ll->ld[i].valid = FALSE;
+ break;
+ default:
+ assert(0);
+ }
+ }
+
+ if(aaPartitions > 0)
+ optFreqs(tr, modelEpsilon, ll, aaPartitions, states);
+
+
+ //then binary
+
+ for(i = 0; i < ll->entries; i++)
+ {
+ switch(tr->partitionData[ll->ld[i].partitionList[0]].dataType)
+ {
+ case BINARY_DATA:
+ states = tr->partitionData[ll->ld[i].partitionList[0]].states;
+ if(tr->partitionData[ll->ld[i].partitionList[0]].optimizeBaseFrequencies)
+ {
+ ll->ld[i].valid = TRUE;
+ binaryPartitions++;
+ }
+ else
+ ll->ld[i].valid = FALSE;
+ break;
+ case DNA_DATA:
+ case AA_DATA:
+ ll->ld[i].valid = FALSE;
+ break;
+ default:
+ assert(0);
+ }
+ }
+
+ if(binaryPartitions > 0)
+ optFreqs(tr, modelEpsilon, ll, binaryPartitions, states);
+
+ for(i = 0; i < ll->entries; i++)
+ ll->ld[i].valid = TRUE;
+}
+
+
+//new version for optimizing rates, an external loop that iterates over the rates
+
+static void optRates(tree *tr, double modelEpsilon, linkageList *ll, int numberOfModels, int states)
+{
+ int
+ rateNumber,
+ numberOfRates = ((states * states - states) / 2) - 1;
+
+ for(rateNumber = 0; rateNumber < numberOfRates; rateNumber++)
+ optParamGeneric(tr, modelEpsilon, ll, numberOfModels, rateNumber, RATE_MIN, RATE_MAX, RATE_F);
+}
+
+
+static boolean AAisGTR(tree *tr)
+{
+ int i, count = 0;
+
+ for(i = 0; i < tr->NumberOfModels; i++)
+ {
+ if(tr->partitionData[i].dataType == AA_DATA)
+ {
+ count++;
+ if(tr->partitionData[i].protModels != GTR)
+ return FALSE;
+ }
+ }
+
+ if(count == 0)
+ return FALSE;
+
+ return TRUE;
+}
+
+static void optRatesGeneric(tree *tr, double modelEpsilon, linkageList *ll)
+{
+ int
+ i,
+ dnaPartitions = 0,
+ aaPartitions = 0,
+ states = -1;
+
+ /* assumes homogeneous super-partitions, that either contain DNA or AA partitions !*/
+ /* does not check whether AA are all linked */
+
+ /* first do DNA */
+
+ for(i = 0; i < ll->entries; i++)
+ {
+ switch(tr->partitionData[ll->ld[i].partitionList[0]].dataType)
+ {
+ case DNA_DATA:
+ states = tr->partitionData[ll->ld[i].partitionList[0]].states;
+ ll->ld[i].valid = TRUE;
+ dnaPartitions++;
+ break;
+ case BINARY_DATA:
+ case AA_DATA:
+ case SECONDARY_DATA:
+ case SECONDARY_DATA_6:
+ case SECONDARY_DATA_7:
+ case GENERIC_32:
+ case GENERIC_64:
+ ll->ld[i].valid = FALSE;
+ break;
+ default:
+ assert(0);
+ }
+ }
+
+ if(dnaPartitions > 0)
+ optRates(tr, modelEpsilon, ll, dnaPartitions, states);
+
+ /* then AA for GTR */
+
+ if(AAisGTR(tr))
+ {
+ for(i = 0; i < ll->entries; i++)
+ {
+ switch(tr->partitionData[ll->ld[i].partitionList[0]].dataType)
+ {
+ case AA_DATA:
+ states = tr->partitionData[ll->ld[i].partitionList[0]].states;
+ ll->ld[i].valid = TRUE;
+ aaPartitions++;
+ break;
+ case DNA_DATA:
+ case BINARY_DATA:
+ case SECONDARY_DATA:
+ case SECONDARY_DATA_6:
+ case SECONDARY_DATA_7:
+ ll->ld[i].valid = FALSE;
+ break;
+ default:
+ assert(0);
+ }
+ }
+
+ assert(aaPartitions == 1);
+
+ optRates(tr, modelEpsilon, ll, aaPartitions, states);
+ }
+
+ for(i = 0; i < ll->entries; i++)
+ ll->ld[i].valid = TRUE;
+}
+
+
+
+
+
+/*********************FUNCTIONS FOR APPROXIMATE MODEL OPTIMIZATION ***************************************/
+
+
+
+
+
+
+static int catCompare(const void *p1, const void *p2)
+{
+ rateCategorize *rc1 = (rateCategorize *)p1;
+ rateCategorize *rc2 = (rateCategorize *)p2;
+
+ double i = rc1->accumulatedSiteLikelihood;
+ double j = rc2->accumulatedSiteLikelihood;
+
+ if (i > j)
+ return (1);
+ if (i < j)
+ return (-1);
+ return (0);
+}
+
+
+static void categorizePartition(tree *tr, rateCategorize *rc, int model, int lower, int upper, double *patrat,
+ int *rateCategory /* temporary; used to be tr->rateCategory */
+ )
+{
+
+
+ int
+ zeroCounter,
+ i,
+ k;
+
+ double
+ diff,
+ min;
+
+ for (i = lower, zeroCounter = 0; i < upper; i++, zeroCounter++)
+ {
+ double
+ temp = patrat[i];
+
+ int
+ found = 0;
+
+ for(k = 0; k < tr->partitionData[model].numberOfCategories; k++)
+ {
+ if(temp == rc[k].rate || (fabs(temp - rc[k].rate) < 0.001))
+ {
+ found = 1;
+ rateCategory[i] = k;
+ break;
+ }
+ }
+
+ if(!found)
+ {
+ min = fabs(temp - rc[0].rate);
+ rateCategory[i] = 0;
+
+ for(k = 1; k < tr->partitionData[model].numberOfCategories; k++)
+ {
+ diff = fabs(temp - rc[k].rate);
+
+ if(diff < min)
+ {
+ min = diff;
+ rateCategory[i] = k;
+ }
+ }
+ }
+ }
+
+ for(k = 0; k < tr->partitionData[model].numberOfCategories; k++)
+ tr->partitionData[model].perSiteRates[k] = rc[k].rate;
+}
+
+
+
+
+static void optRateCatPthreads(tree *tr, double lower_spacing, double upper_spacing)
+{
+#ifdef _USE_OMP
+#pragma omp parallel
+#endif
+ {
+ int
+ m,
+ model,
+ maxModel;
+
+#ifdef _USE_OMP
+ maxModel = tr->maxModelsPerThread;
+#else
+ maxModel = tr->NumberOfModels;
+#endif
+
+ for(m = 0; m < maxModel; m++)
+ {
+ /* just defaults -> if the partition wasn't assigned to this thread, it will be ignored later on */
+ size_t
+ width = 0,
+ offset = 0;
+
+#ifdef _USE_OMP
+ int
+ tid = omp_get_thread_num();
+
+ /* check if this thread should process this partition */
+ Assign*
+ pAss = tr->threadPartAssigns[tid * tr->maxModelsPerThread + m];
+
+ if(pAss)
+ {
+ model = pAss->partitionId;
+ width = pAss->width;
+ offset = pAss->offset;
+
+ assert(model < tr->NumberOfModels);
+ }
+ else
+ break;
+
+#else
+ model = m;
+
+ /* number of sites in this partition */
+ width = (size_t)tr->partitionData[model].width;
+ offset = 0;
+#endif
+
+ size_t
+ i;
+
+ pInfo
+ *partition = &(tr->partitionData[model]);
+
+ for( i = offset; i < offset + width; ++i)
+ {
+ double
+ initialRate,
+ initialLikelihood,
+ leftLH,
+ rightLH,
+ leftRate,
+ rightRate,
+ v;
+
+ const double
+ epsilon = 0.00001;
+
+ int
+ k;
+
+ initialRate = partition->patrat[i];
+
+ initialLikelihood = evaluatePartialGeneric(tr, i, initialRate, model); /* i is real i ??? */
+
+ leftLH = rightLH = initialLikelihood;
+ leftRate = rightRate = initialRate;
+
+ k = 1;
+
+ while((initialRate - k * lower_spacing > 0.0001) &&
+ ((v = evaluatePartialGeneric(tr, i, initialRate - k * lower_spacing, model))
+ > leftLH) &&
+ (fabs(leftLH - v) > epsilon))
+ {
+#ifndef WIN32
+ if(isnan(v))
+ assert(0);
+#endif
+
+ leftLH = v;
+ leftRate = initialRate - k * lower_spacing;
+ k++;
+ }
+
+ k = 1;
+
+ while(((v = evaluatePartialGeneric(tr, i, initialRate + k * upper_spacing, model)) > rightLH) &&
+ (fabs(rightLH - v) > epsilon))
+ {
+#ifndef WIN32
+ if(isnan(v))
+ assert(0);
+#endif
+ rightLH = v;
+ rightRate = initialRate + k * upper_spacing;
+ k++;
+ }
+
+ if(rightLH > initialLikelihood || leftLH > initialLikelihood)
+ {
+ if(rightLH > leftLH)
+ {
+ partition->patrat[i] = rightRate;
+ partition->lhs[i] = rightLH;
+ }
+ else
+ {
+ partition->patrat[i] = leftRate;
+ partition->lhs[i] = leftLH;
+ }
+ }
+ else
+ partition->lhs[i] = initialLikelihood;
+ }
+ }
+ }
+}
+
+
+
+
+
+/**
+ determines the weighted rates for each partition. Intended for use
+ with normalization of the CAT model rates.
+
+ Since information about rates and weights is distributed (each
+ process only has the respective info for the data assigned to it),
+ we have to communicate with peer processes. Note that
+ weightPerPart could actually be stored in a variable, since the
+ result does not change...
+
+ output:
+ weightPerPart_result -- the sum of weights per partition
+ weightedRates_result -- sum of rates per partition weighted by site weight
+
+*/
+static void getWeightsAndWeightedRates(const tree * const tr, int **weightPerPart_result, double **weightedRates_result )
+{
+ int
+ i,
+ *weightPerPart = (int *)NULL;
+
+ double
+ *weightedRates = (double *)NULL;
+
+ *weightPerPart_result = (int*)calloc((size_t)tr->NumberOfModels, sizeof(int));
+ *weightedRates_result = (double*) calloc((size_t)tr->NumberOfModels, sizeof(double));
+
+
+ weightedRates = *weightedRates_result;
+ weightPerPart = *weightPerPart_result;
+
+ for(i = 0; i < tr->NumberOfModels; ++i)
+ {
+ size_t
+ j;
+
+ pInfo
+ *partition = &(tr->partitionData[i]);
+
+ for(j = 0; j < partition->width; ++j)
+ {
+ int
+ c = partition->rateCategory[j];
+
+ weightPerPart[i] += partition->wgt[j];
+ assert(0 <= c && c < tr->maxCategories);
+ weightedRates[i] += ((double)partition->wgt[j]) * partition->perSiteRates[c];
+ }
+ }
+
+ MPI_Allreduce(MPI_IN_PLACE, weightPerPart, tr->NumberOfModels, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
+ MPI_Allreduce(MPI_IN_PLACE, weightedRates, tr->NumberOfModels, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
+
+ for( i = 0; i < tr->NumberOfModels; ++i)
+ {
+ assert(weightPerPart[i] > 0 );
+ assert(weightedRates[i] > 0.0 );
+ }
+}
+
+
+/*
+ this used to be updatePerSiteRates without scaling. Previously,
+ updatePerSiteRates without scaling only conducted a check about
+ whether sites are scaled correctly.
+ */
+
+//Andre but isn't this checking that the rates have been scaled correctly?
+//shouldn't the assertions fail in this case, i.e., without scaling ?
+void checkPerSiteRates(const tree *const tr )
+{
+ int
+ i,
+ *weightPerPart = (int *)NULL;
+
+ double
+ *weightedRates = (double *)NULL;
+
+ /*
+ determine the sum of weights (weightPerPart) and the sum of all
+ rates of a partition weighted by site weights
+ */
+ getWeightsAndWeightedRates(tr, &weightPerPart, &weightedRates);
+
+ if(tr->numBranches > 1 )
+ {
+ /* check if the mean of rates of each partition is 1 */
+ for(i = 0; i < tr->NumberOfModels; ++i)
+ {
+ double accRat = weightedRates[i] / (double)weightPerPart[i];
+ assert(fabs(accRat - 1.0) < 1e-5);
+ }
+ }
+ else
+ {
+ /* check, if the overall mean of rates is 1 */
+
+ double
+ accRat = 0.0,
+ accWgt = 0.0;
+
+ for(i = 0; i < tr->NumberOfModels; ++i)
+ {
+ accRat += weightedRates[i];
+ accWgt += weightPerPart[i];
+ }
+ accRat /= (double)accWgt;
+
+ assert(fabs(accRat - 1.0) < 1e-5);
+ }
+
+ free(weightedRates);
+ free(weightPerPart);
+}
+
+
+/**
+ updatePerSiteRates is called after the master has categorized
+ rates into several categories and every process has obtained the
+ categorization for only the data assigned to it. Now, we still
+ have to scale the rates, s.t. they are 1 on average.
+
+ Thus, some communication is still needed to determine the total
+ weight and the weighted rates (because this information is
+ distributed).
+
+ Notice that this function previously had two modes (scaleRates =
+ {TRUE,FALSE}). Previously, scaleRates = FALSE, only performed a
+ check on whether rates are scaled correctly such that the average
+ rate is 1. For clarity, this functionality is now in a separate
+ function called checkPerSiteRates.
+*/
+static void updatePerSiteRates(tree *tr)
+{
+ int
+ i,
+ *weightPerPart = (int *)NULL;
+
+ double
+ *weightedRates = (double *)NULL;
+
+ getWeightsAndWeightedRates(tr, &weightPerPart, &weightedRates);
+
+ if(tr->numBranches > 1 )
+ {
+ /* scale each partition, s.t. average rate within the partition is 1 */
+ for(i = 0; i < tr->NumberOfModels; ++i)
+ {
+ int j;
+ double scaler = weightedRates[i] / (double)weightPerPart[i];
+ scaler = 1.0 / scaler;
+ for(j = 0; j < tr->partitionData[i].numberOfCategories; ++j)
+ tr->partitionData[i].perSiteRates[j] *= scaler;
+ }
+ }
+ else
+ {
+ /* scale, s.t. average rate is 1 */
+
+ double
+ scaler = 0.0,
+ accWgt = 0.0;
+
+ for(i = 0; i < tr->NumberOfModels; ++i)
+ {
+ scaler += weightedRates[i];
+ accWgt += weightPerPart[i];
+ }
+ scaler /= (double)accWgt;
+ scaler = 1.0 / scaler;
+
+ for(i = 0; i < tr->NumberOfModels; ++i)
+ {
+ pInfo
+ *partition = &(tr->partitionData[i]);
+
+ int
+ j;
+
+ for(j = 0; j < partition->numberOfCategories; ++j)
+ partition->perSiteRates[j] *= scaler;
+ }
+ }
+
+ free(weightedRates);
+ free(weightPerPart);
+
+ /*
+ finally check, whether the rates are scaled correctly, s.t. their
+ mean is 1
+ */
+ checkPerSiteRates(tr);
+}
+
+
+
+/*
+ gathers optimized rates and the associated persite-lnls from all
+ processes at the master.
+
+ Notice that, for instance, tr->patrat_basePtr already contains all rate
+ data of a single process.
+
+ Output:
+ optRates_result -- (only at master) a pointer to an array of optimized rates (corresponds to what used to be tr->patratStored)
+ lnls_result -- (only at master) a pointer to an array with persite-lnls that correspond to the newly proposed rate (used to be tr->lhs)
+ */
+static void gatherOptimizedRates(tree *tr, double **optRates_result, double **lnls_result)
+{
+ /* determine counts and displacement for data for each processor */
+ int
+ *numPerProc = (int *)NULL,
+ *displPerProc = (int *)NULL;
+
+ calculateLengthAndDisplPerProcess(tr, &numPerProc, &displPerProc);
+
+ gatherDistributedArray( tr, (void**) optRates_result, tr->patrat_basePtr, MPI_DOUBLE, numPerProc, displPerProc);
+ gatherDistributedArray(tr , (void**) lnls_result, tr->lhs_basePtr, MPI_DOUBLE, numPerProc, displPerProc);
+
+ free(numPerProc);
+ free(displPerProc);
+}
+
+
+/*
+ The master creates rate categories and assigns the rate categories
+ to processes. Only executed by the master to assure consistent
+ categorization.
+
+ This code used to be the first part of optimizeRateCategories()
+ and has only been modified slightly.
+
+ Input:
+ patrat -- a global array of optimized rates (used to be tr->patratStored)
+ lnls -- a global array of per-site lnls (used to be tr->lhs)
+
+ Output:
+ rateCategory_result -- a pointer to a global array of rate categories (used to be tr->rateCategory)
+
+ side effect:
+ tr->partitionData[i].perSiteRates gets computed in categorizePartition.
+
+ */
+static void categorizeTheRates(tree *tr, double *patrat, double *lnls, int maxCategories, int **rateCategory_result)
+{
+ int
+ model, i;
+
+ *rateCategory_result = (int*) calloc((size_t)tr->originalCrunchedLength, sizeof(int));
+
+ for(model = 0; model < tr->NumberOfModels; model++)
+ {
+ double
+ temp = 0.0;
+
+ int
+ where = 1,
+ found = 0,
+ width = tr->partitionData[model].upper - tr->partitionData[model].lower,
+ upper = tr->partitionData[model].upper,
+ lower = tr->partitionData[model].lower;
+
+ rateCategorize
+ *rc = (rateCategorize *)malloc(sizeof(rateCategorize) * width);
+
+ for (i = 0; i < width; i++)
+ {
+ rc[i].accumulatedSiteLikelihood = 0.0;
+ rc[i].rate = 0.0;
+ }
+
+ rc[0].accumulatedSiteLikelihood = lnls[lower];
+ rc[0].rate = patrat[lower];
+
+ for (i = lower + 1; i < upper; i++)
+ {
+ int k;
+
+ temp = patrat[i];
+ found = 0;
+
+ for(k = 0; k < where; k++)
+ {
+ if(temp == rc[k].rate || (fabs(temp - rc[k].rate) < 0.001))
+ {
+ found = 1;
+ rc[k].accumulatedSiteLikelihood += lnls[i];
+ break;
+ }
+ }
+
+ if(!found)
+ {
+ rc[where].rate = temp;
+ rc[where].accumulatedSiteLikelihood += lnls[i];
+ where++;
+ }
+ }
+
+ qsort(rc, where, sizeof(rateCategorize), catCompare);
+
+ if(where < maxCategories)
+ {
+ tr->partitionData[model].numberOfCategories = where;
+ categorizePartition(tr, rc, model, lower, upper, patrat, *rateCategory_result);
+ }
+ else
+ {
+ tr->partitionData[model].numberOfCategories = maxCategories;
+ categorizePartition(tr, rc, model, lower, upper, patrat, *rateCategory_result);
+ }
+
+ free(rc);
+ }
+}
+
+
+/* #define PRINT_RAT_CAT */
+
+/**
+ informs all peer processes about
+ * rateCategory
+ * numberOfCategories
+ * perSiteRates
+ of their data
+*/
+static void scatterProcessedRates(tree *tr, int *rateCategory)
+{
+ int
+ i,
+ *countPerProc = (int *)NULL,
+ *displPerProc = (int *)NULL,
+ *numCatPerPart = (int*) calloc((size_t)tr->NumberOfModels, sizeof(int));
+
+ if(processID == 0)
+ {
+ for(i = 0; i < tr->NumberOfModels; ++i)
+ numCatPerPart[i] = tr->partitionData[i].numberOfCategories;
+ }
+ MPI_Bcast(numCatPerPart, tr->NumberOfModels, MPI_INT, 0,MPI_COMM_WORLD);
+ for(i = 0; i < tr->NumberOfModels; ++i)
+ tr->partitionData[i].numberOfCategories = numCatPerPart[i];
+ free(numCatPerPart);
+
+ /* for simplicity, broadcast all perSiteRates */
+ for(i = 0; i < tr->NumberOfModels; ++i)
+ MPI_Bcast(tr->partitionData[i].perSiteRates, tr->maxCategories, MPI_DOUBLE, 0, MPI_COMM_WORLD);
+
+
+ /* prepare for scattering */
+ calculateLengthAndDisplPerProcess(tr, &countPerProc, &displPerProc);
+
+
+#ifdef PRINT_RAT_CAT
+ if(processID == 0)
+ {
+ printf("rates BEFORE: ");
+ for(i = 0; i < tr->originalCrunchedLength; ++i)
+ printf("%d,", rateCategory[i]);
+ printf("\n");
+ }
+#endif
+
+ scatterDistrbutedArray(tr, rateCategory, tr->rateCategory_basePtr, MPI_INT, countPerProc, displPerProc);
+
+#ifdef PRINT_RAT_CAT
+ int len = getMyCharacterLength(tr);
+ printf("basepointer AFTER: ");
+ for(i = 0; i < len ; ++i)
+ printf("%d,", tr->rateCategory_basePtr[i]);
+ printf("\n");
+#endif
+
+ free(countPerProc);
+ free(displPerProc);
+}
+
+
+
+/* backup for one partition */
+typedef struct
+{
+ double *patrat;
+ int *rateCategory;
+ double *perSiteRates;
+ int numberOfCategories;
+} RateBackup;
+
+
+/**
+ This function creates a backup of all data relevant for CAT-rate
+ assignment.
+
+ Previously, the backup info was stored in patratStored. Or
+ maybe it was the other way around and patrat was the backup, while
+ patratStored contained the actual optimized rates.
+
+ Output:
+ resultPtr -- contains the backup
+ */
+static void backupRates(tree *tr, RateBackup** resultPtr)
+{
+ int
+ i,
+ numCat = tr->maxCategories;
+
+ RateBackup
+ *backup;
+
+ *resultPtr = (RateBackup* ) calloc((size_t)tr->NumberOfModels, sizeof(RateBackup));
+
+ backup = *resultPtr;
+
+ for(i = 0; i < tr->NumberOfModels; ++i)
+ {
+ pInfo
+ *partition = &(tr->partitionData[i]);
+ RateBackup
+ *bk = backup + i;
+
+ bk->patrat = (double*)calloc((size_t)partition->width, sizeof(double));
+ bk->perSiteRates = (double*) calloc((size_t)numCat, sizeof(double));
+ bk->rateCategory = (int*) calloc((size_t)partition->width, sizeof(int)) ;
+ bk->numberOfCategories = partition->numberOfCategories;
+
+ memcpy(bk->patrat, partition->patrat, sizeof(double) * (size_t)partition->width);
+ memcpy(bk->perSiteRates, partition->perSiteRates, sizeof(double) * (size_t)numCat);
+ memcpy(bk->rateCategory, partition->rateCategory, sizeof(int) * (size_t)partition->width);
+ }
+}
+
+
+static void restoreBackupRates(tree *tr , RateBackup *rb)
+{
+ int
+ numCat = tr->maxCategories,
+ i;
+
+ for(i = 0; i < tr->NumberOfModels; ++i)
+ {
+ pInfo
+ *partition = &(tr->partitionData[i]);
+
+ RateBackup
+ *bk = rb + i;
+
+ partition->numberOfCategories = bk->numberOfCategories;
+
+ memcpy(partition->patrat, bk->patrat, sizeof(double) * (size_t)partition->width);
+ memcpy(partition->perSiteRates, bk->perSiteRates, sizeof(double) * (size_t)numCat);
+ memcpy(partition->rateCategory, bk->rateCategory, sizeof(int) * (size_t)partition->width);
+ }
+}
+
+
+
+static void deleteBackupRates(tree *tr, RateBackup** rbPtr)
+{
+ int i ;
+
+ for(i = 0; i< tr->NumberOfModels; ++i)
+ {
+ RateBackup
+ *rb = &((*rbPtr)[i]);
+
+ free(rb->patrat);
+ free(rb->perSiteRates);
+ free(rb->rateCategory);
+ }
+
+ free(*rbPtr);
+ *rbPtr = (RateBackup *)NULL; /* reset the caller's pointer; assigning to rbPtr itself had no effect */
+}
+
+
+static void optimizeRateCategories(tree *tr, int _maxCategories)
+{
+ assert(_maxCategories > 0);
+
+ if(_maxCategories == 1)
+ return;
+
+
+ double
+ lower_spacing,
+ upper_spacing,
+ initialLH = tr->likelihood,
+ *optRates = (double*)NULL,
+ *lnls = (double*)NULL;
+
+ int
+ *rateCategory = (int *)NULL,
+ maxCategories = _maxCategories ;
+
+ RateBackup
+ *rateBackup = (RateBackup *)NULL;
+
+ assert(isTip(tr->start->number, tr->mxtips));
+
+ evaluateGeneric(tr, tr->start, TRUE);
+
+ if(optimizeRateCategoryInvocations == 1)
+ {
+ lower_spacing = 0.5 / ((double)optimizeRateCategoryInvocations);
+ upper_spacing = 1.0 / ((double)optimizeRateCategoryInvocations);
+ }
+ else
+ {
+ lower_spacing = 0.05 / ((double)optimizeRateCategoryInvocations);
+ upper_spacing = 0.1 / ((double)optimizeRateCategoryInvocations);
+ }
+
+ if(lower_spacing < 0.001)
+ lower_spacing = 0.001;
+
+ if(upper_spacing < 0.001)
+ upper_spacing = 0.001;
+
+ optimizeRateCategoryInvocations++;
+
+ //store old rate category assignment
+ backupRates(tr, &rateBackup);
+
+ /* process specific: each process optimizes rates for data
+ assigned to it */
+ optRateCatPthreads(tr, lower_spacing, upper_spacing);
+
+ /* gather rates and lnls at the master */
+ gatherOptimizedRates(tr, &optRates, &lnls);
+
+ /* master has all necessary info now and can categorize the rates */
+ if(processID == 0)
+ {
+ categorizeTheRates(tr, optRates, lnls, maxCategories, &rateCategory );
+
+ /* only allocated at master */
+ free(optRates);
+ free(lnls);
+ }
+
+ scatterProcessedRates(tr, rateCategory );
+ if(processID == 0)
+ free(rateCategory);
+
+ /* every process now has new rates and a new category
+ assignment. However, we still have to scale the rates, such that their
+ weighted mean rate is 1. */
+ updatePerSiteRates(tr);
+
+ evaluateGeneric(tr, tr->start, TRUE);
+
+ if(tr->likelihood < initialLH)
+ {
+ restoreBackupRates(tr, rateBackup);
+
+ //Andre I don't understand the comment below ...
+ //can per-site rate scaling still be dis-abled in this version of the code?
+
+ /*
+ => Andre: I am afraid neither do I. Comparing it to the
+ original code, I think everything should be fine: we restore
+ the previous state and check, whether rates are scaled
+ correctly.
+ */
+
+ /* cannot do that any more here */
+ checkPerSiteRates(tr);
+
+ evaluateGeneric(tr, tr->start, TRUE);
+
+ assert(initialLH == tr->likelihood);
+ }
+
+ deleteBackupRates(tr,&rateBackup);
+}
+
+
+
+
+/*****************************************************************************************************/
+
+void resetBranches(tree *tr)
+{
+ nodeptr p, q;
+ int nodes, i;
+
+ nodes = tr->mxtips + 3 * (tr->mxtips - 2);
+ p = tr->nodep[1];
+ while (nodes-- > 0)
+ {
+ for(i = 0; i < tr->numBranches; i++)
+ p->z[i] = defaultz;
+
+ q = p->next;
+ while(q != p)
+ {
+ for(i = 0; i < tr->numBranches; i++)
+ q->z[i] = defaultz;
+ q = q->next;
+ }
+ p++;
+ }
+}
+
+
+static void printAAmatrix(tree *tr, double epsilon)
+{
+ if(AAisGTR(tr))
+ {
+ int model;
+
+ for(model = 0; model < tr->NumberOfModels; model++)
+ {
+ if(tr->partitionData[model].dataType == AA_DATA)
+ {
+ char gtrFileName[1024];
+ char epsilonStr[1024];
+ FILE *gtrFile;
+ double *rates = tr->partitionData[model].substRates;
+ double *f = tr->partitionData[model].frequencies;
+ double q[20][20];
+ int r = 0;
+ int i, j;
+
+ assert(tr->partitionData[model].protModels == GTR);
+
+ sprintf(epsilonStr, "%f", epsilon);
+
+ strcpy(gtrFileName, workdir);
+ strcat(gtrFileName, "RAxML_proteinGTRmodel.");
+ strcat(gtrFileName, run_id);
+ strcat(gtrFileName, "_");
+ strcat(gtrFileName, epsilonStr);
+
+ gtrFile = myfopen(gtrFileName, "wb");
+
+ for(i = 0; i < 20; i++)
+ for(j = 0; j < 20; j++)
+ q[i][j] = 0.0;
+
+ for(i = 0; i < 19; i++)
+ for(j = i + 1; j < 20; j++)
+ q[i][j] = rates[r++];
+
+ for(i = 0; i < 20; i++)
+ for(j = 0; j <= i; j++)
+ {
+ if(i == j)
+ q[i][j] = 0.0;
+ else
+ q[i][j] = q[j][i];
+ }
+
+ for(i = 0; i < 20; i++)
+ {
+ for(j = 0; j < 20; j++)
+ fprintf(gtrFile, "%1.80f ", q[i][j]);
+
+ fprintf(gtrFile, "\n");
+ }
+ for(i = 0; i < 20; i++)
+ fprintf(gtrFile, "%1.80f ", f[i]);
+ fprintf(gtrFile, "\n");
+
+ fclose(gtrFile);
+
+ printBothOpen("\nPrinted intermediate AA substitution matrix to file %s\n\n", gtrFileName);
+
+ break;
+ }
+
+ }
+ }
+}
+
+
+
+
+static void optModel(tree *tr, int numProteinModels, int *bestIndex, double *bestScores, boolean empiricalFreqs)
+{
+ int
+ i,
+ model;
+
+ for(model = 0; model < tr->NumberOfModels; model++)
+ {
+ bestIndex[model] = -1;
+ bestScores[model] = unlikely;
+ }
+
+ for(i = 0; i < numProteinModels; i++)
+ {
+ for(model = 0; model < tr->NumberOfModels; model++)
+ {
+ if(tr->partitionData[model].protModels == AUTO)
+ {
+ if(empiricalFreqs)
+ tr->partitionData[model].protFreqs = 0;
+ else
+ tr->partitionData[model].protFreqs = 1;
+
+ assert(!tr->partitionData[model].optimizeBaseFrequencies);
+
+ tr->partitionData[model].autoProtModels = i;
+ initReversibleGTR(tr, model);
+ }
+ }
+
+ resetBranches(tr);
+ evaluateGeneric(tr, tr->start, TRUE);
+ treeEvaluate(tr, 0.5);
+
+ //if(processID == 0)
+ //printf("Subst Model %d Freqs: %s like %f %f\n", i, (empiricalFreqs == TRUE)?"empirical":"fixed", tr->likelihood, tr->perPartitionLH[0]);
+
+ for(model = 0; model < tr->NumberOfModels; model++)
+ {
+ if(tr->partitionData[model].protModels == AUTO)
+ {
+ /*
+ if(processID == 0)
+ {
+
+ int k;
+
+ for(k = 0; k < 20; k++)
+ printf("%f ", tr->partitionData[model].frequencies[k]);
+ printf("\n");
+ }
+ */
+
+ if(tr->perPartitionLH[model] > bestScores[model])
+ {
+ bestScores[model] = tr->perPartitionLH[model];
+ bestIndex[model] = i;
+ }
+ }
+ }
+ }
+}
+
+static void autoProtein(tree *tr, analdef *adef)
+{
+ int
+ countAutos = 0,
+ model;
+
+ for(model = 0; model < tr->NumberOfModels; model++)
+ if(tr->partitionData[model].protModels == AUTO)
+ countAutos++;
+
+ if(countAutos > 0)
+ {
+ int
+ numProteinModels = AUTO,
+ *bestIndex = (int*)malloc(sizeof(int) * tr->NumberOfModels),
+ *oldIndex = (int*)malloc(sizeof(int) * tr->NumberOfModels),
+ *bestIndexEmpFreqs = (int*)malloc(sizeof(int) * tr->NumberOfModels);
+
+ boolean
+ *oldFreqs = (boolean*)malloc(sizeof(boolean) * tr->NumberOfModels);
+
+ double
+ startLH,
+ *bestScores = (double*)malloc(sizeof(double) * tr->NumberOfModels),
+ *bestScoresEmpFreqs = (double*)malloc(sizeof(double) * tr->NumberOfModels);
+
+ topolRELL_LIST
+ *rl = (topolRELL_LIST *)malloc(sizeof(topolRELL_LIST));
+
+ char
+ *autoModels[4] = {"ML", "BIC", "AIC", "AICc"};
+
+ initTL(rl, tr, 1);
+ saveTL(rl, tr, 0);
+
+ evaluateGeneric(tr, tr->start, TRUE);
+
+ startLH = tr->likelihood;
+
+ for(model = 0; model < tr->NumberOfModels; model++)
+ {
+ oldIndex[model] = tr->partitionData[model].autoProtModels;
+ oldFreqs[model] = tr->partitionData[model].protFreqs;
+ }
+
+ optModel(tr, numProteinModels, bestIndex, bestScores, FALSE);
+
+ optModel(tr, numProteinModels, bestIndexEmpFreqs, bestScoresEmpFreqs, TRUE);
+
+ printBothOpen("Automatic protein model assignment algorithm using %s criterion:\n\n", autoModels[tr->autoProteinSelectionType]);
+
+ for(model = 0; model < tr->NumberOfModels; model++)
+ {
+ if(tr->partitionData[model].protModels == AUTO)
+ {
+ int
+ bestIndexFixed = bestIndex[model],
+ bestIndexEmp = bestIndexEmpFreqs[model];
+
+ double
+ bestLhFixed = bestScores[model],
+ bestLhEmp = bestScoresEmpFreqs[model],
+ samples = 0.0,
+ freeParamsFixed = 0.0,
+ freeParamsEmp = 0.0;
+
+ samples = tr->partitionWeights[model];
+ //printf("Sample size %f\n", samples);
+ assert(samples != -1.0 && samples > 0.0);
+
+
+
+ //we always deal with comprehensive trees in ExaML
+ assert(tr->ntips == tr->mxtips);
+ freeParamsFixed = freeParamsEmp = (2 * tr->ntips - 3);
+ freeParamsEmp += 19.0;
+
+ switch(tr->rateHetModel)
+ {
+ case CAT:
+ freeParamsFixed += (double)tr->partitionData[model].numberOfCategories;
+ freeParamsEmp += (double)tr->partitionData[model].numberOfCategories;
+ break;
+ case GAMMA:
+ freeParamsFixed += 1.0;
+ freeParamsEmp += 1.0;
+ break;
+ case GAMMA_I:
+ freeParamsFixed += 2.0;
+ freeParamsEmp += 2.0;
+ break;
+ default:
+ assert(0);
+ }
+
+ switch(tr->autoProteinSelectionType)
+ {
+ case AUTO_ML:
+ if(bestLhFixed > bestLhEmp)
+ {
+ tr->partitionData[model].autoProtModels = bestIndexFixed;
+ tr->partitionData[model].protFreqs = 1;
+ }
+ else
+ {
+ tr->partitionData[model].autoProtModels = bestIndexEmp;
+ tr->partitionData[model].protFreqs = 0;
+ }
+ break;
+ case AUTO_BIC:
+ {
+ //BIC: -2 * lnL + k * ln(n)
+ double
+ bicFixed = -2.0 * bestLhFixed + freeParamsFixed * log(samples),
+ bicEmp = -2.0 * bestLhEmp + freeParamsEmp * log(samples);
+
+ if(bicFixed < bicEmp)
+ {
+ tr->partitionData[model].autoProtModels = bestIndexFixed;
+ tr->partitionData[model].protFreqs = 1;
+ }
+ else
+ {
+ tr->partitionData[model].autoProtModels = bestIndexEmp;
+ tr->partitionData[model].protFreqs = 0;
+ }
+ }
+ break;
+ case AUTO_AIC:
+ {
+ //AIC: 2 * (k - lnL)
+ double
+ aicFixed = 2.0 * (freeParamsFixed - bestLhFixed),
+ aicEmp = 2.0 * (freeParamsEmp - bestLhEmp);
+
+ if(aicFixed < aicEmp)
+ {
+ tr->partitionData[model].autoProtModels = bestIndexFixed;
+ tr->partitionData[model].protFreqs = 1;
+ }
+ else
+ {
+ tr->partitionData[model].autoProtModels = bestIndexEmp;
+ tr->partitionData[model].protFreqs = 0;
+ }
+ }
+ break;
+ case AUTO_AICC:
+ {
+ //AICc: AIC + (2 * k * (k + 1))/(n - k - 1)
+ double
+ aiccFixed,
+ aiccEmp;
+
+ /*
+ * Even though samples and freeParamsFixed are fp variables, they are actually integers.
+ * That's why we are comparing with a 0.5 threshold.
+ */
+
+ if(fabs(samples - freeParamsFixed - 1.0) < 0.5)
+ aiccFixed = 0.0;
+ else
+ aiccFixed = (2.0 * (freeParamsFixed - bestLhFixed)) + ((2.0 * freeParamsFixed * (freeParamsFixed + 1.0)) / (samples - freeParamsFixed - 1.0));
+
+ if(fabs(samples - freeParamsEmp - 1.0) < 0.5)
+ aiccEmp = 0.0;
+ else
+ aiccEmp = (2.0 * (freeParamsEmp - bestLhEmp)) + ((2.0 * freeParamsEmp * (freeParamsEmp + 1.0)) / (samples - freeParamsEmp - 1.0));
+
+ if(aiccFixed < aiccEmp)
+ {
+ tr->partitionData[model].autoProtModels = bestIndexFixed;
+ tr->partitionData[model].protFreqs = 1;
+ }
+ else
+ {
+ tr->partitionData[model].autoProtModels = bestIndexEmp;
+ tr->partitionData[model].protFreqs = 0;
+ }
+ }
+ break;
+ default:
+ assert(0);
+ }
+
+ initReversibleGTR(tr, model);
+ printBothOpen("\tPartition: %d best-scoring AA model: %s likelihood %f with %s base frequencies\n",
+ model, protModels[tr->partitionData[model].autoProtModels],
+ (tr->partitionData[model].protFreqs == 1)?bestLhFixed:bestLhEmp,
+ (tr->partitionData[model].protFreqs == 1)?"fixed":"empirical");
+
+ }
+ }
+
+ printBothOpen("\n\n");
+
+ resetBranches(tr);
+ evaluateGeneric(tr, tr->start, TRUE);
+ treeEvaluate(tr, 2.0);
+
+ //printf("exit %f\n", tr->likelihood);
+
+ if(tr->likelihood < startLH)
+ {
+ for(model = 0; model < tr->NumberOfModels; model++)
+ {
+ if(tr->partitionData[model].protModels == AUTO)
+ {
+ tr->partitionData[model].autoProtModels = oldIndex[model];
+ tr->partitionData[model].protFreqs = oldFreqs[model] ;
+ initReversibleGTR(tr, model);
+ }
+ }
+
+
+ restoreTL(rl, tr, 0);
+ evaluateGeneric(tr, tr->start, TRUE);
+ }
+
+ assert(tr->likelihood >= startLH);
+
+ freeTL(rl);
+ free(rl);
+
+ free(oldIndex);
+ free(bestIndex);
+ free(bestScores);
+ free(bestIndexEmpFreqs);
+ free(bestScoresEmpFreqs);
+ free(oldFreqs);
+ }
+}
+
+
+static void checkMatrixSymnmetriesAndLinkage(tree *tr, linkageList *ll)
+{
+ int
+ i;
+
+ for(i = 0; i < ll->entries; i++)
+ {
+ int
+ partitions = ll->ld[i].partitions;
+
+ if(partitions > 1)
+ {
+ int
+ k,
+ reference = ll->ld[i].partitionList[0];
+
+ for(k = 1; k < partitions; k++)
+ {
+ int
+ index = ll->ld[i].partitionList[k];
+
+ int
+ states = tr->partitionData[index].states,
+ rates = ((states * states - states) / 2);
+
+ if(tr->partitionData[reference].nonGTR != tr->partitionData[index].nonGTR)
+ assert(0);
+
+ if(tr->partitionData[reference].nonGTR)
+ {
+ int
+ j;
+
+ for(j = 0; j < rates; j++)
+ {
+ if(tr->partitionData[reference].symmetryVector[j] != tr->partitionData[index].symmetryVector[j])
+ assert(0);
+ }
+ }
+ }
+ }
+ }
+}
+
+
+static void checkTolerance(double l1, double l2)
+{
+ if(l1 < l2)
+ {
+ double
+ tolerance = fabs(MAX(l1, l2) * 0.000000000001);
+
+ if(fabs(l1 - l2) > MIN(0.1, tolerance))
+ {
+ printf("Likelihood problem in model optimization l1: %1.40f l2: %1.40f tolerance: %1.40f\n", l1, l2, tolerance);
+ assert(0);
+ }
+ }
+}
+
+void modOpt(tree *tr, double likelihoodEpsilon, analdef *adef, int treeIteration)
+{
+ int
+ i,
+ catOpt = 0,
+ *unlinked = (int *)malloc(sizeof(int) * tr->NumberOfModels);
+
+ double
+ inputLikelihood,
+ currentLikelihood,
+ modelEpsilon = 0.0001;
+
+ linkageList
+ *alphaList,
+ *rateList,
+ *freqList;
+
+ for(i = 0; i < tr->NumberOfModels; i++)
+ unlinked[i] = i;
+
+ //test code for library
+ if(0)
+ {
+ //assuming that we have three partitions for testing here
+
+ alphaList = initLinkageListString("0,1,2", tr);
+ rateList = initLinkageListString("0,1,1", tr);
+
+ init_Q_MatrixSymmetries("0,1,2,3,4,5", tr, 0);
+ init_Q_MatrixSymmetries("0,1,2,3,4,4", tr, 1);
+ init_Q_MatrixSymmetries("0,1,1,2,3,4", tr, 2);
+
+ //function that checks that partitions that have linked Q matrices as in our example above
+ //will not have different configurations of the Q matrix as set by the init_Q_MatrixSymmetries() function
+ //e.g., one would have HKY and one would have GTR, while the user claims that they are linked
+ //in our example, the Q matrices of partitions 1 and 2 are linked
+ //but we set different matrix symmetries via
+ // init_Q_MatrixSymmetries("0,1,2,3,4,4", tr, 1);
+ // and
+ // init_Q_MatrixSymmetries("0,1,1,2,3,4", tr, 2);
+ //
+ //the function just lets assertions fail for the time being .....
+
+ checkMatrixSymnmetriesAndLinkage(tr, rateList);
+ }
+ else
+ {
+ alphaList = initLinkageList(unlinked, tr);
+ freqList = initLinkageList(unlinked, tr);
+ rateList = initLinkageListGTR(tr);
+ }
+
+ tr->start = tr->nodep[1];
+
+ if(adef->useCheckpoint && adef->mode == TREE_EVALUATION)
+ {
+ assert(ckp.state == MOD_OPT);
+
+ catOpt = ckp.catOpt;
+ }
+
+ inputLikelihood = tr->likelihood;
+
+ evaluateGeneric(tr, tr->start, TRUE);
+
+
+
+ assert(inputLikelihood == tr->likelihood);
+
+ do
+ {
+ if(adef->mode == TREE_EVALUATION)
+ {
+ ckp.state = MOD_OPT;
+
+ ckp.catOpt = catOpt;
+
+ ckp.treeIteration = treeIteration;
+
+ writeCheckpoint(tr, adef);
+ }
+
+ currentLikelihood = tr->likelihood;
+
+#ifdef _DEBUG_MOD_OPT
+ printf("start: %f\n", currentLikelihood);
+#endif
+
+ optRatesGeneric(tr, modelEpsilon, rateList);
+
+ evaluateGeneric(tr, tr->start, TRUE);
+
+#ifdef _DEBUG_MOD_OPT
+ printf("after rates %f\n", tr->likelihood);
+#endif
+
+ autoProtein(tr, adef);
+
+ treeEvaluate(tr, 0.0625);
+
+#ifdef _DEBUG_MOD_OPT
+ evaluateGeneric(tr, tr->start, TRUE);
+ printf("after br-len 1 %f\n", tr->likelihood);
+#endif
+
+ evaluateGeneric(tr, tr->start, TRUE);
+
+ optBaseFreqs(tr, modelEpsilon, freqList);
+
+ evaluateGeneric(tr, tr->start, TRUE);
+
+ treeEvaluate(tr, 0.0625);
+
+#ifdef _DEBUG_MOD_OPT
+ evaluateGeneric(tr, tr->start, TRUE);
+ printf("after optBaseFreqs 1 %f\n", tr->likelihood);
+#endif
+
+ switch(tr->rateHetModel)
+ {
+ case GAMMA:
+ optAlphasGeneric(tr, modelEpsilon, alphaList);
+
+ evaluateGeneric(tr, tr->start, TRUE);
+
+#ifdef _DEBUG_MOD_OPT
+ printf("after alphas %f\n", tr->likelihood);
+#endif
+ treeEvaluate(tr, 0.1);
+
+#ifdef _DEBUG_MOD_OPT
+ evaluateGeneric(tr, tr->start, TRUE);
+ printf("after br-len 2 %f\n", tr->likelihood);
+#endif
+
+ break;
+ case CAT:
+ if(catOpt < 3)
+ {
+ evaluateGeneric(tr, tr->start, TRUE);
+ optimizeRateCategories(tr, tr->categories);
+
+#ifdef _DEBUG_MOD_OPT
+ evaluateGeneric(tr, tr->start, TRUE);
+ printf("after cat-opt %f\n", tr->likelihood);
+#endif
+
+ catOpt++;
+ }
+ break;
+ default:
+ assert(0);
+ }
+
+ checkTolerance(tr->likelihood, currentLikelihood);
+
+ /*
+ if(tr->likelihood < currentLikelihood)
+ printf("%f %f\n", tr->likelihood, currentLikelihood);
+ assert(tr->likelihood >= currentLikelihood);
+ */
+
+ printAAmatrix(tr, fabs(currentLikelihood - tr->likelihood));
+ }
+ while(fabs(currentLikelihood - tr->likelihood) > likelihoodEpsilon);
+
+ free(unlinked);
+ freeLinkageList(freqList);
+ freeLinkageList(alphaList);
+ freeLinkageList(rateList);
+}
+
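The rescaling step in updatePerSiteRates above can be illustrated in isolation. The following is a minimal single-process sketch, not part of the commit: it assumes one partition, leaves out the MPI_Allreduce calls that sum the per-process totals, and the function names weightedMeanRate and scaleRatesToUnitMean are hypothetical. It divides every category rate by the site-weighted mean rate so that the mean becomes 1.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not in ExaML): site-weighted mean of the category rates.
   category[j] is the rate category of site j, wgt[j] its pattern weight. */
double weightedMeanRate(const double *rates, const int *category,
                        const int *wgt, size_t numSites)
{
  double sumRates = 0.0, sumWgt = 0.0;
  size_t j;

  for(j = 0; j < numSites; ++j)
    {
      sumRates += (double)wgt[j] * rates[category[j]];
      sumWgt   += (double)wgt[j];
    }

  assert(sumWgt > 0.0);
  return sumRates / sumWgt;
}

/* Hypothetical helper (not in ExaML): rescale all category rates in place
   such that the site-weighted mean rate is 1. */
void scaleRatesToUnitMean(double *rates, size_t numCategories,
                          const int *category, const int *wgt, size_t numSites)
{
  double mean = weightedMeanRate(rates, category, wgt, numSites);
  size_t k;

  assert(mean > 0.0);

  for(k = 0; k < numCategories; ++k)
    rates[k] /= mean;

  /* same kind of post-condition that checkPerSiteRates asserts */
  assert(fabs(weightedMeanRate(rates, category, wgt, numSites) - 1.0) < 1e-5);
}
```

For example, with rates {0.5, 2.0}, site categories {0, 1, 1} and site weights {2, 1, 1}, the weighted mean is 1.25, so both rates are multiplied by 0.8 and become 0.4 and 1.6.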
diff --git a/examl/partitionAssignment.c b/examl/partitionAssignment.c
new file mode 100644
index 0000000..3432ae2
--- /dev/null
+++ b/examl/partitionAssignment.c
@@ -0,0 +1,693 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <math.h>
+
+#include "partitionAssignment.h"
+
+extern int processID;
+
+
+void initializePartitionAssignment( PartitionAssignment **pAssPtr, pInfo **partitions, int numPart, int numProc)
+{
+ int
+ i;
+
+ PartitionAssignment
+ *pAss;
+
+ *pAssPtr = (PartitionAssignment*)calloc(1, sizeof(PartitionAssignment));
+
+ pAss = *pAssPtr;
+
+ pAss->numProc = numProc;
+ pAss->numPartitions = numPart;
+
+ pAss->partitions = (Partition *)calloc((size_t)pAss->numPartitions, sizeof(Partition));
+
+ for(i = 0; i < numPart; ++i)
+ {
+ Partition
+ *p = pAss->partitions + i;
+ p->id = i;
+ p->width = partitions[i]->upper - partitions[i]->lower;
+ p->type = partitions[i]->states;
+ }
+
+ pAss->assignPerProc = (Assignment **)calloc((size_t)pAss->numProc , sizeof(Assignment*));
+ pAss->numAssignPerProc = (int *)calloc((size_t)pAss->numProc, sizeof(int));
+}
+
+
+void deletePartitionAssignment(PartitionAssignment *pAss)
+{
+ int
+ i;
+
+ free(pAss->partitions);
+ for(i = 0; i < pAss->numProc; ++i)
+ free(pAss->assignPerProc[i]);
+ free(pAss->assignPerProc);
+ free(pAss);
+}
+
+
+static int partSort(const void *a, const void *b )
+{
+ return ((const Partition*)a)->width - ((const Partition*) b)->width ;
+}
+
+
+/**
+ helper function that executes a partial assignment (only numElem characters are assigned)
+ */
+static void assignPartitionPartial(PartitionAssignment *pa, Partition* p, int procId, int *numAssigned, size_t *sizeAssigned, size_t offset, size_t numElem)
+{
+ Assignment
+ *a;
+
+ int
+ newArrayLen;
+
+ ++numAssigned[procId];
+ ++pa->numAssignPerProc[procId];
+
+ newArrayLen = pa->numAssignPerProc[procId];
+
+ pa->assignPerProc[procId] = (Assignment*)realloc(pa->assignPerProc[procId], newArrayLen * sizeof(Assignment));
+
+ a = pa->assignPerProc[procId] + (newArrayLen-1);
+
+ a->offset = offset;
+ a->partId = p->id;
+ a->width = numElem;
+ sizeAssigned[procId] += numElem;
+}
+
+
+/**
+ helper function that executes the assignment of a full partition
+ */
+static void assignPartitionFull(PartitionAssignment* pa, Partition* p, int procId, int *numAssigned, size_t *sizeAssigned)
+{
+ assignPartitionPartial(pa, p, procId, numAssigned, sizeAssigned, 0, p->width);
+}
+
+
+/**
+ Request a process to which a part of the partition should
+ be assigned.
+
+ At this stage there are processes that have one more partition
+ assignment than others. These have been categorized into two
+ stacks (high and low). For each stack, we have an iter variable
+ (of type int**) that gets decremented when an element is removed
+ from the stack, and the start of the array, which allows us to
+ determine when the stack is empty.
+
+ popAndYield tries to satisfy the request for a process with more
+ or less assignments (see wantLow), but will return any process, if
+ the request cannot be fulfilled.
+
+ If both queues are empty, it will return -1.
+ */
+static int popAndYield(int **procsHighIter, int *procsHighStart, int **procsLowIter, int *procsLowStart, boolean wantLow)
+{
+ boolean
+ fromHigh = FALSE,
+ fromLow = FALSE;
+
+ int
+ result = -1;
+
+ if(wantLow)
+ {
+ if(*procsLowIter - procsLowStart > 0)
+ fromLow = TRUE;
+ else
+ if(*procsHighIter - procsHighStart > 0)
+ fromHigh = TRUE;
+ }
+ else
+ {
+ if(*procsHighIter - procsHighStart > 0 )
+ fromHigh = TRUE;
+ else
+ if(*procsLowIter - procsLowStart > 0)
+ fromLow = TRUE;
+ }
+
+ if(fromHigh)
+ {
+ result = **procsHighIter;
+ --(*procsHighIter);
+ }
+ else
+ if(fromLow)
+ {
+ result = **procsLowIter;
+ --(*procsLowIter);
+ }
+
+ return result;
+}
+
+
+
+static void assignThesePartitions(PartitionAssignment* pa, Partition *partitions, int numCur)
+{
+ int
+ proc,
+ remainder, /* number of processes that receive 1 character less than others */
+ i,
+ numFull = 0, /* number of processes that cannot take any more */
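+ numLow = 0, /* number of assignments held by the first process that could not take another full partition */
+ *numAssigned = (int *)NULL, /* number of partitions (assignments) given to each process */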
+ *procsHighIter = (int *)NULL, /* stack of processes that have one assignment more than others */
+ *procsLowIter = (int *)NULL, /* stack of processes that have one assignment less than others */
+ *procsLowStart = (int *)NULL, /* start of stack */
+ *procsHighStart = (int *)NULL, /* start of stack */
+ highProc, /* id of a process that potentially has more partitions assigned than others */
+ lowProc; /* id of a process that potentially has less partitions assigned than others */
+
+ size_t
+ totalElems = 0,
+ *sizeAssigned = (size_t *)NULL,
+ toAdd,
+ cap; /* defines a cap: once we have
+ assigned this many characters, the
+ remaining processes will get one
+ character less */
+
+ boolean
+ iterate = TRUE;
+
+ Partition
+ *partIter = partitions,
+ *partEnd = partitions +numCur;
+
+ /* The following implements Kassian's algorithm. Originally, his
+ algorithm consists of 5 phases */
+
+ /*
+ Sorts partitions according to their size. According to Kassian's
+ algorithm, this step is NOT obligatory. However, if we do it, then
+ phase 3 (called "top-up" phase) is not necessary (this has been
+ clarified with Kassian).
+ */
+ qsort(partitions, numCur, sizeof(Partition), partSort);
+
+ for(i = 0; i < numCur; ++i)
+ totalElems += partitions[i].width;
+
+ cap = ceil( (double)totalElems / (double)pa->numProc );
+
+ remainder = cap * pa->numProc - totalElems;
+
+ assert(remainder >= 0 );
+
+ numAssigned = (int *)calloc((size_t)pa->numProc, sizeof(int));
+
+ sizeAssigned = (size_t *)calloc((size_t)pa->numProc, sizeof(size_t));
+
+ /* phase 2: initial distribution of full partitions to processes. We
+ distribute full partitions until, for the first time, we cannot
+ assign an entire partition any more, because this would exceed
+ the number of characters we want to assign to this process */
+ while(iterate)
+ {
+ for(proc = 0; proc < pa->numProc;++proc)
+ {
+ if(partIter < partEnd && sizeAssigned[proc] + partIter->width <= cap)
+ {
+ assignPartitionFull(pa, partIter, proc, numAssigned, sizeAssigned);
+
+ if(sizeAssigned[proc] == cap)
+ {
+ ++numFull;
+ if(numFull == pa->numProc - remainder)
+ --cap;
+ }
+
+ ++partIter;
+ }
+ else
+ {
+ numLow = numAssigned[proc];
+ iterate = FALSE;
+ break;
+ }
+ }
+ }
+
+ /* phase 3: top-up => not necessary because of previous sorting */
+
+
+ /*
+ phase 4: stick breaking
+
+ Here we partially assign the remaining partitions to processes
+ until every process has as many characters as it can take.
+ */
+
+
+ /* first categorize processes into two stacks, depending on whether
+ they have gotten one more partition than others */
+ procsHighIter = (int*)calloc((size_t)pa->numProc + 1, sizeof(int));
+ procsLowIter = (int *)calloc((size_t)pa->numProc + 1, sizeof(int));
+ procsLowStart = procsLowIter;
+ procsHighStart = procsHighIter;
+
+
+ numFull = 0;
+
+ for(proc = 0; proc < pa->numProc; ++proc)
+ {
+ if(sizeAssigned[proc] < cap)
+ {
+ if(numAssigned[proc] == numLow)
+ {
+ ++procsLowIter;
+ *procsLowIter = proc;
+ }
+ else
+ {
+ ++procsHighIter;
+ *procsHighIter = proc;
+ }
+ }
+ else
+ ++numFull;
+ }
+
+ assert((procsHighIter - procsHighStart) + (procsLowIter - procsLowStart) + numFull == pa->numProc);
+
+ toAdd = (partIter < partEnd) ? partIter->width : 0 ;
+ highProc = popAndYield(&procsHighIter, procsHighStart, &procsLowIter, procsLowStart, FALSE);
+ lowProc = popAndYield(&procsHighIter, procsHighStart, &procsLowIter, procsLowStart, TRUE);
+
+
+ /*
+ now assign as long as there is something to assign. Once both
+ stacks are empty, popAndYield yields -1. This then breaks the
+ loop condition here.
+ */
+ while( ! (highProc == -1 && lowProc == -1
+ && (procsHighIter - procsHighStart <= 0 ) && (procsLowIter - procsLowStart <= 0 )) )
+ {
+ /* try to finish the assignment for a process that has many partitions */
+ if(highProc != -1 && sizeAssigned[highProc] + toAdd >= cap)
+ {
+ size_t
+ toTransfer = cap - sizeAssigned[highProc],
+ offset = partIter->width - toAdd;
+
+ assignPartitionPartial( pa, partIter, highProc, numAssigned, sizeAssigned, offset, toTransfer);
+
+ toAdd -= toTransfer;
+
+ if(toAdd == 0 && partIter < partEnd )
+ {
+ ++partIter;
+ toAdd = partIter < partEnd ? partIter->width : 0;
+ }
+ ++numFull;
+
+ if(numFull == pa->numProc - remainder)
+ --cap;
+
+ highProc = popAndYield(&procsHighIter, procsHighStart, &procsLowIter, procsLowStart, FALSE);
+ }
+ else
+ if(lowProc != -1)
+ {
+ /* assign the entire remaining portion to a process that
+ still has fewer partitions */
+ if(sizeAssigned[lowProc] + toAdd < cap)
+ {
+ size_t
+ offset = partIter->width - toAdd;
+
+ assignPartitionPartial(pa, partIter, lowProc, numAssigned, sizeAssigned, offset, toAdd);
+
+ if(highProc != -1 )
+ {
+ ++procsHighIter;
+ *procsHighIter = highProc;
+ }
+
+ highProc = lowProc;
+
+ toAdd = 0;
+
+ if( partIter != partEnd )
+ {
+ ++partIter;
+ toAdd = partIter < partEnd ? partIter->width : 0 ;
+ }
+
+ lowProc = popAndYield(&procsHighIter, procsHighStart, &procsLowIter, procsLowStart, TRUE);
+ }
+ else
+ {
+ /* assign as much as possible to a process with fewer
+ partitions (the rest probably needs to be assigned
+ to the next process) */
+
+ size_t
+ toTransfer = cap - sizeAssigned[lowProc],
+ offset = partIter->width - toAdd;
+
+ assignPartitionPartial(pa, partIter, lowProc, numAssigned, sizeAssigned, offset, toTransfer);
+
+ toAdd -= toTransfer;
+
+ if(toAdd == 0 && partIter < partEnd)
+ {
+ ++partIter;
+ toAdd = partIter < partEnd ? partIter->width : 0;
+ }
+
+ ++numFull;
+ if(numFull == pa->numProc - remainder)
+ --cap ;
+
+ lowProc = popAndYield(&procsHighIter, procsHighStart, &procsLowIter, procsLowStart, FALSE);
+ }
+ }
+ else
+ {
+ /* should not occur, but I am not entirely happy with
+ this assert. */
+ assert(0);
+ }
+ }
+
+ assert(toAdd == 0 );
+ assert(partIter == partEnd);
+
+ free(procsLowStart); free(procsHighStart);
+ free(numAssigned); free(sizeAssigned);
+}
+
+/**
+ Assigns all partitions. Notice that for each data type (currently
+ BIN, DNA and AA), we execute the algorithm separately. Thus, in
+ the worst case imbalances for different data types could hit the same
+ processes (but this is probably not worth bothering about).
+ */
+void assign(PartitionAssignment *pa)
+{
+ int
+ partitionsHandled = 0,
+ curType = -1,
+ j,
+ i;
+
+ /*
+ only handling 3 types (BIN, DNA, AA) at the moment. Please adapt,
+ when the number of types increases.
+ */
+ int
+ types[3] = { 2, 4, 20 };
+
+ for(j = 0; j < 3 ; ++j)
+ {
+ size_t
+ cnt;
+
+ Partition
+ *curPartitions = (Partition *)NULL;
+
+ curType = types[j];
+
+ /* count number of type */
+ cnt = 0;
+ for(i = 0; i < pa->numPartitions; ++i)
+ {
+ if(pa->partitions[i].type == curType)
+ ++cnt;
+ }
+
+ if(cnt == 0 )
+ continue;
+
+ curPartitions = (Partition*)calloc((size_t)cnt, sizeof(Partition));
+
+ cnt = 0;
+ for(i = 0; i< pa->numPartitions; ++i)
+ {
+ if(pa->partitions[i].type == curType)
+ curPartitions[cnt++] = pa->partitions[i];
+ }
+
+ assignThesePartitions(pa, curPartitions, cnt);
+ free(curPartitions);
+
+ partitionsHandled += cnt;
+ }
+
+ assert(partitionsHandled == pa->numPartitions);
+}
+
+
+
+
+void printAssignment(Assignment a, int procid)
+{
+ printf("p: %d\t(%zu,%zu) -> proc %d\n", a.partId, a.offset, a.width , procid);
+}
+
+
+void printAssignments(PartitionAssignment *pa)
+{
+ int i,j;
+ printf("proc\toffset\tlength\tpart\n");
+ for(i = 0; i < pa->numProc; ++i)
+ {
+ for(j = 0; j < pa->numAssignPerProc[i] ; ++j)
+ {
+ Assignment a = pa->assignPerProc[i][j];
+ printf("%d\t%zu\t%zu\t%d\n", i,a.offset, a.width, a.partId);
+ }
+ }
+}
+
+
+void printLoad(PartitionAssignment *pa)
+{
+ int
+ i,
+ j,
+ *numsPerProc = (int *)calloc((size_t)pa->numProc, sizeof(int));
+
+ size_t
+ *sitesPerProc = (size_t *)calloc((size_t)pa->numProc, sizeof(size_t));
+
+ for(i = 0; i< pa->numProc; ++i)
+ {
+ for(j = 0; j < pa->numAssignPerProc[i]; ++j)
+ {
+ Assignment a = pa->assignPerProc[i][j];
+ sitesPerProc[i] += a.width;
+ ++numsPerProc[i];
+ }
+ }
+
+ printf("#proc\t#part\t#sites\n");
+ for( i = 0; i < pa->numProc ; ++i)
+ printf("%d\t%d\t%zu\n", i, numsPerProc[i], sitesPerProc[i]);
+
+ free(numsPerProc);
+ free(sitesPerProc);
+}
+
+
+
+/**
+ allocates global arrays for CAT and sets the pointers in each pInfo
+ instance.
+ */
+static void setupBasePointersInTree(tree *tr)
+{
+ size_t
+ len = 0;
+ int
+ i;
+
+ for(i = 0; i < tr->NumberOfModels; ++i)
+ len += tr->partitionData[i].width ;
+
+ tr->patrat_basePtr = (double*) calloc((size_t)len, sizeof(double));
+ tr->rateCategory_basePtr = (int*) calloc((size_t)len, sizeof(int));
+ tr->lhs_basePtr = (double*) calloc((size_t)len, sizeof(double));
+
+ len = 0;
+ for(i = 0; i < tr->NumberOfModels; ++i)
+ {
+ if(tr->partitionData[i].width > 0)
+ {
+ tr->partitionData[i].rateCategory = tr->rateCategory_basePtr + len;
+ tr->partitionData[i].patrat = tr->patrat_basePtr + len;
+ tr->partitionData[i].lhs = tr->lhs_basePtr + len ;
+ }
+ else
+ {
+ tr->partitionData[i].rateCategory = (int *)NULL;
+ tr->partitionData[i].patrat = (double*)NULL;
+ tr->partitionData[i].lhs = (double*)NULL;
+ }
+
+ len += tr->partitionData[i].width;
+ }
+}
+
+
+
+static int sortById(const void *a, const void *b)
+{
+ return ((Assign*) a)->partitionId - ((Assign*) b)->partitionId ;
+}
+
+
+
+void copyAssignmentInfoToTree(PartitionAssignment *pa, tree *tr)
+{
+ int
+ i,
+ numAssign = 0;
+
+ Assign
+ *assIter;
+
+ for(i = 0; i < pa->numProc; ++i)
+ numAssign += pa->numAssignPerProc[i];
+
+ /* copy the partition assignment to the tree structure */
+
+ tr->numAssignments = numAssign;
+ tr->partAssigns = (Assign *)calloc((size_t)numAssign, sizeof(Assign));
+
+ assIter = tr->partAssigns;
+
+ for(i = 0; i < pa->numProc; ++i)
+ {
+ int j;
+ for( j = 0 ; j < pa->numAssignPerProc[i]; ++j)
+ {
+ Assignment *ass = pa->assignPerProc[i] + j;
+ assIter->procId = i;
+ assIter->offset = ass->offset;
+ assIter->width = ass->width;
+ assIter->partitionId = ass->partId;
+ ++assIter;
+ }
+ }
+
+ /*
+ the sorting makes it easier to deal with the assignments later in
+ case of a gather/scatter at the master. Thus, we do not need to
+ jump around in the array that is received from or sent to a process,
+ because we are sure that the data is ordered the same way as we
+ obtained it.
+ */
+ qsort(tr->partAssigns, tr->numAssignments, sizeof(Assign), sortById);
+
+ if(tr->rateHetModel == CAT)
+ setupBasePointersInTree( tr);
+}
+
+#ifdef _USE_OMP
+void copyThreadAssignmentInfoToTree(PartitionAssignment *pa, tree *tr)
+{
+ int
+ i, j;
+
+ /* we want to know max number of partitions assigned to a single thread -> mainly for memory allocation */
+
+ int
+ *numsPerProc = (int *)calloc((size_t)pa->numProc, sizeof(int)),
+ *numsPerPart = (int *)calloc((size_t)pa->numPartitions, sizeof(int));
+
+ for(i = 0; i< pa->numProc; ++i)
+ {
+ for(j = 0; j < pa->numAssignPerProc[i]; ++j)
+ {
+ Assignment *a = &pa->assignPerProc[i][j];
+ ++numsPerProc[i];
+ ++numsPerPart[a->partId];
+ }
+ }
+
+ int
+ pmax = 0;
+
+ for(i = 1; i< pa->numProc; ++i)
+ {
+ if (numsPerProc[i] > numsPerProc[pmax])
+ pmax = i;
+ }
+
+ /* save max partition count to the tree structure */
+ tr->maxModelsPerThread = numsPerProc[pmax];
+
+ assert(tr->maxModelsPerThread > 0 && tr->maxModelsPerThread <= pa->numPartitions);
+
+ pmax = 0;
+ for(i = 1; i< pa->numPartitions; ++i)
+ {
+ if (numsPerPart[i] > numsPerPart[pmax])
+ pmax = i;
+ }
+
+ /* save max threads count to the tree structure */
+ tr->maxThreadsPerModel = numsPerPart[pmax];
+
+ assert(tr->maxThreadsPerModel > 0 && tr->maxThreadsPerModel <= pa->numProc);
+
+ free(numsPerProc);
+ free(numsPerPart);
+
+ printf("\n maxModelsPerThread: %d, maxThreadsPerModel: %d\n", tr->maxModelsPerThread, tr->maxThreadsPerModel);
+
+ /* copy the partition assignment to the tree structure */
+
+ int
+ threadPartSize = pa->numProc * tr->maxModelsPerThread,
+ partThreadSize = pa->numPartitions * tr->maxThreadsPerModel;
+
+ tr->threadPartAssigns = (Assign **)calloc((size_t)threadPartSize, sizeof(Assign*));
+ tr->partThreadAssigns = (Assign **)calloc((size_t)partThreadSize, sizeof(Assign*));
+
+ for(i = 0; i < pa->numProc; ++i)
+ {
+ int
+ partCount = 0;
+
+ for( j = 0 ; j < pa->numAssignPerProc[i]; ++j)
+ {
+ Assignment *ass = pa->assignPerProc[i] + j;
+ Assign* pTreeAss = (Assign *)calloc(1, sizeof(Assign));
+ pTreeAss->procId = i;
+ pTreeAss->offset = ass->offset;
+ pTreeAss->width = ass->width;
+ pTreeAss->partitionId = ass->partId;
+
+ size_t
+ ind = i * tr->maxModelsPerThread + partCount;
+
+ assert(ind < (i+1) * tr->maxModelsPerThread);
+
+ tr->threadPartAssigns[ind] = pTreeAss;
+ ++partCount;
+
+ ind = ass->partId * tr->maxThreadsPerModel;
+ while (tr->partThreadAssigns[ind])
+ ++ind;
+
+ assert( ind < (ass->partId+1) * tr->maxThreadsPerModel);
+
+ tr->partThreadAssigns[ind] = pTreeAss;
+ }
+ }
+}
+#endif
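
Reviewer note on the balancing arithmetic in assignThesePartitions(): the cap is the ceiling of sites per process, and remainder counts how many processes must end up with one site less. The following is only a minimal sketch of that arithmetic, not code from ExaML. The helper names siteTargets() and sumTargets() are hypothetical, and the sketch hands out the larger share by rank, whereas the patch lowers the cap once numProc - remainder processes have become full.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of ExaML): fills target[0..numProc-1]
   with the number of sites each process should end up with. */
static void siteTargets(size_t totalElems, size_t numProc, size_t *target)
{
  size_t cap, remainder, i;

  assert(numProc > 0);

  /* integer form of ceil(totalElems / numProc) */
  cap = (totalElems + numProc - 1) / numProc;

  /* number of processes that must receive one site less than cap */
  remainder = cap * numProc - totalElems;

  for(i = 0; i < numProc; ++i)
    target[i] = (i < numProc - remainder) ? cap : cap - 1;
}

/* Sums the targets; the result should equal totalElems again. */
static size_t sumTargets(const size_t *target, size_t numProc)
{
  size_t i, sum = 0;

  for(i = 0; i < numProc; ++i)
    sum += target[i];

  return sum;
}
```

With 10 sites and 4 processes this gives cap = 3 and remainder = 2, so two processes take 3 sites and two take 2.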
diff --git a/examl/partitionAssignment.h b/examl/partitionAssignment.h
new file mode 100644
index 0000000..22f74d4
--- /dev/null
+++ b/examl/partitionAssignment.h
@@ -0,0 +1,64 @@
+#ifndef _PARTITIION_ASSIGNMENT
+#define _PARTITIION_ASSIGNMENT
+
+#include "axml.h"
+
+#define not !
+
+
+typedef struct
+{
+ int partId;
+ size_t width;
+ size_t offset;
+} Assignment;
+
+
+typedef struct
+{
+ int id;
+ size_t width;
+ int type;
+} Partition;
+
+
+typedef struct
+{
+ int numProc;
+ int numPartitions;
+ Partition *partitions;
+ Assignment **assignPerProc; /* procid -> array of assignments */
+ int *numAssignPerProc; /* procid -> size of above array */
+} PartitionAssignment;
+
+/*
+ constructor
+*/
+void initializePartitionAssignment( PartitionAssignment **pAssPtr, pInfo **partitions, int numPart, int numProc);
+/*
+ destructor
+ */
+void deletePartitionAssignment(PartitionAssignment *pAss);
+/*
+ assign partitions to all processes
+ */
+void assign(PartitionAssignment *pa);
+/*
+ prints a single assignment
+ */
+void printAssignment(Assignment a, int procid);
+/*
+ calculates and prints the load (number of partitions and number of
+ sites) for each process
+ */
+void printLoad(PartitionAssignment *pa);
+
+void copyAssignmentInfoToTree(PartitionAssignment *pa, tree *tr);
+
+#ifdef _USE_OMP
+void copyThreadAssignmentInfoToTree(PartitionAssignment *pa, tree *tr);
+#endif
+
+void printAssignments(PartitionAssignment *pa);
+
+#endif
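
Reviewer note on the Assignment struct: each entry describes a slice of one partition by its offset and width, which is what lets the stick-breaking phase spread a single partition over several processes. The sketch below illustrates that idea under simplifying assumptions. The struct is re-declared locally so the example stands alone, and breakPartition(), room and procOf are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Local copy of the struct from partitionAssignment.h, for illustration. */
typedef struct
{
  int partId;
  size_t width;
  size_t offset;
} Assignment;

/* Hypothetical helper: cuts one partition of the given width into
   consecutive slices, giving process i at most room[i] sites.
   Writes the slices to out[] and the owning process ids to procOf[];
   returns the number of slices produced. */
static int breakPartition(int partId, size_t width, const size_t *room,
                          int numProc, Assignment *out, int *procOf)
{
  size_t offset = 0;
  int proc, n = 0;

  for(proc = 0; proc < numProc && offset < width; ++proc)
    {
      size_t left = width - offset;
      size_t take = room[proc] < left ? room[proc] : left;

      if(take == 0)
        continue;               /* this process is already full */

      out[n].partId = partId;
      out[n].offset = offset;   /* slice starts where the last one ended */
      out[n].width = take;
      procOf[n] = proc;

      offset += take;
      ++n;
    }

  return n;
}
```

The slices are contiguous and non-overlapping, so summing their widths tells the caller how much of the partition was placed.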
diff --git a/examl/quartets.c b/examl/quartets.c
new file mode 100644
index 0000000..cabea69
--- /dev/null
+++ b/examl/quartets.c
@@ -0,0 +1,615 @@
+/* RAxML-HPC, a program for sequential and parallel estimation of phylogenetic trees
+ * Copyright March 2006 by Alexandros Stamatakis
+ *
+ * Partially derived from
+ * fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *
+ * and
+ *
+ * Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ * This program is free software; you may redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ *
+ * For any other enquiries send an Email to Alexandros Stamatakis
+ * stamatak at ics.forth.gr
+ *
+ * When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ * Alexandros Stamatakis: "An Efficient Program for phylogenetic Inference Using Simulated Annealing".
+ * Proceedings of IPDPS2005, Denver, Colorado, April 2005.
+ *
+ * AND
+ *
+ * Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models".
+ * Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#include <limits.h>
+#include <math.h>
+#include <time.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include "axml.h"
+
+extern double masterTime;
+extern char workdir[1024];
+extern char run_id[128];
+extern char quartetGroupingFileName[1024];
+extern char quartetFileName[1024];
+extern checkPointState ckp;
+extern int processID;
+
+/* a parser error function */
+
+static void parseError(int c)
+{
+ printBothOpen("Quartet grouping parser expecting symbol: %c\n", c);
+ assert(0);
+}
+
+/* parser for the taxon grouping format, one has to specify 4 groups in a newick-like
+ format from which quartets (a substantially smaller number compared to ungrouped quartets)
+ will be drawn */
+
+static void groupingParser(char *quartetGroupFileName, int *groups[4], int groupSize[4], tree *tr)
+{
+ FILE
+ *f = myfopen(quartetGroupFileName, "r");
+
+ int
+ taxonCounter = 0,
+ n,
+ state = 0,
+ groupCounter = 0,
+ ch,
+ i;
+
+ printBothOpen("%s\n", quartetGroupFileName);
+
+ for(i = 0; i < 4; i++)
+ {
+ groups[i] = (int*)malloc(sizeof(int) * (tr->mxtips + 1));
+ groupSize[i] = 0;
+ }
+
+ while((ch = getc(f)) != EOF)
+ {
+ if(!whitechar(ch))
+ {
+ switch(state)
+ {
+ case 0:
+ if(ch != '(')
+ parseError('(');
+ state = 1;
+ break;
+ case 1:
+ ungetc(ch, f);
+ n = treeFindTipName(f, tr, FALSE);
+ if(n <= 0 || n > tr->mxtips)
+ printBothOpen("parsing error, raxml is expecting to read a taxon name, found \"%c\" instead\n", ch);
+ assert(n > 0 && n <= tr->mxtips);
+ taxonCounter++;
+ groups[groupCounter][groupSize[groupCounter]] = n;
+ groupSize[groupCounter] = groupSize[groupCounter] + 1;
+ state = 2;
+ break;
+ case 2:
+ if(ch == ',')
+ state = 1;
+ else
+ {
+ if(ch == ')')
+ {
+ groupCounter++;
+ state = 3;
+ }
+ else
+ parseError('?');
+ }
+ break;
+ case 3:
+ if(groupCounter == 4)
+ {
+ if(ch == ';')
+ state = 4;
+ else
+ parseError(';');
+ }
+ else
+ {
+ if(ch != ',')
+ parseError(',');
+ state = 0;
+ }
+ break;
+ case 4:
+ printBothOpen("Error: extra char after ; %c\n", ch);
+ assert(0);
+ default:
+ assert(0);
+ }
+ }
+ }
+
+ assert(state == 4);
+ assert(groupCounter == 4);
+ assert(taxonCounter == tr->mxtips);
+
+ printBothOpen("Successfully parsed quartet groups\n\n");
+
+ /* print out the taxa that have been assigned to the 4 groups */
+
+ for(i = 0; i < 4; i++)
+ {
+ int
+ j;
+
+ printBothOpen("group %d has %d members\n", i, groupSize[i]);
+
+ for(j = 0; j < groupSize[i]; j++)
+ printBothOpen("%s\n", tr->nameList[groups[i][j]]);
+
+ printBothOpen("\n");
+ }
+
+ fclose(f);
+}
+
+/*****************************/
+
+static void nniSmooth(tree *tr, nodeptr p, int maxtimes)
+{
+ int
+ i;
+
+ for(i = 0; i < tr->numBranches; i++)
+ tr->partitionConverged[i] = FALSE;
+
+ while (--maxtimes >= 0)
+ {
+ for(i = 0; i < tr->numBranches; i++)
+ tr->partitionSmoothed[i] = TRUE;
+
+ assert(!isTip(p->number, tr->mxtips));
+ assert(!isTip(p->back->number, tr->mxtips));
+
+ update(tr, p);
+
+ update(tr, p->next);
+
+ update(tr, p->next->next);
+
+ update(tr, p->back->next);
+
+ update(tr, p->back->next->next);
+
+ if (allSmoothed(tr))
+ break;
+ }
+
+ for(i = 0; i < tr->numBranches; i++)
+ {
+ tr->partitionSmoothed[i] = FALSE;
+ tr->partitionConverged[i] = FALSE;
+ }
+}
+
+
+
+
+
+static double quartetLikelihood(tree *tr, nodeptr p1, nodeptr p2, nodeptr p3, nodeptr p4, nodeptr q1, nodeptr q2, analdef *adef, boolean firstQuartet)
+{
+ /*
+ build a quartet tree, where q1 and q2 are the inner nodes and p1, p2, p3, p4
+ are the tips of the quartet where the sequence data is located.
+
+ initially set all branch lengths to the default value.
+ */
+
+ /*
+ for the tree and node data structure used, please see one of the last chapters of Joe
+ Felsenstein's book.
+ */
+
+ hookupDefault(q1, q2, tr->numBranches);
+
+ hookupDefault(q1->next, p1, tr->numBranches);
+ hookupDefault(q1->next->next, p2, tr->numBranches);
+
+ hookupDefault(q2->next, p3, tr->numBranches);
+ hookupDefault(q2->next->next, p4, tr->numBranches);
+
+ /* now compute the likelihood vectors at the two inner nodes of the tree,
+ here the virtual root is located between the two inner nodes q1 and q2.
+ */
+
+ newviewGeneric(tr, q1, FALSE);
+ newviewGeneric(tr, q2, FALSE);
+
+ /* call a function that is also used for NNIs that iteratively optimizes all
+ 5 branch lengths in the tree.
+
+ Note that 16 is an important tuning parameter, this integer value determines
+ how many times we visit all branches until we give up further optimizing the branch length
+ configuration.
+ */
+
+ nniSmooth(tr, q1, 16);
+
+ /* now compute the log likelihood of the tree for the virtual root located between inner nodes q1 and q2 */
+
+ /* debugging code
+ {
+ double l;
+ */
+
+ evaluateGeneric(tr, q1->back->next->next, FALSE);
+
+ /* debugging code
+
+ l = tr->likelihood;
+
+ newviewGeneric(tr, q1);
+ newviewGeneric(tr, q2);
+ evaluateGeneric(tr, q1);
+
+
+ assert(ABS(l - tr->likelihood) < 0.00001);
+ }
+ */
+
+ return (tr->likelihood);
+}
+
+
+
+static void computeAllThreeQuartets(tree *tr, nodeptr q1, nodeptr q2, int t1, int t2, int t3, int t4, FILE *f, analdef *adef)
+{
+ /* set the tip nodes to different sequences
+ with the tip indices t1, t2, t3, t4 */
+
+ nodeptr
+ p1 = tr->nodep[t1],
+ p2 = tr->nodep[t2],
+ p3 = tr->nodep[t3],
+ p4 = tr->nodep[t4];
+
+ double
+ l;
+
+ /* first quartet */
+
+ /* compute the likelihood of tree ((p1, p2), (p3, p4)) */
+
+ l = quartetLikelihood(tr, p1, p2, p3, p4, q1, q2, adef, TRUE);
+
+ if(processID == 0)
+ fprintf(f, "%d %d | %d %d: %f\n", p1->number, p2->number, p3->number, p4->number, l);
+
+ /* second quartet */
+
+ /* compute the likelihood of tree ((p1, p3), (p2, p4)) */
+
+ l = quartetLikelihood(tr, p1, p3, p2, p4, q1, q2, adef, FALSE);
+
+ if(processID == 0)
+ fprintf(f, "%d %d | %d %d: %f\n", p1->number, p3->number, p2->number, p4->number, l);
+
+ /* third quartet */
+
+ /* compute the likelihood of tree ((p1, p4), (p2, p3)) */
+
+ l = quartetLikelihood(tr, p1, p4, p2, p3, q1, q2, adef, FALSE);
+
+ if(processID == 0)
+ fprintf(f, "%d %d | %d %d: %f\n", p1->number, p4->number, p2->number, p3->number, l);
+}
+
+/* the three quartet options: all quartets, randomly sub-sample a certain number n of quartets,
+ evaluate all quartets that can be drawn from 4 pre-defined groups of taxa */
+
+
+static void writeQuartetCheckpoint(uint64_t quartetCounter, FILE *f, tree *tr, analdef *adef)
+{
+ if(quartetCounter % adef->quartetCkpInterval == 0)
+ {
+ ckp.quartetCounter = quartetCounter;
+ if(processID == 0)
+ {
+ fflush(f);
+ ckp.filePosition = ftell(f);
+ }
+ printBothOpen("\nPrinting checkpoint after %f seconds of run-time\n", gettime() - masterTime);
+ writeCheckpoint(tr, adef);
+ }
+}
+
+
+#define ALL_QUARTETS 0
+#define RANDOM_QUARTETS 1
+#define GROUPED_QUARTETS 2
+
+void computeQuartets(tree *tr, analdef *adef)
+{
+ /* some indices for generating quartets in an arbitrary way */
+
+ int
+ flavor = ALL_QUARTETS, //type of quartet calculation
+ i,
+ t1,
+ t2,
+ t3,
+ t4,
+ *groups[4],
+ groupSize[4];
+
+ double
+ fraction = 0.0;
+
+ uint64_t
+ randomQuartets = (uint64_t)(adef->numberRandomQuartets), //number of random quartets to compute
+ quartetCounter = 0,
+ //total number of possible quartets, note that we count the following ((A,B),(C,D)), ((A,C),(B,D)), ((A,D),(B,C)) as one quartet here
+ numberOfQuartets = ((uint64_t)tr->mxtips * ((uint64_t)tr->mxtips - 1) * ((uint64_t)tr->mxtips - 2) * ((uint64_t)tr->mxtips - 3)) / 24;
+
+ /* use two inner tree nodes for building quartet trees */
+
+ nodeptr
+ q1 = tr->nodep[tr->mxtips + 1],
+ q2 = tr->nodep[tr->mxtips + 2];
+
+ FILE
+ *f;
+
+ long
+ seed = (long)(tr->randomSeed);
+
+ /***********************************/
+
+ /* get a starting tree on which we optimize the likelihood model parameters: either reads in a tree or computes a randomized stepwise addition parsimony tree */
+ if(adef->useCheckpoint)
+ {
+ /* read checkpoint file */
+ restart(tr, adef);
+
+ strcpy(quartetFileName, workdir);
+ strcat(quartetFileName, basename(ckp.quartetFileName));
+ printBothOpen("Time for reading checkpoint file: %f\n\n", gettime() - masterTime);
+
+ seed = ckp.seed;
+
+ if(processID == 0)
+ {
+ f = myfopen(quartetFileName, "r+");
+
+ fseek(f, ckp.filePosition, SEEK_SET);
+ if(ftruncate(fileno(f), ckp.filePosition) != 0)
+ assert(0);
+ }
+ }
+ else
+ {
+ getStartingTree(tr);
+ evaluateGeneric(tr, tr->start, TRUE);
+ treeEvaluate(tr, 1);
+
+ /* optimize model parameters on that comprehensive tree that can subsequently be used for evaluation of quartet likelihoods */
+
+ modOpt(tr, adef->likelihoodEpsilon, adef, 0);
+
+ printBothOpen("Time for parsing input tree or building parsimony tree and optimizing model parameters: %f\n\n", gettime() - masterTime);
+ printBothOpen("Tree likelihood: %f\n\n", tr->likelihood);
+
+ if(processID == 0)
+ f = myfopen(quartetFileName, "w");
+ }
+
+ /* figure out which flavor of quartets we want to compute */
+
+ if(adef->useQuartetGrouping)
+ {
+ //quartet grouping evaluates all possible quartets from four disjoint
+ //sets of user-specified taxon names
+
+ flavor = GROUPED_QUARTETS;
+
+ //parse the four disjoint sets of taxon names specified by the user from file
+ groupingParser(quartetGroupingFileName, groups, groupSize, tr);
+ }
+ else
+ {
+ //if the user specified more random quartets to sample than there actually
+ //exist for the number of taxa, then fix this.
+
+ if(randomQuartets == 0 || randomQuartets >= numberOfQuartets)
+ //TODO: add a user warning if the second case above is true?
+ flavor = ALL_QUARTETS;
+ else
+ {
+ //compute the fraction of random quartets to sample
+ //there may be an issue here with the uint64_t <-> double cast
+ fraction = (double)randomQuartets / (double)numberOfQuartets;
+ assert(fraction < 1.0);
+ flavor = RANDOM_QUARTETS;
+ }
+ }
+
+ ckp.state = QUARTETS;
+ ckp.seed = seed;
+ strncpy(ckp.quartetFileName, quartetFileName, 1024);
+
+ /* print some output on what we are doing*/
+
+ switch(flavor)
+ {
+ case ALL_QUARTETS:
+ printBothOpen("There are %" PRIu64 " quartet sets for which RAxML will evaluate all %" PRIu64 " quartet trees\n", numberOfQuartets, numberOfQuartets * 3);
+ break;
+ case RANDOM_QUARTETS:
+ printBothOpen("There are %" PRIu64 " quartet sets for which RAxML will randomly sub-sample %" PRIu64 " sets (%f per cent), i.e., compute %" PRIu64 " quartet trees\n",
+ numberOfQuartets, randomQuartets, 100 * fraction, randomQuartets * 3);
+ break;
+ case GROUPED_QUARTETS:
+ printBothOpen("There are 4 quartet groups from which RAxML will evaluate all %u quartet trees\n",
+ (unsigned int)groupSize[0] * (unsigned int)groupSize[1] * (unsigned int)groupSize[2] * (unsigned int)groupSize[3] * 3);
+ break;
+ default:
+ assert(0);
+ }
+
+ /* print taxon name to taxon number correspondence table to output file */
+
+ if(!adef->useCheckpoint)
+ {
+ if(processID == 0)
+ fprintf(f, "Taxon names and indices:\n\n");
+
+ for(i = 1; i <= tr->mxtips; i++)
+ {
+ if(processID == 0)
+ fprintf(f, "%s %d\n", tr->nameList[i], i);
+ assert(tr->nodep[i]->number == i);
+ }
+
+ if(processID == 0)
+ {
+ fprintf(f, "\n\n");
+ fflush(f);
+ }
+ }
+
+
+ /* do a loop to generate some quartets to test.
+ note that tip nodes/sequences in RAxML are indexed from 1,...,n
+ and not from 0,...,n-1 as one might expect
+
+ tr->mxtips is the maximum number of tips in the alignment/tree
+ */
+
+
+ //now do the respective quartet evaluations by switching over the three distinct flavors
+
+ switch(flavor)
+ {
+ case ALL_QUARTETS:
+ {
+ /* compute all possible quartets */
+
+ for(t1 = 1; t1 <= tr->mxtips; t1++)
+ for(t2 = t1 + 1; t2 <= tr->mxtips; t2++)
+ for(t3 = t2 + 1; t3 <= tr->mxtips; t3++)
+ for(t4 = t3 + 1; t4 <= tr->mxtips; t4++)
+ {
+ if((adef->useCheckpoint && quartetCounter >= ckp.quartetCounter) || !adef->useCheckpoint)
+ {
+ writeQuartetCheckpoint(quartetCounter, f, tr, adef);
+
+ computeAllThreeQuartets(tr, q1, q2, t1, t2, t3, t4, f, adef);
+ }
+ quartetCounter++;
+ }
+
+ assert(quartetCounter == numberOfQuartets);
+ }
+ break;
+ case RANDOM_QUARTETS:
+ {
+
+ //endless loop to make sure we randomly sub-sample exactly as many quartets as the user specified
+
+ //This is not very elegant, but it works, note however, that especially when the number of
+ //random quartets to be sampled is large, that is, close to the total number of quartets
+ //some quartets may be sampled twice by pure chance. To randomly sample unique quartets
+ //using hashes or bitmaps to store which quartets have already been sampled is not memory efficient.
+ //Instead, we need to use a random number generator that can generate a unique series of random numbers
+ //and then have a function f() that maps those random numbers to the corresponding index quartet (t1, t2, t3, t4).
+
+ do
+ {
+ //loop over all quartets
+ for(t1 = 1; t1 <= tr->mxtips; t1++)
+ for(t2 = t1 + 1; t2 <= tr->mxtips; t2++)
+ for(t3 = t2 + 1; t3 <= tr->mxtips; t3++)
+ for(t4 = t3 + 1; t4 <= tr->mxtips; t4++)
+ {
+ //choose a random number
+ double
+ r = randum(&seed);
+
+ //if the random number is smaller than the fraction of quartets to subsample
+ //evaluate the likelihood of the current quartet
+ if(r < fraction)
+ {
+ if((adef->useCheckpoint && quartetCounter >= ckp.quartetCounter) || !adef->useCheckpoint)
+ {
+ writeQuartetCheckpoint(quartetCounter, f, tr, adef);
+
+ //function that computes the likelihood for all three possible unrooted trees
+ //defined by the given quartet of taxa
+ computeAllThreeQuartets(tr, q1, q2, t1, t2, t3, t4, f, adef);
+ }
+ //increment quartet counter that counts how many quartets we have evaluated
+ quartetCounter++;
+ }
+
+ //exit endless loop if we have randomly sub-sampled as many quartets as the user specified
+ if(quartetCounter == randomQuartets)
+ goto DONE;
+ }
+ }
+ while(1);
+
+ DONE:
+ assert(quartetCounter == randomQuartets);
+ }
+ break;
+ case GROUPED_QUARTETS:
+ {
+ /* compute all quartets that can be built out of the four pre-defined groups */
+
+ for(t1 = 0; t1 < groupSize[0]; t1++)
+ for(t2 = 0; t2 < groupSize[1]; t2++)
+ for(t3 = 0; t3 < groupSize[2]; t3++)
+ for(t4 = 0; t4 < groupSize[3]; t4++)
+ {
+ int
+ i1 = groups[0][t1],
+ i2 = groups[1][t2],
+ i3 = groups[2][t3],
+ i4 = groups[3][t4];
+
+ if((adef->useCheckpoint && quartetCounter >= ckp.quartetCounter) || !adef->useCheckpoint)
+ {
+ writeQuartetCheckpoint(quartetCounter, f, tr, adef);
+
+ computeAllThreeQuartets(tr, q1, q2, i1, i2, i3, i4, f, adef);
+ }
+ quartetCounter++;
+ }
+
+ printBothOpen("\nComputed all %" PRIu64 " possible grouped quartets\n", quartetCounter);
+ }
+ break;
+ default:
+ assert(0);
+ }
+
+ if(processID == 0) fclose(f); /* f is only opened on process 0 */
+}
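
Reviewer note on the quartet counts in computeQuartets(): the number of quartet sets is n choose 4, each set yields three unrooted quartet trees, and random sub-sampling keeps each set with probability requested/total. The sketch below restates that arithmetic in isolation. The function names are hypothetical, and returning 1.0 for the "evaluate everything" case is a convention of this sketch only (the patch leaves fraction at 0.0 and switches to ALL_QUARTETS instead).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of 4-taxon sets among n taxa, i.e.
   n * (n-1) * (n-2) * (n-3) / 24. The product of four consecutive
   integers is always divisible by 24. Note that the intermediate
   product overflows uint64_t for very large n (above roughly 65000). */
static uint64_t quartetSetCount(uint64_t n)
{
  if(n < 4)
    return 0;

  return (n * (n - 1) * (n - 2) * (n - 3)) / 24;
}

/* Hypothetical helper: probability with which each quartet set is kept.
   A request of 0, or of at least as many sets as exist, means that all
   sets are evaluated; this sketch reports that case as 1.0. */
static double samplingFraction(uint64_t requested, uint64_t total)
{
  if(requested == 0 || requested >= total)
    return 1.0;

  return (double)requested / (double)total;
}
```

For 10 taxa there are 210 quartet sets and therefore 630 quartet trees; asking for 21 random sets gives a sampling fraction of about 0.1.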
diff --git a/examl/restartHashTable.c b/examl/restartHashTable.c
new file mode 100644
index 0000000..6df780a
--- /dev/null
+++ b/examl/restartHashTable.c
@@ -0,0 +1,357 @@
+/* RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees
+ * Copyright August 2006 by Alexandros Stamatakis
+ *
+ * Partially derived from
+ * fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *
+ * and
+ *
+ * Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ * This program is free software; you may redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ *
+ * For any other enquiries send an Email to Alexandros Stamatakis
+ * Alexandros.Stamatakis at epfl.ch
+ *
+ * When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ * Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models".
+ * Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#ifndef WIN32
+#include <sys/times.h>
+#include <sys/types.h>
+#include <sys/time.h>
+#include <unistd.h>
+#endif
+
+#include <math.h>
+#include <time.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+
+#include "axml.h"
+
+
+static boolean treeNeedString(const char *fp, char c1, int *position)
+{
+ char
+ c2 = fp[(*position)++];
+
+ if(c2 == c1)
+ return TRUE;
+ else
+ {
+ int
+ lower = MAX(0, *position - 20),
+ upper = *position + 20;
+
+ printf("Tree Parsing ERROR: Expecting '%c', found: '%c'\n", c1, c2);
+ printf("Context: \n");
+
+ while(lower < upper && fp[lower])
+ printf("%c", fp[lower++]);
+
+ printf("\n");
+
+ return FALSE;
+ }
+}
+
+
+
+static boolean treeLabelEndString (char ch)
+{
+ switch(ch)
+ {
+ case '\0':
+ case '\t':
+ case '\n':
+ case '\r':
+ case ' ':
+ case ':':
+ case ',':
+ case '(':
+ case ')':
+ case ';':
+ return TRUE;
+ default:
+ break;
+ }
+
+ return FALSE;
+}
+
+static boolean treeGetLabelString (const char *fp, char *lblPtr, int maxlen, int *position)
+{
+ char
+ ch;
+
+ boolean
+ done,
+ lblfound;
+
+ if (--maxlen < 0)
+ lblPtr = (char *)NULL;
+ else
+ if(lblPtr == NULL)
+ maxlen = 0;
+
+ ch = fp[(*position)++];
+
+ done = treeLabelEndString(ch);
+
+ lblfound = !done;
+
+ while(!done)
+ {
+ if(treeLabelEndString(ch))
+ break;
+
+ if(--maxlen >= 0)
+ *lblPtr++ = ch;
+
+ ch = fp[(*position)++];
+ }
+
+ (*position)--;
+
+ if (lblPtr != NULL)
+ *lblPtr = '\0';
+
+ return lblfound;
+}
+
+static boolean treeFlushLabelString(const char *fp, int *position)
+{
+ return treeGetLabelString(fp, (char *) NULL, (int) 0, position);
+}
+
+
+static boolean treeProcessLengthString (const char *fp, double *dptr, int *position)
+{
+ (*position)++;
+
+ if(sscanf(&fp[*position], "%lf", dptr) != 1)
+ {
+ printf("ERROR: treeProcessLength: Problem reading branch length\n");
+ assert(0);
+ }
+
+ while(fp[*position] != '\0' && fp[*position] != ',' && fp[*position] != ')' && fp[*position] != ';')
+ *position = *position + 1;
+
+ return TRUE;
+}
+
+static int treeFlushLenString (const char *fp, int *position)
+{
+ double
+ dummy;
+
+ char
+ ch;
+
+ ch = fp[(*position)++];
+
+ if(ch == ':')
+ {
+ if(!treeProcessLengthString(fp, &dummy, position))
+ return 0;
+ return 1;
+ }
+
+ (*position)--;
+
+ return 1;
+}
+
+static int treeFindTipByLabelString(char *str, tree *tr)
+{
+ int lookup = lookupWord(str, tr->nameHash);
+
+ if(lookup > 0)
+ {
+ assert(! tr->nodep[lookup]->back);
+ return lookup;
+ }
+ else
+ {
+ printf("ERROR: Cannot find tree species: %s\n", str);
+ return 0;
+ }
+}
+
+static int treeFindTipNameString (const char *fp, tree *tr, int *position)
+{
+ char str[nmlngth+2];
+ int n;
+
+ if(treeGetLabelString(fp, str, nmlngth+2, position))
+ n = treeFindTipByLabelString(str, tr);
+ else
+ n = 0;
+
+ return n;
+}
+
+static boolean addElementLenString(const char *fp, tree *tr, nodeptr p, int *position)
+{
+ nodeptr
+ q;
+
+ int
+ n,
+ fres;
+
+ char
+ ch;
+
+ if ((ch = fp[(*position)++]) == '(')
+ {
+ n = (tr->nextnode)++;
+ if (n > 2*(tr->mxtips) - 2)
+ {
+ if (tr->rooted || n > 2*(tr->mxtips) - 1)
+ {
+ printf("ERROR: Too many internal nodes. Is tree rooted?\n");
+ printf(" Deepest splitting should be a trifurcation.\n");
+ return FALSE;
+ }
+ else
+ {
+ tr->rooted = TRUE;
+ }
+ }
+
+ q = tr->nodep[n];
+
+ if (!addElementLenString(fp, tr, q->next, position))
+ return FALSE;
+ if (!treeNeedString(fp, ',', position))
+ return FALSE;
+ if (!addElementLenString(fp, tr, q->next->next, position))
+ return FALSE;
+ if (!treeNeedString(fp, ')', position))
+ return FALSE;
+
+
+ treeFlushLabelString(fp, position);
+ }
+ else
+ {
+ (*position)--;
+
+ if ((n = treeFindTipNameString(fp, tr, position)) <= 0)
+ return FALSE;
+ q = tr->nodep[n];
+
+ if (tr->start->number > n)
+ tr->start = q;
+ (tr->ntips)++;
+ }
+
+
+ fres = treeFlushLenString(fp, position);
+ if(!fres)
+ return FALSE;
+
+ hookupDefault(p, q, tr->numBranches);
+
+ return TRUE;
+}
+
+
+
+
+void treeReadTopologyString(char *treeString, tree *tr)
+{
+ char
+ *fp = treeString;
+
+ nodeptr
+ p;
+
+ int
+ position = 0,
+ i;
+
+ char
+ ch;
+
+
+ for(i = 1; i <= tr->mxtips; i++)
+ tr->nodep[i]->back = (node *)NULL;
+
+ for(i = tr->mxtips + 1; i < 2 * tr->mxtips; i++)
+ {
+ tr->nodep[i]->back = (nodeptr)NULL;
+ tr->nodep[i]->next->back = (nodeptr)NULL;
+ tr->nodep[i]->next->next->back = (nodeptr)NULL;
+ tr->nodep[i]->number = i;
+ tr->nodep[i]->next->number = i;
+ tr->nodep[i]->next->next->number = i;
+ }
+
+ tr->start = tr->nodep[1];
+ tr->ntips = 0;
+ tr->nextnode = tr->mxtips + 1;
+ tr->rooted = FALSE;
+
+ p = tr->nodep[(tr->nextnode)++];
+
+ assert(fp[position++] == '(');
+
+ if (! addElementLenString(fp, tr, p, &position))
+ assert(0);
+
+ if (! treeNeedString(fp, ',', &position))
+ assert(0);
+
+ if (! addElementLenString(fp, tr, p->next, &position))
+ assert(0);
+
+ if(!tr->rooted)
+ {
+ if ((ch = fp[position++]) == ',')
+ {
+ if (! addElementLenString(fp, tr, p->next->next, &position))
+ assert(0);
+ }
+ else
+ assert(0);
+ }
+ else
+ assert(0);
+
+ if (! treeNeedString(fp, ')', &position))
+ assert(0);
+
+ treeFlushLabelString(fp, &position);
+
+ if (!treeFlushLenString(fp, &position))
+ assert(0);
+
+ if (!treeNeedString(fp, ';', &position))
+ assert(0);
+
+ if(tr->rooted)
+ assert(0);
+ else
+ tr->start = tr->nodep[1];
+
+ printf("Tree parsed\n");
+
+}
diff --git a/examl/searchAlgo.c b/examl/searchAlgo.c
new file mode 100644
index 0000000..f02399b
--- /dev/null
+++ b/examl/searchAlgo.c
@@ -0,0 +1,2651 @@
+/* RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees
+ * Copyright August 2006 by Alexandros Stamatakis
+ *
+ * Partially derived from
+ * fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *
+ * and
+ *
+ * Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ * This program is free software; you may redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ *
+ * For any other enquiries send an Email to Alexandros Stamatakis
+ * Alexandros.Stamatakis at epfl.ch
+ *
+ * When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ * Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models".
+ * Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#ifndef WIN32
+#include <sys/times.h>
+#include <sys/types.h>
+#include <sys/time.h>
+#include <unistd.h>
+#endif
+
+#include <math.h>
+#include <time.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+
+#include "axml.h"
+
+extern int processes;
+
+extern int Thorough;
+extern int optimizeRateCategoryInvocations;
+extern infoList iList;
+extern char seq_file[1024];
+extern char resultFileName[1024];
+extern char tree_file[1024];
+extern char workdir[1024];
+extern char run_id[128];
+extern double masterTime;
+extern double accumulatedTime;
+
+extern checkPointState ckp;
+extern partitionLengths pLengths[MAX_MODEL];
+extern char binaryCheckpointName[1024];
+extern char binaryCheckpointInputName[1024];
+
+extern int processID;
+
+
+
+static int checker(tree *tr, nodeptr p)
+{
+ int group = tr->constraintVector[p->number];
+
+ if(isTip(p->number, tr->mxtips))
+ {
+ group = tr->constraintVector[p->number];
+ return group;
+ }
+ else
+ {
+ if(group != -9)
+ return group;
+
+ group = checker(tr, p->next->back);
+ if(group != -9)
+ return group;
+
+ group = checker(tr, p->next->next->back);
+ if(group != -9)
+ return group;
+
+ return -9;
+ }
+}
+
+boolean initrav (tree *tr, nodeptr p)
+{
+ nodeptr q;
+
+ if (!isTip(p->number, tr->mxtips))
+ {
+ q = p->next;
+
+ do
+ {
+ if (! initrav(tr, q->back)) return FALSE;
+ q = q->next;
+ }
+ while (q != p);
+
+ newviewGeneric(tr, p, FALSE);
+ }
+
+ return TRUE;
+}
+
+
+
+
+
+
+
+
+
+
+/* #define _DEBUG_UPDATE */
+
+boolean update(tree *tr, nodeptr p)
+{
+ nodeptr q;
+ boolean smoothedPartitions[NUM_BRANCHES];
+ int i;
+ double z[NUM_BRANCHES], z0[NUM_BRANCHES];
+ double _deltaz;
+
+#ifdef _DEBUG_UPDATE
+ double
+ startLH;
+
+ evaluateGeneric(tr, p, FALSE);
+
+ startLH = tr->likelihood;
+#endif
+
+ q = p->back;
+
+ for(i = 0; i < tr->numBranches; i++)
+ z0[i] = q->z[i];
+
+ if(tr->numBranches > 1)
+ makenewzGeneric(tr, p, q, z0, newzpercycle, z, TRUE);
+ else
+ makenewzGeneric(tr, p, q, z0, newzpercycle, z, FALSE);
+
+ for(i = 0; i < tr->numBranches; i++)
+ smoothedPartitions[i] = tr->partitionSmoothed[i];
+
+ for(i = 0; i < tr->numBranches; i++)
+ {
+ if(!tr->partitionConverged[i])
+ {
+ _deltaz = deltaz;
+
+ if(ABS(z[i] - z0[i]) > _deltaz)
+ {
+ smoothedPartitions[i] = FALSE;
+ }
+
+
+
+ p->z[i] = q->z[i] = z[i];
+ }
+ }
+
+#ifdef _DEBUG_UPDATE
+ evaluateGeneric(tr, p, FALSE);
+
+ if(tr->likelihood <= startLH)
+ {
+ if(fabs(tr->likelihood - startLH) > 0.01)
+ {
+ printf("%f %f\n", startLH, tr->likelihood);
+ assert(0);
+ }
+ }
+#endif
+
+ for(i = 0; i < tr->numBranches; i++)
+ tr->partitionSmoothed[i] = smoothedPartitions[i];
+
+ return TRUE;
+}
+
+
+
+
+boolean smooth (tree *tr, nodeptr p)
+{
+ nodeptr q;
+
+ if (! update(tr, p)) return FALSE; /* Adjust branch */
+ if (! isTip(p->number, tr->mxtips))
+ { /* Adjust descendants */
+ q = p->next;
+ while (q != p)
+ {
+ if (! smooth(tr, q->back)) return FALSE;
+ q = q->next;
+ }
+
+ if(tr->numBranches > 1)
+ newviewGeneric(tr, p, TRUE);
+ else
+ newviewGeneric(tr, p, FALSE);
+ }
+
+ return TRUE;
+}
+
+boolean allSmoothed(tree *tr)
+{
+ int i;
+ boolean result = TRUE;
+
+ for(i = 0; i < tr->numBranches; i++)
+ {
+ if(tr->partitionSmoothed[i] == FALSE)
+ result = FALSE;
+ else
+ tr->partitionConverged[i] = TRUE;
+ }
+
+ return result;
+}
+
+
+
+boolean smoothTree (tree *tr, int maxtimes)
+{
+ nodeptr p, q;
+ int i, count = 0;
+
+ p = tr->start;
+ for(i = 0; i < tr->numBranches; i++)
+ tr->partitionConverged[i] = FALSE;
+
+ while (--maxtimes >= 0)
+ {
+ for(i = 0; i < tr->numBranches; i++)
+ tr->partitionSmoothed[i] = TRUE;
+
+ if (! smooth(tr, p->back)) return FALSE;
+ if (!isTip(p->number, tr->mxtips))
+ {
+ q = p->next;
+ while (q != p)
+ {
+ if (! smooth(tr, q->back)) return FALSE;
+ q = q->next;
+ }
+ }
+
+ count++;
+
+ if (allSmoothed(tr))
+ break;
+ }
+
+ for(i = 0; i < tr->numBranches; i++)
+ tr->partitionConverged[i] = FALSE;
+
+
+
+ return TRUE;
+}
+
+
+
+boolean localSmooth (tree *tr, nodeptr p, int maxtimes)
+{
+ nodeptr q;
+ int i;
+
+ if (isTip(p->number, tr->mxtips)) return FALSE;
+
+ for(i = 0; i < tr->numBranches; i++)
+ tr->partitionConverged[i] = FALSE;
+
+ while (--maxtimes >= 0)
+ {
+ for(i = 0; i < tr->numBranches; i++)
+ tr->partitionSmoothed[i] = TRUE;
+
+ q = p;
+ do
+ {
+ if (! update(tr, q)) return FALSE;
+ q = q->next;
+ }
+ while (q != p);
+
+ if (allSmoothed(tr))
+ break;
+ }
+
+ for(i = 0; i < tr->numBranches; i++)
+ {
+ tr->partitionSmoothed[i] = FALSE;
+ tr->partitionConverged[i] = FALSE;
+ }
+
+ return TRUE;
+}
+
+
+
+
+
+static void resetInfoList(void)
+{
+ int i;
+
+ iList.valid = 0;
+
+ for(i = 0; i < iList.n; i++)
+ {
+ iList.list[i].node = (nodeptr)NULL;
+ iList.list[i].likelihood = unlikely;
+ }
+}
+
+void initInfoList(int n)
+{
+ int i;
+
+ iList.n = n;
+ iList.valid = 0;
+ iList.list = (bestInfo *)malloc(sizeof(bestInfo) * n);
+
+ for(i = 0; i < n; i++)
+ {
+ iList.list[i].node = (nodeptr)NULL;
+ iList.list[i].likelihood = unlikely;
+ }
+}
+
+void freeInfoList(void)
+{
+ free(iList.list);
+}
+
+
+void insertInfoList(nodeptr node, double likelihood)
+{
+ int i;
+ int min = 0;
+ double min_l = iList.list[0].likelihood;
+
+ for(i = 1; i < iList.n; i++)
+ {
+ if(iList.list[i].likelihood < min_l)
+ {
+ min = i;
+ min_l = iList.list[i].likelihood;
+ }
+ }
+
+ if(likelihood > min_l)
+ {
+ iList.list[min].likelihood = likelihood;
+ iList.list[min].node = node;
+ iList.valid += 1;
+ }
+
+ if(iList.valid > iList.n)
+ iList.valid = iList.n;
+}
+
+
+boolean smoothRegion (tree *tr, nodeptr p, int region)
+{
+ nodeptr q;
+
+ if (! update(tr, p)) return FALSE; /* Adjust branch */
+
+ if(region > 0)
+ {
+ if (!isTip(p->number, tr->mxtips))
+ {
+ q = p->next;
+ while (q != p)
+ {
+ if (! smoothRegion(tr, q->back, --region)) return FALSE;
+ q = q->next;
+ }
+
+ newviewGeneric(tr, p, FALSE);
+ }
+ }
+
+ return TRUE;
+}
+
+boolean regionalSmooth (tree *tr, nodeptr p, int maxtimes, int region)
+ {
+ nodeptr q;
+ int i;
+
+ if (isTip(p->number, tr->mxtips)) return FALSE; /* Should be an error */
+
+ for(i = 0; i < tr->numBranches; i++)
+ tr->partitionConverged[i] = FALSE;
+
+ while (--maxtimes >= 0)
+ {
+ for(i = 0; i < tr->numBranches; i++)
+ tr->partitionSmoothed[i] = TRUE;
+
+ q = p;
+ do
+ {
+ if (! smoothRegion(tr, q, region)) return FALSE;
+ q = q->next;
+ }
+ while (q != p);
+
+ if (allSmoothed(tr))
+ break;
+ }
+
+ for(i = 0; i < tr->numBranches; i++)
+ tr->partitionSmoothed[i] = FALSE;
+ for(i = 0; i < tr->numBranches; i++)
+ tr->partitionConverged[i] = FALSE;
+
+ return TRUE;
+ } /* regionalSmooth */
+
+
+
+
+
+nodeptr removeNodeBIG (tree *tr, nodeptr p, int numBranches)
+{
+ double zqr[NUM_BRANCHES], result[NUM_BRANCHES];
+ nodeptr q, r;
+ int i;
+
+ q = p->next->back;
+ r = p->next->next->back;
+
+ for(i = 0; i < numBranches; i++)
+ zqr[i] = q->z[i] * r->z[i];
+
+ makenewzGeneric(tr, q, r, zqr, iterations, result, FALSE);
+
+ for(i = 0; i < numBranches; i++)
+ tr->zqr[i] = result[i];
+
+ hookup(q, r, result, numBranches);
+
+ p->next->next->back = p->next->back = (node *) NULL;
+
+ return q;
+}
+
+nodeptr removeNodeRestoreBIG (tree *tr, nodeptr p)
+{
+ nodeptr q, r;
+
+ q = p->next->back;
+ r = p->next->next->back;
+
+ newviewGeneric(tr, q, FALSE);
+ newviewGeneric(tr, r, FALSE);
+
+ hookup(q, r, tr->currentZQR, tr->numBranches);
+
+ p->next->next->back = p->next->back = (node *) NULL;
+
+ return q;
+}
+
+
+boolean insertBIG (tree *tr, nodeptr p, nodeptr q, int numBranches)
+{
+ nodeptr r, s;
+ int i;
+
+ r = q->back;
+ s = p->back;
+
+ for(i = 0; i < numBranches; i++)
+ tr->lzi[i] = q->z[i];
+
+ if(Thorough)
+ {
+ double zqr[NUM_BRANCHES], zqs[NUM_BRANCHES], zrs[NUM_BRANCHES], lzqr, lzqs, lzrs, lzsum, lzq, lzr, lzs, lzmax;
+ double defaultArray[NUM_BRANCHES];
+ double e1[NUM_BRANCHES], e2[NUM_BRANCHES], e3[NUM_BRANCHES];
+ double *qz;
+
+ qz = q->z;
+
+ for(i = 0; i < numBranches; i++)
+ defaultArray[i] = defaultz;
+
+ makenewzGeneric(tr, q, r, qz, iterations, zqr, FALSE);
+ makenewzGeneric(tr, q, s, defaultArray, iterations, zqs, FALSE);
+ makenewzGeneric(tr, r, s, defaultArray, iterations, zrs, FALSE);
+
+
+ for(i = 0; i < numBranches; i++)
+ {
+ lzqr = (zqr[i] > zmin) ? log(zqr[i]) : log(zmin);
+ lzqs = (zqs[i] > zmin) ? log(zqs[i]) : log(zmin);
+ lzrs = (zrs[i] > zmin) ? log(zrs[i]) : log(zmin);
+ lzsum = 0.5 * (lzqr + lzqs + lzrs);
+
+ lzq = lzsum - lzrs;
+ lzr = lzsum - lzqs;
+ lzs = lzsum - lzqr;
+ lzmax = log(zmax);
+
+ if (lzq > lzmax) {lzq = lzmax; lzr = lzqr; lzs = lzqs;}
+ else if (lzr > lzmax) {lzr = lzmax; lzq = lzqr; lzs = lzrs;}
+ else if (lzs > lzmax) {lzs = lzmax; lzq = lzqs; lzr = lzrs;}
+
+ e1[i] = exp(lzq);
+ e2[i] = exp(lzr);
+ e3[i] = exp(lzs);
+ }
+ hookup(p->next, q, e1, numBranches);
+ hookup(p->next->next, r, e2, numBranches);
+ hookup(p, s, e3, numBranches);
+ }
+ else
+ {
+ double z[NUM_BRANCHES];
+
+ for(i = 0; i < numBranches; i++)
+ {
+ z[i] = sqrt(q->z[i]);
+
+ if(z[i] < zmin)
+ z[i] = zmin;
+ if(z[i] > zmax)
+ z[i] = zmax;
+ }
+
+ hookup(p->next, q, z, tr->numBranches);
+ hookup(p->next->next, r, z, tr->numBranches);
+ }
+
+ newviewGeneric(tr, p, FALSE);
+
+ if(Thorough)
+ {
+ localSmooth(tr, p, smoothings);
+
+ for(i = 0; i < numBranches; i++)
+ {
+ tr->lzq[i] = p->next->z[i];
+ tr->lzr[i] = p->next->next->z[i];
+ tr->lzs[i] = p->z[i];
+ }
+ }
+
+ return TRUE;
+}
+
+boolean insertRestoreBIG (tree *tr, nodeptr p, nodeptr q)
+{
+ nodeptr r, s;
+
+ r = q->back;
+ s = p->back;
+
+ if(Thorough)
+ {
+ hookup(p->next, q, tr->currentLZQ, tr->numBranches);
+ hookup(p->next->next, r, tr->currentLZR, tr->numBranches);
+ hookup(p, s, tr->currentLZS, tr->numBranches);
+ }
+ else
+ {
+ double z[NUM_BRANCHES];
+ int i;
+
+ for(i = 0; i < tr->numBranches; i++)
+ {
+ double zz;
+ zz = sqrt(q->z[i]);
+ if(zz < zmin)
+ zz = zmin;
+ if(zz > zmax)
+ zz = zmax;
+ z[i] = zz;
+ }
+
+ hookup(p->next, q, z, tr->numBranches);
+ hookup(p->next->next, r, z, tr->numBranches);
+ }
+
+ newviewGeneric(tr, p, FALSE);
+
+ return TRUE;
+}
+
+
+static void restoreTopologyOnly(tree *tr, bestlist *bt, bestlist *bestML)
+{
+ nodeptr p = tr->removeNode;
+ nodeptr q = tr->insertNode;
+ double qz[NUM_BRANCHES], pz[NUM_BRANCHES], p1z[NUM_BRANCHES], p2z[NUM_BRANCHES];
+ nodeptr p1, p2, r, s;
+ double currentLH = tr->likelihood;
+ int i;
+
+ p1 = p->next->back;
+ p2 = p->next->next->back;
+
+ for(i = 0; i < tr->numBranches; i++)
+ {
+ p1z[i] = p1->z[i];
+ p2z[i] = p2->z[i];
+ }
+
+ hookup(p1, p2, tr->currentZQR, tr->numBranches);
+
+ p->next->next->back = p->next->back = (node *) NULL;
+ for(i = 0; i < tr->numBranches; i++)
+ {
+ qz[i] = q->z[i];
+ pz[i] = p->z[i];
+ }
+
+ r = q->back;
+ s = p->back;
+
+ if(Thorough)
+ {
+ hookup(p->next, q, tr->currentLZQ, tr->numBranches);
+ hookup(p->next->next, r, tr->currentLZR, tr->numBranches);
+ hookup(p, s, tr->currentLZS, tr->numBranches);
+ }
+ else
+ {
+ double z[NUM_BRANCHES];
+ for(i = 0; i < tr->numBranches; i++)
+ {
+ z[i] = sqrt(q->z[i]);
+ if(z[i] < zmin)
+ z[i] = zmin;
+ if(z[i] > zmax)
+ z[i] = zmax;
+ }
+ hookup(p->next, q, z, tr->numBranches);
+ hookup(p->next->next, r, z, tr->numBranches);
+ }
+
+ tr->likelihood = tr->bestOfNode;
+
+ saveBestTree(bt, tr, TRUE);
+ if(tr->saveBestTrees)
+ saveBestTree(bestML, tr, FALSE);
+
+ tr->likelihood = currentLH;
+
+ hookup(q, r, qz, tr->numBranches);
+
+ p->next->next->back = p->next->back = (nodeptr) NULL;
+
+ if(Thorough)
+ hookup(p, s, pz, tr->numBranches);
+
+ hookup(p->next, p1, p1z, tr->numBranches);
+ hookup(p->next->next, p2, p2z, tr->numBranches);
+}
+
+
+
+boolean testInsertBIG (tree *tr, nodeptr p, nodeptr q)
+{
+ double qz[NUM_BRANCHES], pz[NUM_BRANCHES];
+ nodeptr r;
+ boolean doIt = TRUE;
+ double startLH = tr->endLH;
+ int i;
+
+ r = q->back;
+ for(i = 0; i < tr->numBranches; i++)
+ {
+ qz[i] = q->z[i];
+ pz[i] = p->z[i];
+ }
+
+ if(tr->constraintTree)
+ {
+ int rNumber, qNumber, pNumber;
+
+ doIt = FALSE;
+
+ rNumber = tr->constraintVector[r->number];
+ qNumber = tr->constraintVector[q->number];
+ pNumber = tr->constraintVector[p->number];
+
+ if(pNumber == -9)
+ pNumber = checker(tr, p->back);
+ if(pNumber == -9)
+ doIt = TRUE;
+ else
+ {
+ if(qNumber == -9)
+ qNumber = checker(tr, q);
+
+ if(rNumber == -9)
+ rNumber = checker(tr, r);
+
+ if(pNumber == rNumber || pNumber == qNumber)
+ doIt = TRUE;
+ }
+ }
+
+ if(doIt)
+ {
+ if (! insertBIG(tr, p, q, tr->numBranches)) return FALSE;
+
+ evaluateGeneric(tr, p->next->next, FALSE);
+
+ if(tr->likelihood > tr->bestOfNode)
+ {
+ tr->bestOfNode = tr->likelihood;
+ tr->insertNode = q;
+ tr->removeNode = p;
+ for(i = 0; i < tr->numBranches; i++)
+ {
+ tr->currentZQR[i] = tr->zqr[i];
+ tr->currentLZR[i] = tr->lzr[i];
+ tr->currentLZQ[i] = tr->lzq[i];
+ tr->currentLZS[i] = tr->lzs[i];
+ }
+ }
+
+ if(tr->likelihood > tr->endLH)
+ {
+ tr->insertNode = q;
+ tr->removeNode = p;
+ for(i = 0; i < tr->numBranches; i++)
+ tr->currentZQR[i] = tr->zqr[i];
+ tr->endLH = tr->likelihood;
+ }
+
+ hookup(q, r, qz, tr->numBranches);
+
+ p->next->next->back = p->next->back = (nodeptr) NULL;
+
+ if(Thorough)
+ {
+ nodeptr s = p->back;
+ hookup(p, s, pz, tr->numBranches);
+ }
+
+ if((tr->doCutoff) && (tr->likelihood < startLH))
+ {
+ tr->lhAVG += (startLH - tr->likelihood);
+ tr->lhDEC++;
+ if((startLH - tr->likelihood) >= tr->lhCutoff)
+ return FALSE;
+ else
+ return TRUE;
+ }
+ else
+ return TRUE;
+ }
+ else
+ return TRUE;
+}
+
+
+
+
+
+
+
+void addTraverseBIG(tree *tr, nodeptr p, nodeptr q, int mintrav, int maxtrav)
+{
+ if (--mintrav <= 0)
+ {
+ if (! testInsertBIG(tr, p, q)) return;
+
+ }
+
+ if ((!isTip(q->number, tr->mxtips)) && (--maxtrav > 0))
+ {
+ addTraverseBIG(tr, p, q->next->back, mintrav, maxtrav);
+ addTraverseBIG(tr, p, q->next->next->back, mintrav, maxtrav);
+ }
+}
+
+
+
+
+
+int rearrangeBIG(tree *tr, nodeptr p, int mintrav, int maxtrav)
+{
+ double p1z[NUM_BRANCHES], p2z[NUM_BRANCHES], q1z[NUM_BRANCHES], q2z[NUM_BRANCHES];
+ nodeptr p1, p2, q, q1, q2;
+ int mintrav2, i;
+ boolean doP = TRUE, doQ = TRUE;
+
+ if (maxtrav < 1 || mintrav > maxtrav) return 0;
+ q = p->back;
+
+
+
+ if (!isTip(p->number, tr->mxtips) && doP)
+ {
+ p1 = p->next->back;
+ p2 = p->next->next->back;
+
+
+ if(!isTip(p1->number, tr->mxtips) || !isTip(p2->number, tr->mxtips))
+ {
+ for(i = 0; i < tr->numBranches; i++)
+ {
+ p1z[i] = p1->z[i];
+ p2z[i] = p2->z[i];
+ }
+
+ if (! removeNodeBIG(tr, p, tr->numBranches)) return badRear;
+
+ if (!isTip(p1->number, tr->mxtips))
+ {
+ addTraverseBIG(tr, p, p1->next->back,
+ mintrav, maxtrav);
+ addTraverseBIG(tr, p, p1->next->next->back,
+ mintrav, maxtrav);
+ }
+
+ if (!isTip(p2->number, tr->mxtips))
+ {
+ addTraverseBIG(tr, p, p2->next->back,
+ mintrav, maxtrav);
+ addTraverseBIG(tr, p, p2->next->next->back,
+ mintrav, maxtrav);
+ }
+
+ hookup(p->next, p1, p1z, tr->numBranches);
+ hookup(p->next->next, p2, p2z, tr->numBranches);
+ newviewGeneric(tr, p, FALSE);
+ }
+ }
+
+ if (!isTip(q->number, tr->mxtips) && maxtrav > 0 && doQ)
+ {
+ q1 = q->next->back;
+ q2 = q->next->next->back;
+
+ /*if (((!q1->tip) && (!q1->next->back->tip || !q1->next->next->back->tip)) ||
+ ((!q2->tip) && (!q2->next->back->tip || !q2->next->next->back->tip))) */
+ if (
+ (
+ ! isTip(q1->number, tr->mxtips) &&
+ (! isTip(q1->next->back->number, tr->mxtips) || ! isTip(q1->next->next->back->number, tr->mxtips))
+ )
+ ||
+ (
+ ! isTip(q2->number, tr->mxtips) &&
+ (! isTip(q2->next->back->number, tr->mxtips) || ! isTip(q2->next->next->back->number, tr->mxtips))
+ )
+ )
+ {
+
+ for(i = 0; i < tr->numBranches; i++)
+ {
+ q1z[i] = q1->z[i];
+ q2z[i] = q2->z[i];
+ }
+
+ if (! removeNodeBIG(tr, q, tr->numBranches)) return badRear;
+
+ mintrav2 = mintrav > 2 ? mintrav : 2;
+
+ if (/*! q1->tip*/ !isTip(q1->number, tr->mxtips))
+ {
+ addTraverseBIG(tr, q, q1->next->back,
+ mintrav2 , maxtrav);
+ addTraverseBIG(tr, q, q1->next->next->back,
+ mintrav2 , maxtrav);
+ }
+
+ if (/*! q2->tip*/ ! isTip(q2->number, tr->mxtips))
+ {
+ addTraverseBIG(tr, q, q2->next->back,
+ mintrav2 , maxtrav);
+ addTraverseBIG(tr, q, q2->next->next->back,
+ mintrav2 , maxtrav);
+ }
+
+ hookup(q->next, q1, q1z, tr->numBranches);
+ hookup(q->next->next, q2, q2z, tr->numBranches);
+
+ newviewGeneric(tr, q, FALSE);
+ }
+ }
+
+ return 1;
+}
+
+
+
+
+
+double treeOptimizeRapid(tree *tr, int mintrav, int maxtrav, analdef *adef, bestlist *bt, bestlist *bestML)
+{
+ int
+ i,
+ index,
+ *perm = (int*)NULL;
+
+ nodeRectifier(tr);
+
+ if (maxtrav > tr->mxtips - 3)
+ maxtrav = tr->mxtips - 3;
+
+ resetInfoList();
+
+ resetBestTree(bt);
+
+ tr->startLH = tr->endLH = tr->likelihood;
+
+ if(tr->doCutoff)
+ {
+ if(tr->bigCutoff)
+ {
+ if(tr->itCount == 0)
+ tr->lhCutoff = 0.5 * (tr->likelihood / -1000.0);
+ else
+ tr->lhCutoff = 0.5 * ((tr->lhAVG) / ((double)(tr->lhDEC)));
+ }
+ else
+ {
+ if(tr->itCount == 0)
+ tr->lhCutoff = tr->likelihood / -1000.0;
+ else
+ tr->lhCutoff = (tr->lhAVG) / ((double)(tr->lhDEC));
+ }
+
+ tr->itCount = tr->itCount + 1;
+ tr->lhAVG = 0;
+ tr->lhDEC = 0;
+ }
+
+ /*
+ printf("DoCutoff: %d\n", tr->doCutoff);
+ printf("%d %f %f %f\n", tr->itCount, tr->lhAVG, tr->lhDEC, tr->lhCutoff);
+
+ printf("%d %d\n", mintrav, maxtrav);
+ */
+
+ for(i = 1; i <= tr->mxtips + tr->mxtips - 2; i++)
+ {
+ tr->bestOfNode = unlikely;
+
+ if(adef->permuteTreeoptimize)
+ index = perm[i];
+ else
+ index = i;
+
+ if(rearrangeBIG(tr, tr->nodep[index], mintrav, maxtrav))
+ {
+ if(Thorough)
+ {
+ if(tr->endLH > tr->startLH)
+ {
+ restoreTreeFast(tr);
+ tr->startLH = tr->endLH = tr->likelihood;
+ saveBestTree(bt, tr, TRUE);
+ if(tr->saveBestTrees)
+ saveBestTree(bestML, tr, FALSE);
+ }
+ else
+ {
+ if(tr->bestOfNode != unlikely)
+ restoreTopologyOnly(tr, bt, bestML);
+ }
+ }
+ else
+ {
+ insertInfoList(tr->nodep[index], tr->bestOfNode);
+ if(tr->endLH > tr->startLH)
+ {
+ restoreTreeFast(tr);
+ tr->startLH = tr->endLH = tr->likelihood;
+ }
+ }
+ }
+ }
+
+ if(!Thorough)
+ {
+ Thorough = 1;
+
+ for(i = 0; i < iList.valid; i++)
+ {
+ tr->bestOfNode = unlikely;
+
+ if(rearrangeBIG(tr, iList.list[i].node, mintrav, maxtrav))
+ {
+ if(tr->endLH > tr->startLH)
+ {
+ restoreTreeFast(tr);
+ tr->startLH = tr->endLH = tr->likelihood;
+ saveBestTree(bt, tr, TRUE);
+ if(tr->saveBestTrees)
+ saveBestTree(bestML, tr, FALSE);
+ }
+ else
+ {
+
+ if(tr->bestOfNode != unlikely)
+ {
+ restoreTopologyOnly(tr, bt, bestML);
+ }
+ }
+ }
+ }
+
+ Thorough = 0;
+ }
+
+ if(adef->permuteTreeoptimize)
+ free(perm);
+
+ return tr->startLH;
+}
+
+
+
+
+boolean testInsertRestoreBIG (tree *tr, nodeptr p, nodeptr q)
+{
+ if(Thorough)
+ {
+ if (! insertBIG(tr, p, q, tr->numBranches)) return FALSE;
+
+ evaluateGeneric(tr, p->next->next, FALSE);
+ }
+ else
+ {
+ if (! insertRestoreBIG(tr, p, q)) return FALSE;
+
+ {
+ nodeptr x, y;
+ x = p->next->next;
+ y = p->back;
+
+ if(! isTip(x->number, tr->mxtips) && isTip(y->number, tr->mxtips))
+ {
+ while ((! x->x))
+ {
+ if (! (x->x))
+ newviewGeneric(tr, x, FALSE);
+ }
+ }
+
+ if(isTip(x->number, tr->mxtips) && !isTip(y->number, tr->mxtips))
+ {
+ while ((! y->x))
+ {
+ if (! (y->x))
+ newviewGeneric(tr, y, FALSE);
+ }
+ }
+
+ if(!isTip(x->number, tr->mxtips) && !isTip(y->number, tr->mxtips))
+ {
+ while ((! x->x) || (! y->x))
+ {
+ if (! (x->x))
+ newviewGeneric(tr, x, FALSE);
+ if (! (y->x))
+ newviewGeneric(tr, y, FALSE);
+ }
+ }
+
+ }
+
+ tr->likelihood = tr->endLH;
+ }
+
+ return TRUE;
+}
+
+void restoreTreeFast(tree *tr)
+{
+ removeNodeRestoreBIG(tr, tr->removeNode);
+ testInsertRestoreBIG(tr, tr->removeNode, tr->insertNode);
+}
+
+
+static void writeTree(tree *tr, FILE *f)
+{
+ int
+ x = tr->mxtips + 3 * (tr->mxtips - 1);
+
+ nodeptr
+ base = tr->nodeBaseAddress;
+
+ myBinFwrite(&(tr->start->number), sizeof(int), 1, f);
+ myBinFwrite(&base, sizeof(nodeptr), 1, f);
+ myBinFwrite(tr->nodeBaseAddress, sizeof(node), x, f);
+
+}
+
+int ckpCount = 0;
+
+
+/**
+ gathers patrat and rateCategory
+ */
+static void gatherDistributedCatInfos(tree *tr, int **rateCategory_result, double **patrat_result)
+{
+ /*
+ countPerProc and displPerProc must be int, since the MPI function
+ signatures require int counts and displacements
+ */
+
+ int
+ *countPerProc = (int*)NULL,
+ *displPerProc = (int*)NULL;
+
+ calculateLengthAndDisplPerProcess(tr, &countPerProc, &displPerProc);
+
+ if(processID == 0)
+ {
+ *rateCategory_result = (int*)calloc((size_t)tr->originalCrunchedLength , sizeof(int));
+ *patrat_result = (double*)calloc((size_t)tr->originalCrunchedLength, sizeof(double));
+ }
+
+ gatherDistributedArray(tr, (void**) patrat_result, tr->patrat_basePtr, MPI_DOUBLE , countPerProc, displPerProc);
+ gatherDistributedArray(tr, (void**) rateCategory_result, tr->rateCategory_basePtr, MPI_INT, countPerProc, displPerProc );
+
+ free(countPerProc);
+ free(displPerProc);
+}
+
+
+/**
+ added parameters patrat and rateCategory. The checkpoint writer
+ has to gather this distributed information first.
+ */
+static void writeCheckpointInner(tree *tr, int *rateCategory, double *patrat, analdef *adef)
+{
+ int
+ model;
+
+ char
+ extendedName[2048],
+ buf[64];
+
+ FILE
+ *f;
+
+ /* only master should write the checkpoint */
+ assert(processID == 0);
+
+ strcpy(extendedName, binaryCheckpointName);
+ strcat(extendedName, "_");
+ sprintf(buf, "%d", ckpCount);
+ strcat(extendedName, buf);
+
+ ckpCount++;
+
+ f = myfopen(extendedName, "w");
+
+
+ ckp.cmd.useMedian = tr->useMedian;
+ ckp.cmd.saveBestTrees = tr->saveBestTrees;
+ ckp.cmd.saveMemory = tr->saveMemory;
+ ckp.cmd.searchConvergenceCriterion = tr->searchConvergenceCriterion;
+ ckp.cmd.perGeneBranchLengths = adef->perGeneBranchLengths; //adef
+ ckp.cmd.likelihoodEpsilon = adef->likelihoodEpsilon; //adef
+ ckp.cmd.categories = tr->categories;
+ ckp.cmd.mode = adef->mode; //adef
+ ckp.cmd.fastTreeEvaluation = tr->fastTreeEvaluation;
+ ckp.cmd.initialSet = adef->initialSet;//adef
+ ckp.cmd.initial = adef->initial;//adef
+ ckp.cmd.rateHetModel = tr->rateHetModel;
+ ckp.cmd.autoProteinSelectionType = tr->autoProteinSelectionType;
+
+ ckp.cmd.useQuartetGrouping = adef->useQuartetGrouping;
+ ckp.cmd.numberRandomQuartets = adef->numberRandomQuartets;
+
+ /* cdta */
+
+ ckp.accumulatedTime = accumulatedTime + (gettime() - masterTime);
+ ckp.constraintTree = tr->constraintTree;
+
+ /* printf("Acc time: %f\n", ckp.accumulatedTime); */
+
+ myBinFwrite(&ckp, sizeof(checkPointState), 1, f);
+
+ if(tr->constraintTree)
+ myBinFwrite(tr->constraintVector, sizeof(int), 2 * tr->mxtips, f);
+
+ myBinFwrite(tr->tree0, sizeof(char), tr->treeStringLength, f);
+ myBinFwrite(tr->tree1, sizeof(char), tr->treeStringLength, f);
+
+
+ if(tr->rateHetModel == CAT)
+ {
+ myBinFwrite(rateCategory, sizeof(int), tr->originalCrunchedLength, f);
+ myBinFwrite(patrat, sizeof(double), tr->originalCrunchedLength, f);
+ }
+
+ //end
+
+ for(model = 0; model < tr->NumberOfModels; model++)
+ {
+ int
+ dataType = tr->partitionData[model].dataType;
+
+ myBinFwrite(&(tr->partitionData[model].numberOfCategories), sizeof(int), 1, f);
+ myBinFwrite(tr->partitionData[model].perSiteRates, sizeof(double), tr->maxCategories, f);
+ myBinFwrite(tr->partitionData[model].EIGN, sizeof(double), pLengths[dataType].eignLength, f);
+ myBinFwrite(tr->partitionData[model].EV, sizeof(double), pLengths[dataType].evLength, f);
+ myBinFwrite(tr->partitionData[model].EI, sizeof(double), pLengths[dataType].eiLength, f);
+
+ myBinFwrite(tr->partitionData[model].freqExponents, sizeof(double), pLengths[dataType].frequenciesLength, f);
+ myBinFwrite(tr->partitionData[model].frequencies, sizeof(double), pLengths[dataType].frequenciesLength, f);
+ myBinFwrite(tr->partitionData[model].tipVector, sizeof(double), pLengths[dataType].tipVectorLength, f);
+ myBinFwrite(tr->partitionData[model].substRates, sizeof(double), pLengths[dataType].substRatesLength, f);
+
+ //LG4X related variables
+
+ myBinFwrite(tr->partitionData[model].weights , sizeof(double), 4, f);
+ myBinFwrite(tr->partitionData[model].weightExponents , sizeof(double), 4, f);
+ //myBinFwrite(tr->partitionData[model].weightsBuffer , sizeof(double), 4, f);
+ //myBinFwrite(tr->partitionData[model].weightExponentsBuffer , sizeof(double), 4, f);
+
+ //LG4X end
+
+ if(tr->partitionData[model].protModels == LG4M || tr->partitionData[model].protModels == LG4X)
+ {
+ int
+ k;
+
+ for(k = 0; k < 4; k++)
+ {
+ myBinFwrite(tr->partitionData[model].rawEIGN_LG4[k], sizeof(double), pLengths[dataType].eignLength, f);
+ myBinFwrite(tr->partitionData[model].EIGN_LG4[k], sizeof(double), pLengths[dataType].eignLength, f);
+ myBinFwrite(tr->partitionData[model].EV_LG4[k], sizeof(double), pLengths[dataType].evLength, f);
+ myBinFwrite(tr->partitionData[model].EI_LG4[k], sizeof(double), pLengths[dataType].eiLength, f);
+ myBinFwrite(tr->partitionData[model].frequencies_LG4[k], sizeof(double), pLengths[dataType].frequenciesLength, f);
+ myBinFwrite(tr->partitionData[model].tipVector_LG4[k], sizeof(double), pLengths[dataType].tipVectorLength, f);
+ myBinFwrite(tr->partitionData[model].substRates_LG4[k], sizeof(double), pLengths[dataType].substRatesLength, f);
+ }
+ }
+
+ myBinFwrite(&(tr->partitionData[model].alpha), sizeof(double), 1, f);
+ myBinFwrite(&(tr->partitionData[model].gammaRates), sizeof(double), 4, f);
+
+ myBinFwrite(&(tr->partitionData[model].protModels), sizeof(int), 1, f);
+ myBinFwrite(&(tr->partitionData[model].autoProtModels), sizeof(int), 1, f);
+ }
+
+ if(ckp.state == MOD_OPT)
+ {
+ myBinFwrite(tr->likelihoods, sizeof(double), tr->numberOfTrees, f);
+ myBinFwrite(tr->treeStrings, sizeof(char), (size_t)tr->treeStringLength * (size_t)tr->numberOfTrees, f);
+ }
+
+ writeTree(tr, f);
+
+ fclose(f);
+
+ /* printBothOpen("\nCheckpoint written to: %s likelihood: %f\n", extendedName, tr->likelihood); */
+}
+
+
+void writeCheckpoint(tree *tr, analdef *adef)
+{
+ int
+ *rateCategory = (int *)NULL;
+
+ double
+ *patrat = (double *)NULL;
+
+ if(tr->rateHetModel == CAT)
+ gatherDistributedCatInfos(tr, &rateCategory, &patrat);
+
+ if(processID == 0)
+ {
+ writeCheckpointInner(tr, rateCategory, patrat, adef);
+
+ if(tr->rateHetModel == CAT)
+ {
+ free(rateCategory);
+ free(patrat);
+ }
+ }
+}
+
+
+
+
+static void readTree(tree *tr, FILE *f)
+{
+ int
+ nodeNumber,
+ x = tr->mxtips + 3 * (tr->mxtips - 1);
+
+
+
+
+
+ nodeptr
+ startAddress;
+
+ myBinFread(&nodeNumber, sizeof(int), 1, f);
+
+ tr->start = tr->nodep[nodeNumber];
+
+ /*printf("Start: %d %d\n", tr->start->number, nodeNumber);*/
+
+ myBinFread(&startAddress, sizeof(nodeptr), 1, f);
+
+ /*printf("%u %u\n", (size_t)startAddress, (size_t)tr->nodeBaseAddress);*/
+
+
+
+ myBinFread(tr->nodeBaseAddress, sizeof(node), x, f);
+
+ {
+ int i;
+
+ size_t
+ offset;
+
+ boolean
+ addIt;
+
+ if(startAddress > tr->nodeBaseAddress)
+ {
+ addIt = FALSE;
+ offset = (size_t)startAddress - (size_t)tr->nodeBaseAddress;
+ }
+ else
+ {
+ addIt = TRUE;
+ offset = (size_t)tr->nodeBaseAddress - (size_t)startAddress;
+ }
+
+ for(i = 0; i < x; i++)
+ {
+ if(addIt)
+ {
+ tr->nodeBaseAddress[i].next = (nodeptr)((size_t)tr->nodeBaseAddress[i].next + offset);
+ tr->nodeBaseAddress[i].back = (nodeptr)((size_t)tr->nodeBaseAddress[i].back + offset);
+ }
+ else
+ {
+
+ tr->nodeBaseAddress[i].next = (nodeptr)((size_t)tr->nodeBaseAddress[i].next - offset);
+ tr->nodeBaseAddress[i].back = (nodeptr)((size_t)tr->nodeBaseAddress[i].back - offset);
+ }
+ }
+
+ }
+
+ evaluateGeneric(tr, tr->start, TRUE);
+
+ if(ckp.state != QUARTETS)
+ printBothOpen("ExaML Restart with likelihood: %1.50f\n", tr->likelihood);
+}
+
+static void genericError(void)
+{
+ printBothOpen("\nError: command lines used in initial run and re-start from checkpoint do not match!\n");
+}
+
+static void checkCommandLineArguments(tree *tr, analdef *adef)
+{
+ boolean
+ match = TRUE;
+
+ if(ckp.cmd.useMedian != tr->useMedian)
+ {
+ genericError();
+ printBothOpen("\nDisagreement in median for gamma option: -a\n");
+ match = FALSE;
+ }
+
+ if(ckp.cmd.saveBestTrees != tr->saveBestTrees)
+ {
+ genericError();
+ printBothOpen("\nDisagreement in tree saving option: -B\n");
+ match = FALSE;
+ }
+
+ if(ckp.cmd.saveMemory != tr->saveMemory)
+ {
+ genericError();
+ printBothOpen("\nDisagreement in memory saving option: -S\n");
+ match = FALSE;
+ }
+
+ if(ckp.cmd.searchConvergenceCriterion != tr->searchConvergenceCriterion)
+ {
+ genericError();
+ printBothOpen("\nDisagreement in search convergence criterion: -D\n");
+ match = FALSE;
+ }
+
+ if(ckp.cmd.perGeneBranchLengths != adef->perGeneBranchLengths)
+ {
+ genericError();
+ printBothOpen("\nDisagreement in using per-partition branch lengths: -M\n");
+ match = FALSE;
+ }
+
+ if(ckp.cmd.likelihoodEpsilon != adef->likelihoodEpsilon)
+ {
+ genericError();
+ printBothOpen("\nDisagreement in likelihood epsilon value: -e\n");
+ match = FALSE;
+ }
+
+ if(ckp.cmd.categories != tr->categories)
+ {
+ genericError();
+ printBothOpen("\nDisagreement in number of PSR rate categories: -c\n");
+ match = FALSE;
+ }
+
+ if(ckp.cmd.mode != adef->mode)
+ {
+ genericError();
+ printBothOpen("\nDisagreement in tree search or evaluation mode\n");
+ match = FALSE;
+ }
+
+ if(ckp.cmd.fastTreeEvaluation != tr->fastTreeEvaluation)
+ {
+ genericError();
+ printBothOpen("\nDisagreement in fast tree evaluation: -e|-E\n");
+ match = FALSE;
+ }
+
+
+
+ if(ckp.cmd.initialSet != adef->initialSet)
+ {
+ genericError();
+ printBothOpen("\nDisagreement in rearrangement radius limitation setting: -i\n");
+ match = FALSE;
+ }
+
+ if(ckp.cmd.initial != adef->initial)
+ {
+ genericError();
+ printBothOpen("\nDisagreement in rearrangement radius value: -i\n");
+ match = FALSE;
+ }
+
+ if(ckp.cmd.rateHetModel != tr->rateHetModel)
+ {
+ genericError();
+ printBothOpen("\nDisagreement in rate heterogeneity model: -m\n");
+ match = FALSE;
+ }
+
+ if(ckp.cmd.autoProteinSelectionType != tr->autoProteinSelectionType)
+ {
+ genericError();
+ printBothOpen("\nDisagreement in protein model selection criterion: --auto-prot\n");
+ match = FALSE;
+ }
+
+ if(ckp.cmd.useQuartetGrouping != adef->useQuartetGrouping)
+ {
+ genericError();
+ printBothOpen("\nDisagreement in quartet grouping option: -Y\n");
+ match = FALSE;
+ }
+
+ if(ckp.cmd.numberRandomQuartets != adef->numberRandomQuartets)
+ {
+ genericError();
+ printBothOpen("\nDisagreement in number of random quartet subsamples: -r\n");
+ match = FALSE;
+ }
+
+ if(!match)
+ {
+ printBothOpen("\nExaML will exit now ...\n\n");
+ errorExit(-1);
+ }
+}
+
+static void readCheckpoint(tree *tr, analdef *adef)
+{
+ int
+ model;
+
+ FILE
+ *f = myfopen(binaryCheckpointInputName, "rb");
+
+ /* cdta */
+
+ myBinFread(&ckp, sizeof(checkPointState), 1, f);
+
+ checkCommandLineArguments(tr, adef);
+
+ tr->constraintTree = ckp.constraintTree;
+
+ if(tr->constraintTree)
+ myBinFread(tr->constraintVector, sizeof(int), 2 * tr->mxtips, f);
+
+ tr->ntips = tr->mxtips;
+
+
+
+ tr->startLH = ckp.tr_startLH;
+ tr->endLH = ckp.tr_endLH;
+ tr->likelihood = ckp.tr_likelihood;
+ tr->bestOfNode = ckp.tr_bestOfNode;
+
+ tr->lhCutoff = ckp.tr_lhCutoff;
+ tr->lhAVG = ckp.tr_lhAVG;
+ tr->lhDEC = ckp.tr_lhDEC;
+ tr->itCount = ckp.tr_itCount;
+ Thorough = ckp.Thorough;
+
+ accumulatedTime = ckp.accumulatedTime;
+
+ /* printf("Accumulated time so far: %f\n", accumulatedTime); */
+
+ optimizeRateCategoryInvocations = ckp.optimizeRateCategoryInvocations;
+
+
+ myBinFread(tr->tree0, sizeof(char), tr->treeStringLength, f);
+ myBinFread(tr->tree1, sizeof(char), tr->treeStringLength, f);
+
+ if(tr->searchConvergenceCriterion && processID == 0)
+ {
+ int bCounter = 0;
+
+ if((ckp.state == FAST_SPRS && ckp.fastIterations > 0) ||
+ (ckp.state == SLOW_SPRS && ckp.thoroughIterations > 0))
+ {
+
+#ifdef _DEBUG_CHECKPOINTING
+ printf("parsing Tree 0\n");
+#endif
+
+ treeReadTopologyString(tr->tree0, tr);
+
+ bitVectorInitravSpecial(tr->bitVectors, tr->nodep[1]->back, tr->mxtips, tr->vLength, tr->h, 0, BIPARTITIONS_RF, (branchInfo *)NULL,
+ &bCounter, 1, FALSE, FALSE);
+
+ assert(bCounter == tr->mxtips - 3);
+ }
+
+ bCounter = 0;
+
+ if((ckp.state == FAST_SPRS && ckp.fastIterations > 1) ||
+ (ckp.state == SLOW_SPRS && ckp.thoroughIterations > 1))
+ {
+
+#ifdef _DEBUG_CHECKPOINTING
+ printf("parsing Tree 1\n");
+#endif
+
+ treeReadTopologyString(tr->tree1, tr);
+
+ bitVectorInitravSpecial(tr->bitVectors, tr->nodep[1]->back, tr->mxtips, tr->vLength, tr->h, 1, BIPARTITIONS_RF, (branchInfo *)NULL,
+ &bCounter, 1, FALSE, FALSE);
+
+ assert(bCounter == tr->mxtips - 3);
+ }
+ }
+
+
+ if(tr->rateHetModel == CAT )
+ {
+ /* every process reads its data */
+
+ /* Andre will this also work if we re-start with a different
+ number of processors? have you tested?
+
+ => Andre: yes that works: before writing the checkpoint, we
+ gather all lhs/patrat with gatherDistributedCatInfos. This
+ function calls gatherDistributedArray in
+ communication.c. gatherDistributedArray takes care of
+ reordering the data it obtained from the various processes,
+ such that the correct global array (i.e., indexing consistent with
+ character position) is obtained. Thus, the indexing below
+ (for reading in the patrat/lhs again) works correctly. */
+
+
+ /* Andre I think tr->originalCrunchedLength is of type size_t???
+ -> casting required ?
+
+ Andre: in the very worst case, pPos overflows. There is not
+ much one can do here. See explanation about fseek/fseeko at
+ other location. But I have added an assert, in case something
+ goes wrong */
+ exa_off_t
+ rPos = exa_ftell(f),
+ pPos = rPos + sizeof(int) * tr->originalCrunchedLength;
+
+ /* fails, in case reading failed (ftello returns -1) or an overflow happened */
+ assert( ! ( rPos < 0 || pPos < 0 ) && rPos <= pPos ) ;
+
+ /* first patrat then rateCategory */
+
+ Assign *aIter = tr->partAssigns,
+ *aEnd = &(tr->partAssigns [ tr->numAssignments ]) ;
+
+ /* Andre could you maybe add a drawing (scanned, drawn by hand if you like) documenting this layout? => Andre: TODO */
+
+ while(aIter != aEnd)
+ {
+ if(aIter->procId == processID)
+ {
+ pInfo
+ *partition = &(tr->partitionData[aIter->partitionId]);
+ exa_off_t
+ theOffset = pPos + (partition->lower + aIter->offset) * sizeof(double);
+ assert(pPos <= theOffset);
+
+ exa_fseek(f, theOffset, SEEK_SET);
+
+ myBinFread(partition->patrat, sizeof(double), aIter->width, f);
+
+ theOffset = rPos + (partition->lower + aIter->offset) * sizeof(int);
+ assert(rPos <= theOffset);
+ exa_fseek(f, theOffset, SEEK_SET);
+ myBinFread(partition->rateCategory, sizeof(int), aIter->width, f);
+ }
+ ++aIter;
+ }
+
+ /* Set file pointer to the end of both of the arrays */
+ exa_fseek(f, pPos + tr->originalCrunchedLength * sizeof(double) , SEEK_SET);
+ }
+
+
+
+
+
+
+ //end
+
+ for(model = 0; model < tr->NumberOfModels; model++)
+ {
+ int
+ dataType = tr->partitionData[model].dataType;
+
+ myBinFread(&(tr->partitionData[model].numberOfCategories), sizeof(int), 1, f);
+ myBinFread(tr->partitionData[model].perSiteRates, sizeof(double), tr->maxCategories, f);
+ myBinFread(tr->partitionData[model].EIGN, sizeof(double), pLengths[dataType].eignLength, f);
+ myBinFread(tr->partitionData[model].EV, sizeof(double), pLengths[dataType].evLength, f);
+ myBinFread(tr->partitionData[model].EI, sizeof(double), pLengths[dataType].eiLength, f);
+
+ myBinFread(tr->partitionData[model].freqExponents, sizeof(double), pLengths[dataType].frequenciesLength, f);
+ myBinFread(tr->partitionData[model].frequencies, sizeof(double), pLengths[dataType].frequenciesLength, f);
+ myBinFread(tr->partitionData[model].tipVector, sizeof(double), pLengths[dataType].tipVectorLength, f);
+ myBinFread(tr->partitionData[model].substRates, sizeof(double), pLengths[dataType].substRatesLength, f);
+
+ //LG4X related variables
+
+ myBinFread(tr->partitionData[model].weights , sizeof(double), 4, f);
+ myBinFread(tr->partitionData[model].weightExponents , sizeof(double), 4, f);
+ //myBinFread(tr->partitionData[model].weightsBuffer , sizeof(double), 4, f);
+ //myBinFread(tr->partitionData[model].weightExponentsBuffer , sizeof(double), 4, f);
+
+ //LG4X end
+
+ if(tr->partitionData[model].protModels == LG4X || tr->partitionData[model].protModels == LG4M)
+ {
+ int
+ k;
+
+ for(k = 0; k < 4; k++)
+ {
+ myBinFread(tr->partitionData[model].rawEIGN_LG4[k], sizeof(double), pLengths[dataType].eignLength, f);
+ myBinFread(tr->partitionData[model].EIGN_LG4[k], sizeof(double), pLengths[dataType].eignLength, f);
+ myBinFread(tr->partitionData[model].EV_LG4[k], sizeof(double), pLengths[dataType].evLength, f);
+ myBinFread(tr->partitionData[model].EI_LG4[k], sizeof(double), pLengths[dataType].eiLength, f);
+ myBinFread(tr->partitionData[model].frequencies_LG4[k], sizeof(double), pLengths[dataType].frequenciesLength, f);
+ myBinFread(tr->partitionData[model].tipVector_LG4[k], sizeof(double), pLengths[dataType].tipVectorLength, f);
+ myBinFread(tr->partitionData[model].substRates_LG4[k], sizeof(double), pLengths[dataType].substRatesLength, f);
+ }
+ }
+
+
+ myBinFread(&(tr->partitionData[model].alpha), sizeof(double), 1, f);
+ myBinFread(&(tr->partitionData[model].gammaRates), sizeof(double), 4, f);
+ //conditional added by Andre modified by me
+ //only overwrite values of discrete gamma cats by calling makeGammaCats if not using
+ //LG4X!
+ if(tr->rateHetModel != CAT && !(tr->partitionData[model].protModels == LG4X))
+ makeGammaCats(tr->partitionData[model].alpha, tr->partitionData[model].gammaRates, 4, tr->useMedian);
+
+ myBinFread(&(tr->partitionData[model].protModels), sizeof(int), 1, f);
+ myBinFread(&(tr->partitionData[model].autoProtModels), sizeof(int), 1, f);
+ }
+
+ if(ckp.state == MOD_OPT)
+ {
+ myBinFread(tr->likelihoods, sizeof(double), tr->numberOfTrees, f);
+ myBinFread(tr->treeStrings, sizeof(char), (size_t)tr->treeStringLength * (size_t)tr->numberOfTrees, f);
+ }
+
+ if(tr->rateHetModel == CAT)
+ checkPerSiteRates(tr);
+
+ readTree(tr, f);
+ fclose(f);
+}
+
+
+void restart(tree *tr, analdef *adef)
+{
+ readCheckpoint(tr, adef);
+
+ switch(ckp.state)
+ {
+ case REARR_SETTING:
+ assert(adef->mode == BIG_RAPID_MODE);
+ break;
+ case FAST_SPRS:
+ assert(adef->mode == BIG_RAPID_MODE);
+ break;
+ case SLOW_SPRS:
+ assert(adef->mode == BIG_RAPID_MODE);
+ break;
+ case MOD_OPT:
+ assert(adef->mode == TREE_EVALUATION);
+ break;
+ case QUARTETS:
+ assert(adef->mode == QUARTET_CALCULATION);
+ break;
+ default:
+ assert(0);
+ }
+}
+
+int determineRearrangementSetting(tree *tr, analdef *adef, bestlist *bestT, bestlist *bt, bestlist *bestML)
+{
+ const
+ int MaxFast = 26;
+
+ int
+ i,
+ maxtrav = 5,
+ bestTrav = 5;
+
+ double
+ startLH = tr->likelihood;
+
+ boolean
+ impr = TRUE,
+ cutoff = tr->doCutoff;
+
+ if(adef->useCheckpoint)
+ {
+ assert(ckp.state == REARR_SETTING);
+
+ maxtrav = ckp.maxtrav;
+ bestTrav = ckp.bestTrav;
+ startLH = ckp.startLH;
+ impr = ckp.impr;
+
+ cutoff = ckp.cutoff;
+
+ adef->useCheckpoint = FALSE;
+ }
+
+ tr->doCutoff = FALSE;
+
+ resetBestTree(bt);
+
+#ifdef _DEBUG_CHECKPOINTING
+ printBothOpen("MAXTRAV: %d\n", maxtrav);
+#endif
+
+ assert(Thorough == 0);
+
+ while(impr && maxtrav < MaxFast)
+ {
+ recallBestTree(bestT, 1, tr);
+ nodeRectifier(tr);
+
+ /* Andre I believe that the code below, except for
+ writeCheckpoint can still only be executed by process 0 =>
+ Andre: all other processes need to enter writeCheckpoint,
+ because of the gather that happens there. But the assignments
+ to the checkpoint state are not necessary for all processes;
+ does it matter? */
+ {
+ ckp.optimizeRateCategoryInvocations = optimizeRateCategoryInvocations;
+
+ ckp.cutoff = cutoff;
+ ckp.state = REARR_SETTING;
+ ckp.maxtrav = maxtrav;
+ ckp.bestTrav = bestTrav;
+ ckp.startLH = startLH;
+ ckp.impr = impr;
+
+ ckp.tr_startLH = tr->startLH;
+ ckp.tr_endLH = tr->endLH;
+ ckp.tr_likelihood = tr->likelihood;
+ ckp.tr_bestOfNode = tr->bestOfNode;
+
+ ckp.tr_lhCutoff = tr->lhCutoff;
+ ckp.tr_lhAVG = tr->lhAVG;
+ ckp.tr_lhDEC = tr->lhDEC;
+ ckp.tr_itCount = tr->itCount;
+
+
+ writeCheckpoint(tr, adef);
+ }
+
+ if (maxtrav > tr->mxtips - 3)
+ maxtrav = tr->mxtips - 3;
+
+ tr->startLH = tr->endLH = tr->likelihood;
+
+ /* printBothOpen("TRAV: %d lh %f MNZC %d\n", maxtrav, tr->likelihood, mnzc); */
+
+ {
+ int changes = 0;
+
+ for(i = 1; i <= tr->mxtips + tr->mxtips - 2; i++)
+ {
+ tr->bestOfNode = unlikely;
+
+ if(rearrangeBIG(tr, tr->nodep[i], 1, maxtrav))
+ {
+ if(tr->endLH > tr->startLH)
+ {
+ restoreTreeFast(tr);
+ tr->startLH = tr->endLH = tr->likelihood;
+ changes++;
+ }
+ }
+ }
+
+
+ /*
+ evaluateGeneric(tr, tr->start, TRUE);
+
+ printBothOpen("Changes: %d TRAV: %d lh %f MNZC %d\n", changes, maxtrav, tr->likelihood, mnzc);
+ */
+ }
+
+ treeEvaluate(tr, 0.25);
+
+ /* printBothOpen("TRAV: %d lh %f MNZC %d\n", maxtrav, tr->likelihood, mnzc); */
+
+ saveBestTree(bt, tr, TRUE);
+ if(tr->saveBestTrees)
+ saveBestTree(bestML, tr, FALSE);
+
+#ifdef _DEBUG_CHECKPOINTING
+ printBothOpen("TRAV: %d lh %f MNZC %d\n", maxtrav, tr->likelihood, mnzc);
+#endif
+
+ if(tr->likelihood > startLH)
+ {
+ startLH = tr->likelihood;
+ printLog(tr);
+ bestTrav = maxtrav;
+ impr = TRUE;
+ }
+ else
+ impr = FALSE;
+
+
+
+ if(tr->doCutoff)
+ {
+ tr->lhCutoff = (tr->lhAVG) / ((double)(tr->lhDEC));
+
+ tr->itCount = tr->itCount + 1;
+ tr->lhAVG = 0;
+ tr->lhDEC = 0;
+ }
+
+ maxtrav += 5;
+
+
+ }
+
+ recallBestTree(bt, 1, tr);
+
+ tr->doCutoff = cutoff;
+
+#ifdef _DEBUG_CHECKPOINTING
+ printBothOpen("BestTrav %d\n", bestTrav);
+#endif
+
+ return bestTrav;
+}
+
+
+
+
+
+void computeBIGRAPID (tree *tr, analdef *adef, boolean estimateModel)
+{
+ int
+ i,
+ impr,
+ bestTrav = 0,
+ treeVectorLength = 0,
+ rearrangementsMax = 0,
+ rearrangementsMin = 0,
+ thoroughIterations = 0,
+ fastIterations = 0;
+
+ double
+ lh = unlikely,
+ previousLh = unlikely,
+ difference,
+ epsilon;
+
+ bestlist
+ *bestML,
+ *bestT,
+ *bt;
+
+ /* now here is the RAxML hill climbing search algorithm */
+
+ tr->lhAVG = 0.0;
+ tr->lhDEC = 0.0;
+
+ /* initialization for the hash table to compute RF distances */
+
+ if(tr->searchConvergenceCriterion && processID == 0)
+ treeVectorLength = 1;
+
+ /* initialize two lists of size 1 and size 20 that will keep track of the best
+ and 20 best tree topologies respectively */
+
+ bestT = (bestlist *) malloc(sizeof(bestlist));
+ bestT->ninit = 0;
+ initBestTree(bestT, 1, tr->mxtips);
+
+ bt = (bestlist *) malloc(sizeof(bestlist));
+ bt->ninit = 0;
+ initBestTree(bt, 20, tr->mxtips);
+
+
+
+ if(tr->saveBestTrees > 0)
+ {
+ bestML = (bestlist *) malloc(sizeof(bestlist));
+ bestML->ninit = 0;
+ initBestTree(bestML, tr->saveBestTrees, tr->mxtips);
+ }
+ else
+ bestML = (bestlist *)NULL;
+
+
+ /* initialize an additional data structure used by the search algo, all of this is pretty
+ RAxML-specific and should probably not be in the library */
+
+ initInfoList(50);
+
+ /* some pretty arbitrary thresholds */
+
+ difference = 10.0;
+ epsilon = 0.01;
+
+ /* Thorough = 0 means that we will do fast SPR insertions without optimizing the
+ three branches adjacent to the subtree insertion position via Newton-Raphson
+ */
+
+ Thorough = 0;
+
+ /* if we are not using a checkpoint and estimateModel is set to TRUE we call the function
+ that optimizes model parameters, such as the CAT model assignment, the alpha parameter
+ or the rates in the GTR matrix. Otherwise we just optimize the branch lengths. Note that
+ the second parameter of treeEvaluate() controls how many times we will iterate over all branches
+ of the tree until we give up, provided that the branch length optimization has not converged before.
+ */
+
+ if(!adef->useCheckpoint)
+ {
+ if(estimateModel)
+ modOpt(tr, 10.0, adef, 0);
+ else
+ treeEvaluate(tr, 2);
+ }
+
+ /* print some stuff to the RAxML_log file */
+
+ printLog(tr);
+
+ /* save the current tree (which is the input tree parsed via -t) in the bestT list */
+
+ saveBestTree(bestT, tr, TRUE);
+
+ /* if the rearrangement radius has been set by the user, i.e., adef->initialSet == TRUE,
+ then just set the appropriate parameter.
+ Otherwise, call the function determineRearrangementSetting() that searches
+ for the best radius by executing SPR moves on the initial tree with different radii
+ and returns the smallest radius that yields the best log likelihood score after
+ applying one cycle of SPR moves to the tree
+ */
+
+ if(!adef->initialSet)
+ {
+ if((!adef->useCheckpoint) || (adef->useCheckpoint && ckp.state == REARR_SETTING))
+ {
+ bestTrav = adef->bestTrav = determineRearrangementSetting(tr, adef, bestT, bt, bestML);
+ printBothOpen("\nBest rearrangement radius: %d\n", bestTrav);
+ }
+ }
+ else
+ {
+ bestTrav = adef->bestTrav = adef->initial;
+ printBothOpen("\nUser-defined rearrangement radius: %d\n", bestTrav);
+ }
+
+
+ /* some checkpointing noise */
+ if(!(adef->useCheckpoint && (ckp.state == FAST_SPRS || ckp.state == SLOW_SPRS)))
+ {
+
+ /* optimize model params more thoroughly or just optimize branch lengths */
+ if(estimateModel)
+ modOpt(tr, 5.0, adef, 0);
+ else
+ treeEvaluate(tr, 1);
+ }
+
+ /* save the current tree again, while the topology has not changed, the branch lengths have changed in the meantime, hence
+ we need to store them again */
+
+ saveBestTree(bestT, tr, TRUE);
+
+ /* set the loop variable to TRUE */
+
+ impr = 1;
+
+ /* this is for the additional RAxML heuristics described in this paper:
+
+ A. Stamatakis, F. Blagojevic, C.D. Antonopoulos, D.S. Nikolopoulos: "Exploring new Search Algorithms and Hardware for Phylogenetics: RAxML meets the IBM Cell".
+ In Journal of VLSI Signal Processing Systems, 48(3):271-286, 2007.
+
+ This is turned on by default
+ */
+
+
+ if(tr->doCutoff)
+ tr->itCount = 0;
+
+ /* figure out where to continue computations if we restarted from a checkpoint */
+
+ if(adef->useCheckpoint && ckp.state == FAST_SPRS)
+ goto START_FAST_SPRS;
+
+ if(adef->useCheckpoint && ckp.state == SLOW_SPRS)
+ goto START_SLOW_SPRS;
+
+ while(impr)
+ {
+ START_FAST_SPRS:
+ /* if re-starting from checkpoint set the required variable values to the
+ values that they had when the checkpoint was written */
+
+ if(adef->useCheckpoint && ckp.state == FAST_SPRS)
+ {
+ optimizeRateCategoryInvocations = ckp.optimizeRateCategoryInvocations;
+
+
+ impr = ckp.impr;
+ Thorough = ckp.Thorough;
+ bestTrav = ckp.bestTrav;
+ treeVectorLength = ckp.treeVectorLength;
+ rearrangementsMax = ckp.rearrangementsMax;
+ rearrangementsMin = ckp.rearrangementsMin;
+ thoroughIterations = ckp.thoroughIterations;
+ fastIterations = ckp.fastIterations;
+
+
+ lh = ckp.lh;
+ previousLh = ckp.previousLh;
+ difference = ckp.difference;
+ epsilon = ckp.epsilon;
+
+
+ tr->likelihood = ckp.tr_likelihood;
+
+ tr->lhCutoff = ckp.tr_lhCutoff;
+ tr->lhAVG = ckp.tr_lhAVG;
+ tr->lhDEC = ckp.tr_lhDEC;
+ tr->itCount = ckp.tr_itCount;
+
+ adef->useCheckpoint = FALSE;
+ }
+ else
+ /* otherwise, restore the currently best tree */
+ recallBestTree(bestT, 1, tr);
+
+ /* save states of algorithmic/heuristic variables for printing the next checkpoint */
+
+ /*
+ Andre I believe that the code below, except for
+ writeCheckpoint can still only be executed by process 0 =>
+ Andre: see above
+ */
+ {
+ ckp.state = FAST_SPRS;
+ ckp.optimizeRateCategoryInvocations = optimizeRateCategoryInvocations;
+
+
+ ckp.impr = impr;
+ ckp.Thorough = Thorough;
+ ckp.bestTrav = bestTrav;
+ ckp.treeVectorLength = treeVectorLength;
+ ckp.rearrangementsMax = rearrangementsMax;
+ ckp.rearrangementsMin = rearrangementsMin;
+ ckp.thoroughIterations = thoroughIterations;
+ ckp.fastIterations = fastIterations;
+
+
+ ckp.lh = lh;
+ ckp.previousLh = previousLh;
+ ckp.difference = difference;
+ ckp.epsilon = epsilon;
+
+
+ ckp.bestTrav = bestTrav;
+ ckp.impr = impr;
+
+ ckp.tr_startLH = tr->startLH;
+ ckp.tr_endLH = tr->endLH;
+ ckp.tr_likelihood = tr->likelihood;
+ ckp.tr_bestOfNode = tr->bestOfNode;
+
+ ckp.tr_lhCutoff = tr->lhCutoff;
+ ckp.tr_lhAVG = tr->lhAVG;
+ ckp.tr_lhDEC = tr->lhDEC;
+ ckp.tr_itCount = tr->itCount;
+
+ /* write a binary checkpoint */
+ writeCheckpoint(tr, adef);
+ }
+
+ /* this is the aforementioned convergence criterion that requires computing the RF,
+ let's not worry about this right now */
+
+ if(tr->searchConvergenceCriterion && processID == 0)
+ {
+ int
+ bCounter = 0;
+
+ char
+ *buffer = (char*)calloc(tr->treeStringLength, sizeof(char));
+
+ if(fastIterations > 1)
+ cleanupHashTable(tr->h, (fastIterations % 2));
+
+
+ bitVectorInitravSpecial(tr->bitVectors, tr->nodep[1]->back, tr->mxtips, tr->vLength, tr->h, fastIterations % 2, BIPARTITIONS_RF, (branchInfo *)NULL,
+ &bCounter, 1, FALSE, FALSE);
+
+
+#ifdef _DEBUG_CHECKPOINTING
+ printf("Storing tree in slot %d\n", fastIterations % 2);
+#endif
+
+ Tree2String(buffer, tr, tr->start->back, FALSE, TRUE, FALSE, FALSE, FALSE, SUMMARIZE_LH, FALSE, FALSE);
+
+ if(fastIterations % 2 == 0)
+ memcpy(tr->tree0, buffer, tr->treeStringLength * sizeof(char));
+ else
+ memcpy(tr->tree1, buffer, tr->treeStringLength * sizeof(char));
+
+ free(buffer);
+
+ assert(bCounter == tr->mxtips - 3);
+
+ if(fastIterations > 0)
+ {
+ double
+ rrf = convergenceCriterion(tr->h, tr->mxtips);
+
+ MPI_Bcast(&rrf, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
+
+ if(rrf <= 0.01) /* 1% cutoff */
+ {
+ printBothOpen("ML fast search converged at fast SPR cycle %d with stopping criterion\n", fastIterations);
+ printBothOpen("Relative Robinson-Foulds (RF) distance between respective best trees after one successful SPR cycle: %f%s\n", rrf, "%");
+ cleanupHashTable(tr->h, 0);
+ cleanupHashTable(tr->h, 1);
+ goto cleanup_fast;
+ }
+ else
+ printBothOpen("ML search convergence criterion fast cycle %d->%d Relative Robinson-Foulds %f\n", fastIterations - 1, fastIterations, rrf);
+ }
+ }
+
+ if(tr->searchConvergenceCriterion && processID != 0 && fastIterations > 0)
+ {
+ double
+ rrf;
+
+ MPI_Bcast(&rrf, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
+
+ if(rrf <= 0.01) /* 1% cutoff */
+ goto cleanup_fast;
+ }
+
+
+ /* count how many fast iterations with so-called fast SPR moves we have executed */
+
+ fastIterations++;
+
+ /* optimize branch lengths */
+
+
+ treeEvaluate(tr, 1.0);
+
+ /* save the tree with those branch lengths again */
+
+ saveBestTree(bestT, tr, TRUE);
+
+ /* print the log likelihood */
+
+ printLog(tr);
+
+ /* print this intermediate tree to file */
+
+ printResult(tr, adef, FALSE);
+
+ /* update the current best likelihood */
+
+ lh = previousLh = tr->likelihood;
+
+ /* in here we actually do a cycle of SPR moves */
+
+ treeOptimizeRapid(tr, 1, bestTrav, adef, bt, bestML);
+
+ /* set impr to 0 since in the immediately following for loop we check if the SPR moves above have generated
+ a better tree */
+
+ impr = 0;
+
+ /* loop over the 20 best trees generated by the fast SPR moves, and check if they improve the likelihood after all of their branch lengths
+ have been optimized */
+
+ for(i = 1; i <= bt->nvalid; i++)
+ {
+ /* restore tree i from list generated by treeOptimizeRapid */
+
+ recallBestTree(bt, i, tr);
+
+ /* optimize branch lengths of this tree */
+
+ treeEvaluate(tr, 0.25);
+
+ /* calc. the likelihood improvement */
+
+ difference = ((tr->likelihood > previousLh)?
+ tr->likelihood - previousLh:
+ previousLh - tr->likelihood);
+
+ /* if the likelihood has improved save the current tree as best tree and continue */
+ /* note that we always compare this tree to the likelihood of the previous best tree */
+
+ if(tr->likelihood > lh && difference > epsilon)
+ {
+ impr = 1;
+ lh = tr->likelihood;
+ saveBestTree(bestT, tr, TRUE);
+
+ }
+ }
+#ifdef _DEBUG_CHECKPOINTING
+ printBothOpen("FAST LH: %f\n", lh);
+#endif
+
+
+ }
+
+ /* needed for this RF-based convergence criterion that I actually describe in here:
+
+ A. Stamatakis: "Phylogenetic Search Algorithms for Maximum Likelihood". In M. Elloumi, A.Y. Zomaya, editors.
+ Algorithms in Computational Biology: Techniques, Approaches and Applications, John Wiley and Sons
+
+ a copy of this book is in my office */
+
+ if(tr->searchConvergenceCriterion && processID == 0)
+ {
+ cleanupHashTable(tr->h, 0);
+ cleanupHashTable(tr->h, 1);
+ }
+
+ cleanup_fast:
+ /*
+ now we have jumped out of the loop that executes
+ fast SPRs, and next we will execute a loop that executes thorough SPR cycles (with SPR moves
+ that optimize via Newton-Raphson all branches adjacent to the insertion point)
+ until no thorough SPR move can be found that improves the likelihood further. A classic
+ hill climbing algo.
+ */
+
+ Thorough = 1;
+ impr = 1;
+
+ /* restore the currently best tree. this is actually required, because we do not know which tree
+ is actually stored in the tree data structure when the above loop exits */
+
+ recallBestTree(bestT, 1, tr);
+
+ /* RE-TRAVERSE THE ENTIRE TREE */
+
+ evaluateGeneric(tr, tr->start, TRUE);
+#ifdef _DEBUG_CHECKPOINTING
+ printBothOpen("After Fast SPRs Final %f\n", tr->likelihood);
+#endif
+
+ /* optimize model params (including branch lengths) or just
+ optimize branch lengths and leave the other model parameters (GTR rates, alpha)
+ alone */
+
+ if(estimateModel)
+ modOpt(tr, 1.0, adef, 0);
+ else
+ treeEvaluate(tr, 1.0);
+
+ /* start loop that executes thorough SPR cycles */
+
+ while(1)
+ {
+ /* once again if we want to restart from a checkpoint that was written during this loop we need
+ to restore the values of the variables appropriately */
+ START_SLOW_SPRS:
+ if(adef->useCheckpoint && ckp.state == SLOW_SPRS)
+ {
+ optimizeRateCategoryInvocations = ckp.optimizeRateCategoryInvocations;
+
+
+
+
+ impr = ckp.impr;
+ Thorough = ckp.Thorough;
+ bestTrav = ckp.bestTrav;
+ treeVectorLength = ckp.treeVectorLength;
+ rearrangementsMax = ckp.rearrangementsMax;
+ rearrangementsMin = ckp.rearrangementsMin;
+ thoroughIterations = ckp.thoroughIterations;
+ fastIterations = ckp.fastIterations;
+
+
+ lh = ckp.lh;
+ previousLh = ckp.previousLh;
+ difference = ckp.difference;
+ epsilon = ckp.epsilon;
+
+
+ tr->likelihood = ckp.tr_likelihood;
+
+ tr->lhCutoff = ckp.tr_lhCutoff;
+ tr->lhAVG = ckp.tr_lhAVG;
+ tr->lhDEC = ckp.tr_lhDEC;
+ tr->itCount = ckp.tr_itCount;
+
+ adef->useCheckpoint = FALSE;
+ }
+ else
+ /* otherwise we restore the currently best tree and load it from bestT into our tree data
+ structure tr */
+ recallBestTree(bestT, 1, tr);
+
+ /* now, we write a checkpoint */
+ /* Andre I believe that the code below, except for
+ writeCheckpoint can still only be executed by process 0
+ => Andre: see above */
+ {
+ ckp.state = SLOW_SPRS;
+ ckp.optimizeRateCategoryInvocations = optimizeRateCategoryInvocations;
+
+
+ ckp.impr = impr;
+ ckp.Thorough = Thorough;
+ ckp.bestTrav = bestTrav;
+ ckp.treeVectorLength = treeVectorLength;
+ ckp.rearrangementsMax = rearrangementsMax;
+ ckp.rearrangementsMin = rearrangementsMin;
+ ckp.thoroughIterations = thoroughIterations;
+ ckp.fastIterations = fastIterations;
+
+
+ ckp.lh = lh;
+ ckp.previousLh = previousLh;
+ ckp.difference = difference;
+ ckp.epsilon = epsilon;
+
+
+ ckp.bestTrav = bestTrav;
+ ckp.impr = impr;
+
+ ckp.tr_startLH = tr->startLH;
+ ckp.tr_endLH = tr->endLH;
+ ckp.tr_likelihood = tr->likelihood;
+ ckp.tr_bestOfNode = tr->bestOfNode;
+
+ ckp.tr_lhCutoff = tr->lhCutoff;
+ ckp.tr_lhAVG = tr->lhAVG;
+ ckp.tr_lhDEC = tr->lhDEC;
+ ckp.tr_itCount = tr->itCount;
+
+ /* write binary checkpoint to file */
+
+ writeCheckpoint(tr, adef);
+ }
+
+ if(impr)
+ {
+ /* if the logl has improved write out some stuff and adapt the rearrangement radii */
+ printResult(tr, adef, FALSE);
+ /* minimum rearrangement radius */
+ rearrangementsMin = 1;
+ /* max radius, this is probably something I need to explain at the whiteboard */
+ rearrangementsMax = adef->stepwidth;
+
+ /* once again the convergence criterion */
+
+ if(tr->searchConvergenceCriterion && processID == 0)
+ {
+ int
+ bCounter = 0;
+
+ char
+ *buffer = (char*)calloc(tr->treeStringLength, sizeof(char));
+
+ if(thoroughIterations > 1)
+ cleanupHashTable(tr->h, (thoroughIterations % 2));
+
+ bitVectorInitravSpecial(tr->bitVectors, tr->nodep[1]->back, tr->mxtips, tr->vLength, tr->h, thoroughIterations % 2, BIPARTITIONS_RF, (branchInfo *)NULL,
+ &bCounter, 1, FALSE, FALSE);
+
+
+#ifdef _DEBUG_CHECKPOINTING
+ printf("Storing tree in slot %d\n", thoroughIterations % 2);
+#endif
+
+ Tree2String(buffer, tr, tr->start->back, FALSE, TRUE, FALSE, FALSE, FALSE, SUMMARIZE_LH, FALSE, FALSE);
+
+ if(thoroughIterations % 2 == 0)
+ memcpy(tr->tree0, buffer, tr->treeStringLength * sizeof(char));
+ else
+ memcpy(tr->tree1, buffer, tr->treeStringLength * sizeof(char));
+
+ free(buffer);
+
+ assert(bCounter == tr->mxtips - 3);
+
+ if(thoroughIterations > 0)
+ {
+ double
+ rrf = convergenceCriterion(tr->h, tr->mxtips);
+
+ MPI_Bcast(&rrf, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
+
+ if(rrf <= 0.01) /* 1% cutoff */
+ {
+ printBothOpen("ML search converged at thorough SPR cycle %d with stopping criterion\n", thoroughIterations);
+ printBothOpen("Relative Robinson-Foulds (RF) distance between respective best trees after one successful SPR cycle: %f%s\n", rrf, "%");
+ goto cleanup;
+ }
+ else
+ printBothOpen("ML search convergence criterion thorough cycle %d->%d Relative Robinson-Foulds %f\n", thoroughIterations - 1, thoroughIterations, rrf);
+ }
+ }
+
+ if(tr->searchConvergenceCriterion && processID != 0 && thoroughIterations > 0)
+ {
+ double
+ rrf;
+
+ MPI_Bcast(&rrf, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
+
+ if(rrf <= 0.01) /* 1% cutoff */
+ goto cleanup;
+ }
+
+
+
+ thoroughIterations++;
+ }
+ else
+ {
+ /* if the lnL has not improved in the current SPR cycle, adapt the min and max rearrangement radii and try again */
+
+ rearrangementsMax += adef->stepwidth;
+ rearrangementsMin += adef->stepwidth;
+
+ /* if we have already tried them then abandon this loop, the search has converged */
+ if(rearrangementsMax > adef->max_rearrange)
+ goto cleanup;
+ }
+
+ /* optimize branch lengths of best tree */
+
+ treeEvaluate(tr, 1.0);
+
+ /* do some bookkeeping and printouts again */
+ previousLh = lh = tr->likelihood;
+ saveBestTree(bestT, tr, TRUE);
+ printLog(tr);
+
+ /* do a cycle of thorough SPR moves with the minimum and maximum rearrangement radii */
+
+ treeOptimizeRapid(tr, rearrangementsMin, rearrangementsMax, adef, bt, bestML);
+
+ impr = 0;
+
+ /* once again get the best 20 trees produced by the SPR cycle, load them from the bt tree list into tr
+ optimize their branch lengths and figure out if the LnL of the tree has improved */
+
+ for(i = 1; i <= bt->nvalid; i++)
+ {
+ recallBestTree(bt, i, tr);
+
+ treeEvaluate(tr, 0.25);
+
+ difference = ((tr->likelihood > previousLh)?
+ tr->likelihood - previousLh:
+ previousLh - tr->likelihood);
+ if(tr->likelihood > lh && difference > epsilon)
+ {
+ impr = 1;
+ lh = tr->likelihood;
+ saveBestTree(bestT, tr, TRUE);
+ }
+ }
+
+#ifdef _DEBUG_CHECKPOINTING
+ printBothOpen("SLOW LH: %f\n", lh);
+#endif
+ }
+
+ cleanup:
+
+ /* do a final full tree traversal, not sure if this is required here */
+
+ evaluateGeneric(tr, tr->start, TRUE);
+
+#ifdef _DEBUG_CHECKPOINTING
+ printBothOpen("After SLOW SPRs Final %f\n", tr->likelihood);
+#endif
+
+ printBothOpen("\nLikelihood of best tree: %f\n", tr->likelihood);
+ /* print the absolute best tree */
+
+ printLog(tr);
+ printResult(tr, adef, TRUE);
+
+ /* print other good trees encountered during the search */
+
+ if(tr->saveBestTrees > 0)
+ {
+ char
+ fileName[2048] = "",
+ buf[64] = "";
+
+ printBothOpen("\n\nEvaluating %d other good ML trees\n\n", bestML->nvalid);
+
+ for(i = 1; i <= bestML->nvalid; i++)
+ {
+ recallBestTree(bestML, i, tr);
+ /*treeEvaluate(tr, 0.25);*/
+ printBothOpen("tree %d likelihood %1.80f\n", i, tr->likelihood);
+
+ if(processID == 0)
+ {
+ FILE
+ *treeFile;
+
+ strcpy(fileName, workdir);
+ strcat(fileName, "RAxML_");
+ sprintf(buf, "%d", bestML->nvalid);
+ strcat(fileName, buf);
+ strcat(fileName, "_goodTrees.");
+ strcat(fileName, run_id);
+
+ treeFile = myfopen(fileName, "a");
+
+ Tree2String(tr->tree_string, tr, tr->start->back, TRUE, TRUE, FALSE, FALSE, TRUE, SUMMARIZE_LH, FALSE, FALSE);
+
+ fprintf(treeFile, "%s", tr->tree_string);
+ fclose(treeFile);
+ }
+
+
+ }
+
+ printBothOpen("\n\nOther good trees written to file %s\n", fileName);
+ }
+
+
+ /* free data structures */
+
+ if(tr->searchConvergenceCriterion && processID == 0)
+ {
+ freeBitVectors(tr->bitVectors, 2 * tr->mxtips);
+ free(tr->bitVectors);
+ freeHashTable(tr->h);
+ free(tr->h);
+ }
+
+ freeBestTree(bestT);
+ free(bestT);
+ freeBestTree(bt);
+ free(bt);
+ freeInfoList();
+
+
+ /* and we are done, return to main() in axml.c */
+
+}
+
+
+
+boolean treeEvaluate (tree *tr, double smoothFactor) /* Evaluate a user tree */
+{
+ boolean result;
+
+
+ result = smoothTree(tr, (int)((double)smoothings * smoothFactor));
+
+ assert(result);
+
+ //make sure that all vectors are oriented correctly !
+
+ evaluateGeneric(tr, tr->start, TRUE);
+
+
+ return TRUE;
+}
+
diff --git a/examl/topologies.c b/examl/topologies.c
new file mode 100644
index 0000000..3e30bf8
--- /dev/null
+++ b/examl/topologies.c
@@ -0,0 +1,653 @@
+
+/* RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees
+ * Copyright August 2006 by Alexandros Stamatakis
+ *
+ * Partially derived from
+ * fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *
+ * and
+ *
+ * Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ * This program is free software; you may redistribute it and/or modify its
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ *
+ * For any other enquiries send an Email to Alexandros Stamatakis
+ * Alexandros.Stamatakis at epfl.ch
+ *
+ * When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ * Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models".
+ * Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#ifndef WIN32
+#include <sys/times.h>
+#include <sys/types.h>
+#include <sys/time.h>
+#include <unistd.h>
+#endif
+
+#include <math.h>
+#include <time.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+
+#include "axml.h"
+
+
+
+
+
+
+static void saveTopolRELLRec(tree *tr, nodeptr p, topolRELL *tpl, int *i, int numsp, int numBranches)
+{
+ int k;
+ if(isTip(p->number, numsp))
+ return;
+ else
+ {
+ nodeptr q = p->next;
+ while(q != p)
+ {
+ tpl->connect[*i].p = q;
+ tpl->connect[*i].q = q->back;
+
+ if(tr->constraintTree)
+ {
+ tpl->connect[*i].cp = tr->constraintVector[q->number];
+ tpl->connect[*i].cq = tr->constraintVector[q->back->number];
+ }
+
+ for(k = 0; k < numBranches; k++)
+ tpl->connect[*i].z[k] = q->z[k];
+ *i = *i + 1;
+
+ saveTopolRELLRec(tr, q->back, tpl, i, numsp, numBranches);
+ q = q->next;
+ }
+ }
+}
+
+static void saveTopolRELL(tree *tr, topolRELL *tpl)
+{
+ nodeptr p = tr->start;
+ int k, i = 0;
+
+ tpl->likelihood = tr->likelihood;
+ tpl->start = 1;
+
+ tpl->connect[i].p = p;
+ tpl->connect[i].q = p->back;
+
+ if(tr->constraintTree)
+ {
+ tpl->connect[i].cp = tr->constraintVector[p->number];
+ tpl->connect[i].cq = tr->constraintVector[p->back->number];
+ }
+
+ for(k = 0; k < tr->numBranches; k++)
+ tpl->connect[i].z[k] = p->z[k];
+ i++;
+
+ saveTopolRELLRec(tr, p->back, tpl, &i, tr->mxtips, tr->numBranches);
+
+ assert(i == 2 * tr->mxtips - 3);
+}
+
+
+static void restoreTopolRELL(tree *tr, topolRELL *tpl)
+{
+ int i;
+
+ for (i = 0; i < 2 * tr->mxtips - 3; i++)
+ {
+ hookup(tpl->connect[i].p, tpl->connect[i].q, tpl->connect[i].z, tr->numBranches);
+ tr->constraintVector[tpl->connect[i].p->number] = tpl->connect[i].cp;
+ tr->constraintVector[tpl->connect[i].q->number] = tpl->connect[i].cq;
+ }
+
+
+ tr->likelihood = tpl->likelihood;
+ tr->start = tr->nodep[tpl->start];
+ /* TODO */
+}
+
+
+
+
+void initTL(topolRELL_LIST *rl, tree *tr, int n)
+{
+ int i;
+
+ rl->max = n;
+ rl->t = (topolRELL **)malloc(sizeof(topolRELL *) * n);
+
+ for(i = 0; i < n; i++)
+ {
+ rl->t[i] = (topolRELL *)malloc(sizeof(topolRELL));
+ rl->t[i]->connect = (connectRELL *)malloc((2 * tr->mxtips - 3) * sizeof(connectRELL));
+ rl->t[i]->likelihood = unlikely;
+ }
+}
+
+
+void freeTL(topolRELL_LIST *rl)
+{
+ int i;
+ for(i = 0; i < rl->max; i++)
+ {
+ free(rl->t[i]->connect);
+ free(rl->t[i]);
+ }
+ free(rl->t);
+}
+
+
+void restoreTL(topolRELL_LIST *rl, tree *tr, int n)
+{
+ assert(n >= 0 && n < rl->max);
+
+ restoreTopolRELL(tr, rl->t[n]);
+}
+
+
+
+
+void resetTL(topolRELL_LIST *rl)
+{
+ int i;
+
+ for(i = 0; i < rl->max; i++)
+ rl->t[i]->likelihood = unlikely;
+}
+
+
+
+
+void saveTL(topolRELL_LIST *rl, tree *tr, int index)
+{
+ assert(index >= 0 && index < rl->max);
+
+ if(tr->likelihood > rl->t[index]->likelihood)
+ saveTopolRELL(tr, rl->t[index]);
+}
+
+
+static void *tipValPtr (nodeptr p)
+{
+ return (void *) & p->number;
+}
+
+
+static int cmpTipVal (void *v1, void *v2)
+{
+ int i1, i2;
+
+ i1 = *((int *) v1);
+ i2 = *((int *) v2);
+ return (i1 < i2) ? -1 : ((i1 == i2) ? 0 : 1);
+}
+
+
+/* These are the only routines that need to UNDERSTAND topologies */
+
+static topol *setupTopol (int maxtips)
+{
+ topol *tpl;
+
+ if (! (tpl = (topol *) malloc(sizeof(topol))) ||
+ ! (tpl->links = (connptr) malloc((2*maxtips-3) * sizeof(connect))))
+ {
+ printf("ERROR: Unable to get topology memory");
+ tpl = (topol *) NULL;
+ }
+ else
+ {
+ tpl->likelihood = unlikely;
+ tpl->start = (node *) NULL;
+ tpl->nextlink = 0;
+ tpl->ntips = 0;
+ tpl->nextnode = 0;
+ tpl->scrNum = 0; /* position in sorted list of scores */
+ tpl->tplNum = 0; /* position in sorted list of trees */
+ }
+
+ return tpl;
+}
+
+
+static void freeTopol (topol *tpl)
+{
+ free(tpl->links);
+ free(tpl);
+}
+
+
+static int saveSubtree (nodeptr p, topol *tpl, int numsp, int numBranches)
+{
+ connptr r, r0;
+ nodeptr q, s;
+ int t, t0, t1, k;
+
+ r0 = tpl->links;
+ r = r0 + (tpl->nextlink)++;
+ r->p = p;
+ r->q = q = p->back;
+
+ for(k = 0; k < numBranches; k++)
+ r->z[k] = p->z[k];
+
+ r->descend = 0; /* No children (yet) */
+
+ if (isTip(q->number, numsp))
+ {
+ r->valptr = tipValPtr(q); /* Assign value */
+ }
+ else
+ { /* Internal node, look at children */
+ s = q->next; /* First child */
+ do
+ {
+ t = saveSubtree(s, tpl, numsp, numBranches); /* Generate child's subtree */
+
+ t0 = 0; /* Merge child into list */
+ t1 = r->descend;
+ while (t1 && (cmpTipVal(r0[t1].valptr, r0[t].valptr) < 0)) {
+ t0 = t1;
+ t1 = r0[t1].sibling;
+ }
+ if (t0) r0[t0].sibling = t; else r->descend = t;
+ r0[t].sibling = t1;
+
+ s = s->next; /* Next child */
+ } while (s != q);
+
+ r->valptr = r0[r->descend].valptr; /* Inherit first child's value */
+ } /* End of internal node processing */
+
+ return (r - r0);
+}
+
+
+static nodeptr minSubtreeTip (nodeptr p0, int numsp)
+{
+ nodeptr minTip, p, testTip;
+
+ if (isTip(p0->number, numsp))
+ return p0;
+
+ p = p0->next;
+
+ minTip = minSubtreeTip(p->back, numsp);
+
+ while ((p = p->next) != p0)
+ {
+ testTip = minSubtreeTip(p->back, numsp);
+ if (cmpTipVal(tipValPtr(testTip), tipValPtr(minTip)) < 0)
+ minTip = testTip;
+ }
+ return minTip;
+}
+
+
+static nodeptr minTreeTip (nodeptr p, int numsp)
+{
+ nodeptr minp, minpb;
+
+ minp = minSubtreeTip(p, numsp);
+ minpb = minSubtreeTip(p->back, numsp);
+ return (cmpTipVal(tipValPtr(minp), tipValPtr(minpb)) < 0 ? minp : minpb);
+}
+
+
+static void saveTree (tree *tr, topol *tpl)
+/* Save a tree topology in a standard order so that first branches
+ * from a node contain lower value tips than do second branches from
+ * the node. The root tip should have the lowest value of all.
+ */
+{
+ connptr r;
+
+ tpl->nextlink = 0; /* Reset link pointer */
+ r = tpl->links + saveSubtree(minTreeTip(tr->start, tr->mxtips), tpl, tr->mxtips, tr->numBranches); /* Save tree */
+ r->sibling = 0;
+
+ tpl->likelihood = tr->likelihood;
+ tpl->start = tr->start;
+ tpl->ntips = tr->ntips;
+ tpl->nextnode = tr->nextnode;
+
+} /* saveTree */
+
+
+static boolean restoreTree (topol *tpl, tree *tr)
+{
+ connptr r;
+ nodeptr p, p0;
+ int i;
+
+ for (i = 1; i <= 2*(tr->mxtips) - 2; i++)
+ {
+ /* Uses p = p->next at tip */
+ p0 = p = tr->nodep[i];
+ do
+ {
+ p->back = (nodeptr) NULL;
+ p = p->next;
+ }
+ while (p != p0);
+ }
+
+ /* Copy connections from topology */
+
+ for (r = tpl->links, i = 0; i < tpl->nextlink; r++, i++)
+ hookup(r->p, r->q, r->z, tr->numBranches);
+
+ tr->likelihood = tpl->likelihood;
+ tr->start = tpl->start;
+ tr->ntips = tpl->ntips;
+
+ tr->nextnode = tpl->nextnode;
+
+ evaluateGeneric(tr, tr->start, TRUE);
+ return TRUE;
+}
+
+
+
+
+int initBestTree (bestlist *bt, int newkeep, int numsp)
+{ /* initBestTree */
+ int i;
+
+ bt->nkeep = 0;
+
+ if (bt->ninit <= 0)
+ {
+ if (! (bt->start = setupTopol(numsp))) return 0;
+ bt->ninit = -1;
+ bt->nvalid = 0;
+ bt->numtrees = 0;
+ bt->best = unlikely;
+ bt->improved = FALSE;
+ bt->byScore = (topol **) malloc((newkeep+1) * sizeof(topol *));
+ bt->byTopol = (topol **) malloc((newkeep+1) * sizeof(topol *));
+ if (! bt->byScore || ! bt->byTopol) {
+ printf( "initBestTree: malloc failure\n");
+ return 0;
+ }
+ }
+ else if (ABS(newkeep) > bt->ninit) {
+ if (newkeep < 0) newkeep = -(bt->ninit);
+ else newkeep = bt->ninit;
+ }
+
+ if (newkeep < 1) { /* Use negative newkeep to clear list */
+ newkeep = -newkeep;
+ if (newkeep < 1) newkeep = 1;
+ bt->nvalid = 0;
+ bt->best = unlikely;
+ }
+
+ if (bt->nvalid >= newkeep) {
+ bt->nvalid = newkeep;
+ bt->worst = bt->byScore[newkeep]->likelihood;
+ }
+ else
+ {
+ bt->worst = unlikely;
+ }
+
+ for (i = bt->ninit + 1; i <= newkeep; i++)
+ {
+ if (! (bt->byScore[i] = setupTopol(numsp))) break;
+ bt->byTopol[i] = bt->byScore[i];
+ bt->ninit = i;
+ }
+
+ return (bt->nkeep = MIN(newkeep, bt->ninit));
+} /* initBestTree */
+
+
+
+void resetBestTree (bestlist *bt)
+{ /* resetBestTree */
+ bt->best = unlikely;
+ bt->worst = unlikely;
+ bt->nvalid = 0;
+ bt->improved = FALSE;
+} /* resetBestTree */
+
+
+boolean freeBestTree(bestlist *bt)
+{ /* freeBestTree */
+ while (bt->ninit >= 0) freeTopol(bt->byScore[(bt->ninit)--]);
+
+ /* VALGRIND */
+
+ free(bt->byScore);
+ free(bt->byTopol);
+
+ /* VALGRIND END */
+
+ freeTopol(bt->start);
+ return TRUE;
+} /* freeBestTree */
+
+
+/* Compare two trees, assuming that each is in standard order. Return
+ * -1 if first precedes second, 0 if they are identical, or +1 if first
+ * follows second in standard order. Lower number tips precede higher
+ * number tips. A tip precedes a corresponding internal node. Internal
+ * nodes are ranked by their lowest number tip.
+ */
+
+static int cmpSubtopol (connptr p10, connptr p1, connptr p20, connptr p2)
+{
+ connptr p1d, p2d;
+ int cmp;
+
+ if (! p1->descend && ! p2->descend) /* Two tips */
+ return cmpTipVal(p1->valptr, p2->valptr);
+
+ if (! p1->descend) return -1; /* p1 = tip, p2 = node */
+ if (! p2->descend) return 1; /* p2 = tip, p1 = node */
+
+ p1d = p10 + p1->descend;
+ p2d = p20 + p2->descend;
+ while (1) { /* Two nodes */
+ if ((cmp = cmpSubtopol(p10, p1d, p20, p2d))) return cmp; /* Subtrees */
+ if (! p1d->sibling && ! p2d->sibling) return 0; /* Lists done */
+ if (! p1d->sibling) return -1; /* One done, other not */
+ if (! p2d->sibling) return 1; /* One done, other not */
+ p1d = p10 + p1d->sibling; /* Neither done */
+ p2d = p20 + p2d->sibling;
+ }
+}
+
+
+
+static int cmpTopol (void *tpl1, void *tpl2)
+{
+ connptr r1, r2;
+ int cmp;
+
+ r1 = ((topol *) tpl1)->links;
+ r2 = ((topol *) tpl2)->links;
+ cmp = cmpTipVal(tipValPtr(r1->p), tipValPtr(r2->p));
+ if (cmp)
+ return cmp;
+ return cmpSubtopol(r1, r1, r2, r2);
+}
+
+
+
+static int cmpTplScore (void *tpl1, void *tpl2)
+{
+ double l1, l2;
+
+ l1 = ((topol *) tpl1)->likelihood;
+ l2 = ((topol *) tpl2)->likelihood;
+ return (l1 > l2) ? -1 : ((l1 == l2) ? 0 : 1);
+}
+
+
+
+/* Find an item in a sorted list of n items. If the item is in the list,
+ * return its index. If it is not in the list, return the negative of the
+ * position into which it should be inserted.
+ */
+
+static int findInList (void *item, void *list[], int n, int (* cmpFunc)(void *, void *))
+{
+ int mid, hi, lo, cmp = 0;
+
+ if (n < 1) return -1; /* No match; first index */
+
+ lo = 1;
+ mid = 0;
+ hi = n;
+ while (lo < hi) {
+ mid = (lo + hi) >> 1;
+ cmp = (* cmpFunc)(item, list[mid-1]);
+ if (cmp) {
+ if (cmp < 0) hi = mid;
+ else lo = mid + 1;
+ }
+ else return mid; /* Exact match */
+ }
+
+ if (lo != mid) {
+ cmp = (* cmpFunc)(item, list[lo-1]);
+ if (cmp == 0) return lo;
+ }
+ if (cmp > 0) lo++; /* Result of step = 0 test */
+ return -lo;
+}
+
+
+
+static int findTreeInList (bestlist *bt, tree *tr)
+{
+ topol *tpl;
+
+ tpl = bt->byScore[0];
+ saveTree(tr, tpl);
+ return findInList((void *) tpl, (void **) (& (bt->byTopol[1])),
+ bt->nvalid, cmpTopol);
+}
+
+
+int saveBestTree (bestlist *bt, tree *tr, boolean keepIdenticalTrees)
+{
+ topol
+ *tpl,
+ *reuse;
+
+ int
+ tplNum,
+ scrNum,
+ reuseScrNum,
+ reuseTplNum,
+ i,
+ oldValid,
+ newValid;
+
+ tplNum = findTreeInList(bt, tr);
+ tpl = bt->byScore[0];
+ oldValid = newValid = bt->nvalid;
+
+ if(tplNum > 0)
+ {
+ /* Topology is in list */
+
+ if(!keepIdenticalTrees)
+ return 0;
+
+ reuse = bt->byTopol[tplNum]; /* Matching topol */
+ reuseScrNum = reuse->scrNum;
+ reuseTplNum = reuse->tplNum;
+ }
+ /* Good enough to keep? */
+ else
+ {
+ if(tr->likelihood < bt->worst)
+ return 0;
+ else
+ { /* Topology is not in list */
+ tplNum = -tplNum; /* Add to list (not replace) */
+ if (newValid < bt->nkeep) bt->nvalid = ++newValid;
+ reuseScrNum = newValid; /* Take worst tree */
+ reuse = bt->byScore[reuseScrNum];
+ reuseTplNum = (newValid > oldValid) ? newValid : reuse->tplNum;
+ if (tr->likelihood > bt->start->likelihood)
+ bt->improved = TRUE;
+ }
+ }
+
+ scrNum = findInList((void *) tpl, (void **) (& (bt->byScore[1])),
+ oldValid, cmpTplScore);
+ scrNum = ABS(scrNum);
+
+ if (scrNum < reuseScrNum)
+ {
+ for (i = reuseScrNum; i > scrNum; i--)
+ (bt->byScore[i] = bt->byScore[i-1])->scrNum = i;
+ }
+ else
+ {
+ if (scrNum > reuseScrNum)
+ {
+ scrNum--;
+ for (i = reuseScrNum; i < scrNum; i++)
+ (bt->byScore[i] = bt->byScore[i+1])->scrNum = i;
+ }
+ }
+
+ if(tplNum < reuseTplNum)
+ for (i = reuseTplNum; i > tplNum; i--)
+ (bt->byTopol[i] = bt->byTopol[i-1])->tplNum = i;
+ else
+ {
+ if (tplNum > reuseTplNum)
+ {
+ tplNum--;
+ for (i = reuseTplNum; i < tplNum; i++)
+ (bt->byTopol[i] = bt->byTopol[i+1])->tplNum = i;
+ }
+ }
+
+ tpl->scrNum = scrNum;
+ tpl->tplNum = tplNum;
+ bt->byTopol[tplNum] = bt->byScore[scrNum] = tpl;
+ bt->byScore[0] = reuse;
+
+ if (scrNum == 1) bt->best = tr->likelihood;
+ if (newValid == bt->nkeep) bt->worst = bt->byScore[newValid]->likelihood;
+
+ return scrNum;
+}
+
+
+int recallBestTree (bestlist *bt, int rank, tree *tr)
+{
+ if (rank < 1) rank = 1;
+ if (rank > bt->nvalid) rank = bt->nvalid;
+ if (rank > 0) if (! restoreTree(bt->byScore[rank], tr)) return FALSE;
+ return rank;
+}
+
+
+
+
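Note on the return convention of findInList() in topologies.c above: positions are 1-based, a match returns its position, and a miss returns the negated position at which the item would have to be inserted to keep the list sorted. The following is only a minimal sketch of that convention, assuming plain int keys instead of the void pointers and comparison callback used in the patch; find_sorted_pos and cmp_int are hypothetical names that do not appear in ExaML.

```c
#include <assert.h>

/* Hypothetical three-way comparison on ints; it stands in for the
 * cmpTopol/cmpTplScore callbacks used in the patch. */
static int cmp_int(int a, int b)
{
  return (a < b) ? -1 : ((a == b) ? 0 : 1);
}

/* Binary search over list[0..n-1], sorted in ascending order.
 * Returns the 1-based position of item if it is present, otherwise the
 * negated 1-based position at which item would have to be inserted. */
static int find_sorted_pos(int item, const int *list, int n)
{
  int lo = 1, hi = n, mid = 0, cmp = 0;

  assert(n >= 0);
  if (n < 1) return -1;                 /* empty list: insert at position 1 */

  while (lo < hi) {
    mid = (lo + hi) / 2;
    cmp = cmp_int(item, list[mid - 1]);
    if (cmp == 0) return mid;           /* exact match */
    if (cmp < 0) hi = mid; else lo = mid + 1;
  }

  if (lo != mid) {                      /* position lo was not compared yet */
    cmp = cmp_int(item, list[lo - 1]);
    if (cmp == 0) return lo;
  }
  if (cmp > 0) lo++;                    /* item sorts after position lo */
  return -lo;
}
```

With the sorted list {10, 20, 30}, looking up 20 returns 2, while 25 returns -3 and 35 returns -4. saveBestTree() depends on the negative value: it negates it (tplNum = -tplNum) or takes ABS() to obtain the slot into which the new tree is shifted.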
diff --git a/examl/trash.c b/examl/trash.c
new file mode 100644
index 0000000..e681a85
--- /dev/null
+++ b/examl/trash.c
@@ -0,0 +1,78 @@
+#ifndef WIN32
+#include <sys/times.h>
+#include <sys/types.h>
+#include <sys/time.h>
+#include <unistd.h>
+#endif
+
+#include <limits.h>
+#include <math.h>
+#include <time.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+#include "axml.h"
+
+
+
+
+static void reorderNodes(tree *tr, nodeptr *np, nodeptr p, int *count)
+{
+ int i, found = 0;
+
+ if(isTip(p->number, tr->mxtips))
+ return;
+ else
+ {
+ for(i = tr->mxtips + 1; (i <= (tr->mxtips + tr->mxtips - 1)) && (found == 0); i++)
+ {
+ if (p == np[i] || p == np[i]->next || p == np[i]->next->next)
+ {
+ if(p == np[i])
+ tr->nodep[*count + tr->mxtips + 1] = np[i];
+ else
+ {
+ if(p == np[i]->next)
+ tr->nodep[*count + tr->mxtips + 1] = np[i]->next;
+ else
+ tr->nodep[*count + tr->mxtips + 1] = np[i]->next->next;
+ }
+
+ found = 1;
+ *count = *count + 1;
+ }
+ }
+
+ assert(found != 0);
+
+ reorderNodes(tr, np, p->next->back, count);
+ reorderNodes(tr, np, p->next->next->back, count);
+ }
+}
+
+void nodeRectifier(tree *tr)
+{
+ nodeptr *np = (nodeptr *)malloc(2 * tr->mxtips * sizeof(nodeptr));
+ int i;
+ int count = 0;
+
+ tr->start = tr->nodep[1];
+ tr->rooted = FALSE;
+
+ /* TODO why is tr->rooted set to FALSE here ?*/
+
+ for(i = tr->mxtips + 1; i <= (tr->mxtips + tr->mxtips - 1); i++)
+ np[i] = tr->nodep[i];
+
+ reorderNodes(tr, np, tr->start->back, &count);
+
+
+ free(np);
+}
+
+nodeptr findAnyTip(nodeptr p, int numsp)
+{
+ return isTip(p->number, numsp) ? p : findAnyTip(p->next->back, numsp);
+}
+
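Note on the node layout that findAnyTip() and reorderNodes() above rely on: as the traversals in this patch suggest (p->next->back, p->next->next->back), an inner node is stored as a ring of three records linked through next, and back crosses a branch to the neighbouring node. The sketch below illustrates that walk under these assumptions; sketch_node, sketch_is_tip and sketch_find_any_tip are hypothetical names, and the tip test (number <= numsp) is assumed from the node numbering used in treeReadLen(), not copied from axml.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down node record; the real one carries more fields. */
typedef struct sketch_node {
  int number;                 /* tips: 1..numsp, inner nodes: > numsp */
  struct sketch_node *next;   /* next record in the same inner-node ring */
  struct sketch_node *back;   /* record at the other end of the branch */
} sketch_node;

/* Assumed tip test: tips carry the numbers 1..numsp. */
static int sketch_is_tip(int number, int numsp)
{
  return number <= numsp;
}

/* Iterative equivalent of the recursive walk: at an inner node, step to the
 * next ring record and cross its branch, until a tip is reached. */
static sketch_node *sketch_find_any_tip(sketch_node *p, int numsp)
{
  assert(p != NULL);
  while (!sketch_is_tip(p->number, numsp))
    p = p->next->back;
  return p;
}
```

For a three-taxon star tree (one inner ring with number 4 whose records point to tips 1, 2 and 3), starting at the ring record attached to tip 1 returns tip 2, because the walk first moves to the next ring record and only then follows back.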
diff --git a/examl/treeIO.c b/examl/treeIO.c
new file mode 100644
index 0000000..ac5fe71
--- /dev/null
+++ b/examl/treeIO.c
@@ -0,0 +1,1184 @@
+/* RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees
+ * Copyright August 2006 by Alexandros Stamatakis
+ *
+ * Partially derived from
+ * fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *
+ * and
+ *
+ * Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ * This program is free software; you may redistribute it and/or modify its
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ *
+ * For any other enquiries send an Email to Alexandros Stamatakis
+ * Alexandros.Stamatakis at epfl.ch
+ *
+ * When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ * Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models".
+ * Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#ifndef WIN32
+#include <sys/times.h>
+#include <sys/types.h>
+#include <sys/time.h>
+#include <unistd.h>
+#endif
+
+#include <math.h>
+#include <time.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+
+#include "axml.h"
+
+
+extern char infoFileName[1024];
+extern char tree_file[1024];
+extern char *likelihood_key;
+extern char *ntaxa_key;
+extern char *smoothed_key;
+extern double masterTime;
+
+
+
+
+
+stringHashtable *initStringHashTable(hashNumberType n)
+{
+ /*
+ init with primes
+ */
+
+ static const hashNumberType initTable[] = {53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593, 49157, 98317,
+ 196613, 393241, 786433, 1572869, 3145739, 6291469, 12582917, 25165843,
+ 50331653, 100663319, 201326611, 402653189, 805306457, 1610612741};
+
+
+ /* init with powers of two
+
+ static const hashNumberType initTable[] = {64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384,
+ 32768, 65536, 131072, 262144, 524288, 1048576, 2097152,
+ 4194304, 8388608, 16777216, 33554432, 67108864, 134217728,
+ 268435456, 536870912, 1073741824, 2147483648U};
+ */
+
+ stringHashtable *h = (stringHashtable*)malloc(sizeof(stringHashtable));
+
+ hashNumberType
+ tableSize,
+ i,
+ primeTableLength = sizeof(initTable)/sizeof(initTable[0]),
+ maxSize = (hashNumberType)-1;
+
+ assert(n <= maxSize);
+
+ i = 0;
+
+ while(i < primeTableLength && initTable[i] < n)
+ i++;
+
+ assert(i < primeTableLength);
+
+ tableSize = initTable[i];
+
+ h->table = (stringEntry**)calloc(tableSize, sizeof(stringEntry*));
+ h->tableSize = tableSize;
+
+ return h;
+}
+
+
+static hashNumberType hashString(char *p, hashNumberType tableSize)
+{
+ hashNumberType h = 0;
+
+ for(; *p; p++)
+ h = 31 * h + *p;
+
+ return (h % tableSize);
+}
+
+
+
+void addword(char *s, stringHashtable *h, int nodeNumber)
+{
+ hashNumberType position = hashString(s, h->tableSize);
+ stringEntry *p = h->table[position];
+
+ for(; p!= NULL; p = p->next)
+ {
+ if(strcmp(s, p->word) == 0)
+ return;
+ }
+
+ p = (stringEntry *)malloc(sizeof(stringEntry));
+
+ assert(p);
+
+ p->nodeNumber = nodeNumber;
+
+ p->word = (char*)calloc(strlen(s) + 1, sizeof(char));
+ strcpy(p->word, s);
+
+ p->next = h->table[position];
+
+ h->table[position] = p;
+}
+
+int lookupWord(char *s, stringHashtable *h)
+{
+ hashNumberType position = hashString(s, h->tableSize);
+ stringEntry *p = h->table[position];
+
+ for(; p!= NULL; p = p->next)
+ {
+ if(strcmp(s, p->word) == 0)
+ return p->nodeNumber;
+ }
+
+ return -1;
+}
+
+
+int countTips(nodeptr p, int numsp)
+{
+ if(isTip(p->number, numsp))
+ return 1;
+ {
+ nodeptr q;
+ int tips = 0;
+
+ q = p->next;
+ while(q != p)
+ {
+ tips += countTips(q->back, numsp);
+ q = q->next;
+ }
+
+ return tips;
+ }
+}
+
+
+static double getBranchLength(tree *tr, int perGene, nodeptr p)
+{
+ double
+ z = 0.0,
+ x = 0.0;
+
+ assert(perGene != NO_BRANCHES);
+
+ if(tr->numBranches == 1)
+ {
+ z = p->z[0];
+ if (z < zmin)
+ z = zmin;
+
+ x = -log(z);
+ }
+ else
+ {
+ if(perGene == SUMMARIZE_LH)
+ {
+ int
+ i;
+
+ double
+ avgX = 0.0;
+
+ for(i = 0; i < tr->numBranches; i++)
+ {
+ z = p->z[i];
+ if(z < zmin)
+ z = zmin;
+ x = -log(z);
+ avgX += x * tr->partitionContributions[i];
+ }
+
+ x = avgX;
+ }
+ else
+ {
+ assert(perGene >= 0 && perGene < tr->numBranches);
+
+ z = p->z[perGene];
+
+ if(z < zmin)
+ z = zmin;
+
+ x = -log(z);
+ }
+ }
+
+ return x;
+}
+
+
+
+
+
+static char *Tree2StringREC(char *treestr, tree *tr, nodeptr p, boolean printBranchLengths, boolean printNames,
+ boolean printLikelihood, boolean rellTree, boolean finalPrint, int perGene, boolean branchLabelSupport, boolean printSHSupport)
+{
+ char *nameptr;
+
+ if(isTip(p->number, tr->mxtips))
+ {
+ if(printNames)
+ {
+ nameptr = tr->nameList[p->number];
+ sprintf(treestr, "%s", nameptr);
+ }
+ else
+ sprintf(treestr, "%d", p->number);
+
+ while (*treestr) treestr++;
+ }
+ else
+ {
+ *treestr++ = '(';
+ treestr = Tree2StringREC(treestr, tr, p->next->back, printBranchLengths, printNames, printLikelihood, rellTree,
+ finalPrint, perGene, branchLabelSupport, printSHSupport);
+ *treestr++ = ',';
+ treestr = Tree2StringREC(treestr, tr, p->next->next->back, printBranchLengths, printNames, printLikelihood, rellTree,
+ finalPrint, perGene, branchLabelSupport, printSHSupport);
+ if(p == tr->start->back)
+ {
+ *treestr++ = ',';
+ treestr = Tree2StringREC(treestr, tr, p->back, printBranchLengths, printNames, printLikelihood, rellTree,
+ finalPrint, perGene, branchLabelSupport, printSHSupport);
+ }
+ *treestr++ = ')';
+ }
+
+ if(p == tr->start->back)
+ {
+ if(printBranchLengths && !rellTree)
+ sprintf(treestr, ":0.0;\n");
+ else
+ sprintf(treestr, ";\n");
+ }
+ else
+ {
+ if(rellTree || branchLabelSupport || printSHSupport)
+ {
+ if(( !isTip(p->number, tr->mxtips)) &&
+ ( !isTip(p->back->number, tr->mxtips)))
+ {
+ assert(0);
+
+ /*assert(p->bInf != (branchInfo *)NULL);*/
+
+ /*if(rellTree)
+ sprintf(treestr, "%d:%8.20f", p->bInf->support, p->z[0]);
+ if(branchLabelSupport)
+ sprintf(treestr, ":%8.20f[%d]", p->z[0], p->bInf->support);
+ if(printSHSupport)
+ sprintf(treestr, ":%8.20f[%d]", getBranchLength(tr, perGene, p), p->bInf->support);
+ */
+
+ }
+ else
+ {
+ if(rellTree || branchLabelSupport)
+ sprintf(treestr, ":%8.20f", p->z[0]);
+ if(printSHSupport)
+ sprintf(treestr, ":%8.20f", getBranchLength(tr, perGene, p));
+ }
+ }
+ else
+ {
+ if(printBranchLengths)
+ sprintf(treestr, ":%8.20f", getBranchLength(tr, perGene, p));
+ else
+ sprintf(treestr, "%s", "\0");
+ }
+ }
+
+ while (*treestr) treestr++;
+ return treestr;
+}
+
+
+
+
+
+
+
+
+
+
+char *Tree2String(char *treestr, tree *tr, nodeptr p, boolean printBranchLengths, boolean printNames, boolean printLikelihood,
+ boolean rellTree, boolean finalPrint, int perGene, boolean branchLabelSupport, boolean printSHSupport)
+{
+
+ if(rellTree)
+ assert(!branchLabelSupport && !printSHSupport);
+
+ if(branchLabelSupport)
+ assert(!rellTree && !printSHSupport);
+
+ if(printSHSupport)
+ assert(!branchLabelSupport && !rellTree);
+
+
+ Tree2StringREC(treestr, tr, p, printBranchLengths, printNames, printLikelihood, rellTree,
+ finalPrint, perGene, branchLabelSupport, printSHSupport);
+
+
+ while (*treestr) treestr++;
+
+ return treestr;
+}
+
+
+void printTreePerGene(tree *tr, analdef *adef, char *fileName, char *permission)
+{
+ FILE *treeFile;
+ char extendedTreeFileName[1024];
+ char buf[16];
+ int i;
+
+ assert(adef->perGeneBranchLengths);
+
+ for(i = 0; i < tr->numBranches; i++)
+ {
+ strcpy(extendedTreeFileName, fileName);
+ sprintf(buf,"%d", i);
+ strcat(extendedTreeFileName, ".PARTITION.");
+ strcat(extendedTreeFileName, buf);
+ /*printf("Partition %d file %s\n", i, extendedTreeFileName);*/
+ Tree2String(tr->tree_string, tr, tr->start->back, TRUE, TRUE, FALSE, FALSE, TRUE, i, FALSE, FALSE);
+ treeFile = myfopen(extendedTreeFileName, permission);
+ fprintf(treeFile, "%s", tr->tree_string);
+ fclose(treeFile);
+ }
+
+}
+
+
+
+/*=======================================================================*/
+/* Read a tree from a file */
+/*=======================================================================*/
+
+
+/* 1.0.A Processing of quotation marks in comment removed
+ */
+
+static int treeFinishCom (FILE *fp, char **strp)
+{
+ int ch;
+
+ while ((ch = getc(fp)) != EOF && ch != ']') {
+ if (strp != NULL) *(*strp)++ = ch; /* save character */
+ if (ch == '[') { /* nested comment; find its end */
+ if ((ch = treeFinishCom(fp, strp)) == EOF) break;
+ if (strp != NULL) *(*strp)++ = ch; /* save closing ] */
+ }
+ }
+
+ if (strp != NULL) **strp = '\0'; /* terminate string */
+ return ch;
+} /* treeFinishCom */
+
+
+static int treeGetCh (FILE *fp) /* get next nonblank, noncomment character */
+{ /* treeGetCh */
+ int ch;
+
+ while ((ch = getc(fp)) != EOF) {
+ if (whitechar(ch)) ;
+ else if (ch == '[') { /* comment; find its end */
+ if ((ch = treeFinishCom(fp, (char **) NULL)) == EOF) break;
+ }
+ else break;
+ }
+
+ return ch;
+} /* treeGetCh */
+
+
+static boolean treeLabelEnd (int ch)
+{
+ switch (ch)
+ {
+ case EOF:
+ case '\0':
+ case '\t':
+ case '\n':
+ case '\r':
+ case ' ':
+ case ':':
+ case ',':
+ case '(':
+ case ')':
+ case ';':
+ return TRUE;
+ default:
+ break;
+ }
+ return FALSE;
+}
+
+
+static boolean treeGetLabel (FILE *fp, char *lblPtr, int maxlen)
+{
+ int ch;
+ boolean done, quoted, lblfound;
+
+ if (--maxlen < 0)
+ lblPtr = (char *) NULL;
+ else
+ if (lblPtr == NULL)
+ maxlen = 0;
+
+ ch = getc(fp);
+ done = treeLabelEnd(ch);
+
+ lblfound = ! done;
+ quoted = (ch == '\'');
+ if (quoted && ! done)
+ {
+ ch = getc(fp);
+ done = (ch == EOF);
+ }
+
+ while (! done)
+ {
+ if (quoted)
+ {
+ if (ch == '\'')
+ {
+ ch = getc(fp);
+ if (ch != '\'')
+ break;
+ }
+ }
+ else
+ if (treeLabelEnd(ch)) break;
+
+ if (--maxlen >= 0) *lblPtr++ = ch;
+ ch = getc(fp);
+ if (ch == EOF) break;
+ }
+
+ if (ch != EOF) (void) ungetc(ch, fp);
+
+ if (lblPtr != NULL) *lblPtr = '\0';
+
+ return lblfound;
+}
+
+
+static boolean treeFlushLabel (FILE *fp)
+{
+ return treeGetLabel(fp, (char *) NULL, (int) 0);
+}
+
+
+
+
+static int treeFindTipByLabelString(char *str, tree *tr, boolean check)
+{
+ int lookup = lookupWord(str, tr->nameHash);
+
+ if(lookup > 0)
+ {
+ if(check)
+ assert(! tr->nodep[lookup]->back);
+ return lookup;
+ }
+ else
+ {
+ printf("ERROR: Cannot find tree species: %s\n", str);
+ return 0;
+ }
+}
+
+
+int treeFindTipName(FILE *fp, tree *tr, boolean check)
+{
+ char str[nmlngth+2];
+ int n;
+
+ if(treeGetLabel(fp, str, nmlngth+2))
+ n = treeFindTipByLabelString(str, tr, check);
+ else
+ n = 0;
+
+
+ return n;
+}
+
+
+
+static void treeEchoContext (FILE *fp1, FILE *fp2, int n)
+{ /* treeEchoContext */
+ int ch;
+ boolean waswhite;
+
+ waswhite = TRUE;
+
+ while (n > 0 && ((ch = getc(fp1)) != EOF)) {
+ if (whitechar(ch)) {
+ ch = waswhite ? '\0' : ' ';
+ waswhite = TRUE;
+ }
+ else {
+ waswhite = FALSE;
+ }
+
+ if (ch > '\0') {putc(ch, fp2); n--;}
+ }
+} /* treeEchoContext */
+
+
+static boolean treeProcessLength (FILE *fp, double *dptr)
+{
+ int ch;
+
+ if ((ch = treeGetCh(fp)) == EOF) return FALSE; /* Skip comments */
+ (void) ungetc(ch, fp);
+
+ if (fscanf(fp, "%lf", dptr) != 1) {
+ printf("ERROR: treeProcessLength: Problem reading branch length\n");
+ treeEchoContext(fp, stdout, 40);
+ printf("\n");
+ return FALSE;
+ }
+
+ return TRUE;
+}
+
+
+static int treeFlushLen (FILE *fp)
+{
+ double dummy;
+ int ch;
+
+ ch = treeGetCh(fp);
+
+ if (ch == ':')
+ {
+ ch = treeGetCh(fp);
+
+ ungetc(ch, fp);
+ if(!treeProcessLength(fp, & dummy)) return 0;
+ return 1;
+ }
+
+
+
+ if (ch != EOF) (void) ungetc(ch, fp);
+ return 1;
+}
+
+
+
+
+
+static boolean treeNeedCh (FILE *fp, int c1, char *where)
+{
+ int c2;
+
+ if ((c2 = treeGetCh(fp)) == c1) return TRUE;
+
+ printf("ERROR: Expecting '%c' %s tree; found:", c1, where);
+ if (c2 == EOF)
+ {
+ printf("End-of-File");
+ }
+ else
+ {
+ ungetc(c2, fp);
+ treeEchoContext(fp, stdout, 40);
+ }
+ putchar('\n');
+
+ if(c1 == ':')
+ printf("RAxML may be expecting to read a tree that contains branch lengths\n");
+
+ return FALSE;
+}
+
+
+
+static boolean addElementLen (FILE *fp, tree *tr, nodeptr p, boolean readBranchLengths, boolean readNodeLabels, int *lcount)
+{
+ nodeptr q;
+ int n, ch, fres;
+
+ if ((ch = treeGetCh(fp)) == '(')
+ {
+ n = (tr->nextnode)++;
+ if (n > 2*(tr->mxtips) - 2)
+ {
+ if (tr->rooted || n > 2*(tr->mxtips) - 1)
+ {
+ printf("ERROR: Too many internal nodes. Is tree rooted?\n");
+ printf(" Deepest splitting should be a trifurcation.\n");
+ return FALSE;
+ }
+ else
+ {
+ assert(!readNodeLabels);
+ tr->rooted = TRUE;
+ }
+ }
+
+ q = tr->nodep[n];
+
+ if (! addElementLen(fp, tr, q->next, readBranchLengths, readNodeLabels, lcount)) return FALSE;
+ if (! treeNeedCh(fp, ',', "in")) return FALSE;
+ if (! addElementLen(fp, tr, q->next->next, readBranchLengths, readNodeLabels, lcount)) return FALSE;
+ if (! treeNeedCh(fp, ')', "in")) return FALSE;
+
+ if(readNodeLabels)
+ {
+ char label[64];
+ int support;
+
+ if(treeGetLabel (fp, label, 10))
+ {
+ int val = sscanf(label, "%d", &support);
+
+ assert(val == 1);
+
+ /*printf("LABEL %s Number %d\n", label, support);*/
+ /*p->support = q->support = support;*/
+ /*printf("%d %d %d %d\n", p->support, q->support, p->number, q->number);*/
+ assert(p->number > tr->mxtips && q->number > tr->mxtips);
+ *lcount = *lcount + 1;
+ }
+ }
+ else
+ (void) treeFlushLabel(fp);
+ }
+ else
+ {
+ ungetc(ch, fp);
+ if ((n = treeFindTipName(fp, tr, TRUE)) <= 0) return FALSE;
+ q = tr->nodep[n];
+ if (tr->start->number > n) tr->start = q;
+ (tr->ntips)++;
+ }
+
+ if(readBranchLengths)
+ {
+ double branch;
+ if (! treeNeedCh(fp, ':', "in")) return FALSE;
+ if (! treeProcessLength(fp, &branch)) return FALSE;
+
+ /*printf("Branch %8.20f %d\n", branch, tr->numBranches);*/
+ hookup(p, q, &branch, tr->numBranches);
+ }
+ else
+ {
+ fres = treeFlushLen(fp);
+ if(!fres) return FALSE;
+
+ hookupDefault(p, q, tr->numBranches);
+ }
+ return TRUE;
+}
+
+
+
+
+
+
+
+
+
+
+
+
+static nodeptr uprootTree (tree *tr, nodeptr p, boolean readBranchLengths)
+{
+ nodeptr q, r, s, start;
+ int n, i;
+
+ for(i = tr->mxtips + 1; i < 2 * tr->mxtips - 1; i++)
+ assert(i == tr->nodep[i]->number);
+
+ if(isTip(p->number, tr->mxtips) || p->back)
+ {
+ printf("ERROR: Unable to uproot tree.\n");
+ printf(" Inappropriate node marked for removal.\n");
+ assert(0);
+ }
+
+ assert(p->back == (nodeptr)NULL);
+
+ tr->nextnode = tr->nextnode - 1;
+
+ assert(tr->nextnode < 2 * tr->mxtips);
+
+ n = tr->nextnode;
+
+ assert(tr->nodep[tr->nextnode]);
+
+ if (n != tr->mxtips + tr->ntips - 1)
+ {
+ printf("ERROR: Unable to uproot tree. Inconsistent\n");
+ printf(" number of tips and nodes for rooted tree.\n");
+ assert(0);
+ }
+
+ q = p->next->back; /* remove p from tree */
+ r = p->next->next->back;
+ assert(p->back == (nodeptr)NULL);
+
+ if(readBranchLengths)
+ {
+ double b[NUM_BRANCHES];
+ int i;
+ for(i = 0; i < tr->numBranches; i++)
+ b[i] = (r->z[i] + q->z[i]);
+ hookup (q, r, b, tr->numBranches);
+ }
+ else
+ hookupDefault(q, r, tr->numBranches);
+
+ if(tr->constraintTree)
+ {
+ if(tr->constraintVector[p->number] != 0)
+ {
+ printf("Root node to remove should have top-level grouping of 0\n");
+ assert(0);
+ }
+ }
+
+ assert(!(isTip(r->number, tr->mxtips) && isTip(q->number, tr->mxtips)));
+
+ assert(p->number > tr->mxtips);
+
+ if(tr->ntips > 2 && p->number != n)
+ {
+ q = tr->nodep[n]; /* transfer last node's connections to p */
+ r = q->next;
+ s = q->next->next;
+
+ if(tr->constraintTree)
+ tr->constraintVector[p->number] = tr->constraintVector[q->number];
+
+ hookup(p, q->back, q->z, tr->numBranches); /* move connections to p */
+ hookup(p->next, r->back, r->z, tr->numBranches);
+ hookup(p->next->next, s->back, s->z, tr->numBranches);
+
+ q->back = q->next->back = q->next->next->back = (nodeptr) NULL;
+ }
+ else
+ p->back = p->next->back = p->next->next->back = (nodeptr) NULL;
+
+ assert(tr->ntips > 2);
+
+ start = findAnyTip(tr->nodep[tr->mxtips + 1], tr->mxtips);
+
+ assert(isTip(start->number, tr->mxtips));
+ tr->rooted = FALSE;
+ return start;
+}
+
+
+int treeReadLen (FILE *fp, tree *tr, boolean readBranches, boolean readNodeLabels, boolean topologyOnly)
+{
+ nodeptr
+ p;
+
+ int
+ i,
+ ch,
+ lcount = 0;
+
+ for (i = 1; i <= tr->mxtips; i++)
+ {
+ tr->nodep[i]->back = (node *) NULL;
+ /*if(topologyOnly)
+ tr->nodep[i]->support = -1;*/
+ }
+
+ for(i = tr->mxtips + 1; i < 2 * tr->mxtips; i++)
+ {
+ tr->nodep[i]->back = (nodeptr)NULL;
+ tr->nodep[i]->next->back = (nodeptr)NULL;
+ tr->nodep[i]->next->next->back = (nodeptr)NULL;
+ tr->nodep[i]->number = i;
+ tr->nodep[i]->next->number = i;
+ tr->nodep[i]->next->next->number = i;
+
+ /*if(topologyOnly)
+ {
+ tr->nodep[i]->support = -2;
+ tr->nodep[i]->next->support = -2;
+ tr->nodep[i]->next->next->support = -2;
+ }*/
+ }
+
+ if(topologyOnly)
+ tr->start = tr->nodep[tr->mxtips];
+ else
+ tr->start = tr->nodep[1];
+
+ tr->ntips = 0;
+ tr->nextnode = tr->mxtips + 1;
+
+ for(i = 0; i < tr->numBranches; i++)
+ tr->partitionSmoothed[i] = FALSE;
+
+ tr->rooted = FALSE;
+
+ p = tr->nodep[(tr->nextnode)++];
+
+ while((ch = treeGetCh(fp)) != '(');
+
+ if(!topologyOnly)
+ assert(readBranches == FALSE && readNodeLabels == FALSE);
+
+
+ if (! addElementLen(fp, tr, p, readBranches, readNodeLabels, &lcount))
+ assert(0);
+ if (! treeNeedCh(fp, ',', "in"))
+ assert(0);
+ if (! addElementLen(fp, tr, p->next, readBranches, readNodeLabels, &lcount))
+ assert(0);
+ if (! tr->rooted)
+ {
+ if ((ch = treeGetCh(fp)) == ',')
+ {
+ if (! addElementLen(fp, tr, p->next->next, readBranches, readNodeLabels, &lcount))
+ assert(0);
+ }
+ else
+ { /* A rooted format */
+ tr->rooted = TRUE;
+ if (ch != EOF) (void) ungetc(ch, fp);
+ }
+ }
+ else
+ {
+ p->next->next->back = (nodeptr) NULL;
+ }
+ if (! treeNeedCh(fp, ')', "in"))
+ assert(0);
+
+ if(topologyOnly)
+ assert(!(tr->rooted && readNodeLabels));
+
+ (void) treeFlushLabel(fp);
+
+ if (! treeFlushLen(fp))
+ assert(0);
+
+ if (! treeNeedCh(fp, ';', "at end of"))
+ assert(0);
+
+ if (tr->rooted)
+ {
+ assert(!readNodeLabels);
+
+ p->next->next->back = (nodeptr) NULL;
+ tr->start = uprootTree(tr, p->next->next, FALSE);
+ if (! tr->start)
+ {
+ printf("FATAL ERROR UPROOTING TREE\n");
+ assert(0);
+ }
+ }
+ else
+ tr->start = findAnyTip(p, tr->mxtips);
+
+
+
+ assert(tr->ntips == tr->mxtips);
+
+
+
+
+ return lcount;
+}
+
+
+static int randomInt(int n)
+{
+ return rand() %n;
+}
+
+static boolean addElementLenMULT (FILE *fp, tree *tr, nodeptr p, int partitionCounter, int *partCount)
+{
+ nodeptr q, r, s;
+ int n, ch, fres, rn;
+ double randomResolution;
+ int old;
+
+ tr->constraintVector[p->number] = partitionCounter;
+
+ if ((ch = treeGetCh(fp)) == '(')
+ {
+ *partCount = *partCount + 1;
+ old = *partCount;
+
+ n = (tr->nextnode)++;
+ if (n > 2*(tr->mxtips) - 2)
+ {
+ if (tr->rooted || n > 2*(tr->mxtips) - 1)
+ {
+ printf("ERROR: Too many internal nodes. Is tree rooted?\n");
+ printf(" Deepest splitting should be a trifurcation.\n");
+ return FALSE;
+ }
+ else
+ {
+ tr->rooted = TRUE;
+ }
+ }
+ q = tr->nodep[n];
+ tr->constraintVector[q->number] = *partCount;
+ if (! addElementLenMULT(fp, tr, q->next, old, partCount)) return FALSE;
+ if (! treeNeedCh(fp, ',', "in")) return FALSE;
+ if (! addElementLenMULT(fp, tr, q->next->next, old, partCount)) return FALSE;
+
+ hookupDefault(p, q, tr->numBranches);
+
+ while((ch = treeGetCh(fp)) == ',')
+ {
+ n = (tr->nextnode)++;
+ if (n > 2*(tr->mxtips) - 2)
+ {
+ if (tr->rooted || n > 2*(tr->mxtips) - 1)
+ {
+ printf("ERROR: Too many internal nodes. Is tree rooted?\n");
+ printf(" Deepest splitting should be a trifurcation.\n");
+ return FALSE;
+ }
+ else
+ {
+ tr->rooted = TRUE;
+ }
+ }
+ r = tr->nodep[n];
+ tr->constraintVector[r->number] = *partCount;
+
+ rn = randomInt(10000);
+ if(rn == 0)
+ randomResolution = 0;
+ else
+ randomResolution = ((double)rn)/10000.0;
+
+ if(randomResolution < 0.5)
+ {
+ s = q->next->back;
+ r->back = q->next;
+ q->next->back = r;
+ r->next->back = s;
+ s->back = r->next;
+ if (! addElementLenMULT(fp, tr, r->next->next, old, partCount)) return FALSE;
+ }
+ else
+ {
+ s = q->next->next->back;
+ r->back = q->next->next;
+ q->next->next->back = r;
+ r->next->back = s;
+ s->back = r->next;
+ if (! addElementLenMULT(fp, tr, r->next->next, old, partCount)) return FALSE;
+ }
+ }
+
+ if(ch != ')')
+ {
+ printf("Missing ')' in addElementLenMULT\n");
+ exit(-1);
+ }
+
+
+
+ (void) treeFlushLabel(fp);
+ }
+ else
+ {
+ ungetc(ch, fp);
+ if ((n = treeFindTipName(fp, tr, TRUE)) <= 0) return FALSE;
+ q = tr->nodep[n];
+ tr->constraintVector[q->number] = partitionCounter;
+
+ if (tr->start->number > n) tr->start = q;
+ (tr->ntips)++;
+ hookupDefault(p, q, tr->numBranches);
+ }
+
+ fres = treeFlushLen(fp);
+ if(!fres) return FALSE;
+
+ return TRUE;
+}
+
+
+
+
+boolean treeReadLenMULT (FILE *fp, tree *tr, int *partCount)
+{
+ nodeptr p, r, s;
+ int i, ch, n, rn;
+ int partitionCounter = 0;
+ double randomResolution;
+
+ srand(tr->randomSeed);
+
+ for(i = 0; i < 2 * tr->mxtips; i++)
+ tr->constraintVector[i] = -1;
+
+ for (i = 1; i <= tr->mxtips; i++)
+ tr->nodep[i]->back = (node *) NULL;
+
+ for(i = tr->mxtips + 1; i < 2 * tr->mxtips; i++)
+ {
+ tr->nodep[i]->back = (nodeptr)NULL;
+ tr->nodep[i]->next->back = (nodeptr)NULL;
+ tr->nodep[i]->next->next->back = (nodeptr)NULL;
+ tr->nodep[i]->number = i;
+ tr->nodep[i]->next->number = i;
+ tr->nodep[i]->next->next->number = i;
+ }
+
+
+ tr->start = tr->nodep[tr->mxtips];
+ tr->ntips = 0;
+ tr->nextnode = tr->mxtips + 1;
+
+ for(i = 0; i < tr->numBranches; i++)
+ tr->partitionSmoothed[i] = FALSE;
+
+ tr->rooted = FALSE;
+
+ p = tr->nodep[(tr->nextnode)++];
+ while((ch = treeGetCh(fp)) != '(');
+
+ if (! addElementLenMULT(fp, tr, p, partitionCounter, partCount)) return FALSE;
+ if (! treeNeedCh(fp, ',', "in")) return FALSE;
+ if (! addElementLenMULT(fp, tr, p->next, partitionCounter, partCount)) return FALSE;
+ if (! tr->rooted)
+ {
+ if ((ch = treeGetCh(fp)) == ',')
+ {
+ if (! addElementLenMULT(fp, tr, p->next->next, partitionCounter, partCount)) return FALSE;
+
+ while((ch = treeGetCh(fp)) == ',')
+ {
+ n = (tr->nextnode)++;
+ assert(n <= 2*(tr->mxtips) - 2);
+
+ r = tr->nodep[n];
+ tr->constraintVector[r->number] = partitionCounter;
+
+ rn = randomInt(10000);
+ if(rn == 0)
+ randomResolution = 0;
+ else
+ randomResolution = ((double)rn)/10000.0;
+
+
+ if(randomResolution < 0.5)
+ {
+ s = p->next->next->back;
+ r->back = p->next->next;
+ p->next->next->back = r;
+ r->next->back = s;
+ s->back = r->next;
+ if (! addElementLenMULT(fp, tr, r->next->next, partitionCounter, partCount)) return FALSE;
+ }
+ else
+ {
+ s = p->next->back;
+ r->back = p->next;
+ p->next->back = r;
+ r->next->back = s;
+ s->back = r->next;
+ if (! addElementLenMULT(fp, tr, r->next->next, partitionCounter, partCount)) return FALSE;
+ }
+ }
+
+ if(ch != ')')
+ {
+ printf("Missing ')' in treeReadLenMULT\n");
+ exit(-1);
+ }
+ else
+ ungetc(ch, fp);
+ }
+ else
+ {
+ tr->rooted = TRUE;
+ if (ch != EOF) (void) ungetc(ch, fp);
+ }
+ }
+ else
+ {
+ p->next->next->back = (nodeptr) NULL;
+ }
+
+ if (! treeNeedCh(fp, ')', "in")) return FALSE;
+ (void) treeFlushLabel(fp);
+ if (! treeFlushLen(fp)) return FALSE;
+
+ if (! treeNeedCh(fp, ';', "at end of")) return FALSE;
+
+
+ if (tr->rooted)
+ {
+ p->next->next->back = (nodeptr) NULL;
+ tr->start = uprootTree(tr, p->next->next, FALSE);
+ if (! tr->start) return FALSE;
+ }
+ else
+ {
+ tr->start = findAnyTip(p, tr->mxtips);
+ }
+
+
+
+
+
+ assert(tr->ntips == tr->mxtips);
+
+ return TRUE;
+}
+
+
+void getStartingTree(tree *tr)
+{
+ FILE *treeFile = myfopen(tree_file, "rb");
+
+ tr->likelihood = unlikely;
+
+ if(tr->constraintTree)
+ {
+ int
+ partCount = 0;
+ if (! treeReadLenMULT(treeFile, tr, &partCount))
+ exit(-1);
+ }
+ else
+ treeReadLen(treeFile, tr, FALSE, FALSE, FALSE);
+
+ fclose(treeFile);
+
+ tr->start = tr->nodep[1];
+}
+
+
+
diff --git a/gpl-3.0.txt b/gpl-3.0.txt
new file mode 100644
index 0000000..94a9ed0
--- /dev/null
+++ b/gpl-3.0.txt
@@ -0,0 +1,674 @@
+ GNU GENERAL PUBLIC LICENSE
+ Version 3, 29 June 2007
+
+ Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The GNU General Public License is a free, copyleft license for
+software and other kinds of works.
+
+ The licenses for most software and other practical works are designed
+to take away your freedom to share and change the works. By contrast,
+the GNU General Public License is intended to guarantee your freedom to
+share and change all versions of a program--to make sure it remains free
+software for all its users. We, the Free Software Foundation, use the
+GNU General Public License for most of our software; it applies also to
+any other work released this way by its authors. You can apply it to
+your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+them if you wish), that you receive source code or can get it if you
+want it, that you can change the software or use pieces of it in new
+free programs, and that you know you can do these things.
+
+ To protect your rights, we need to prevent others from denying you
+these rights or asking you to surrender the rights. Therefore, you have
+certain responsibilities if you distribute copies of the software, or if
+you modify it: responsibilities to respect the freedom of others.
+
+ For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must pass on to the recipients the same
+freedoms that you received. You must make sure that they, too, receive
+or can get the source code. And you must show them these terms so they
+know their rights.
+
+ Developers that use the GNU GPL protect your rights with two steps:
+(1) assert copyright on the software, and (2) offer you this License
+giving you legal permission to copy, distribute and/or modify it.
+
+ For the developers' and authors' protection, the GPL clearly explains
+that there is no warranty for this free software. For both users' and
+authors' sake, the GPL requires that modified versions be marked as
+changed, so that their problems will not be attributed erroneously to
+authors of previous versions.
+
+ Some devices are designed to deny users access to install or run
+modified versions of the software inside them, although the manufacturer
+can do so. This is fundamentally incompatible with the aim of
+protecting users' freedom to change the software. The systematic
+pattern of such abuse occurs in the area of products for individuals to
+use, which is precisely where it is most unacceptable. Therefore, we
+have designed this version of the GPL to prohibit the practice for those
+products. If such problems arise substantially in other domains, we
+stand ready to extend this provision to those domains in future versions
+of the GPL, as needed to protect the freedom of users.
+
+ Finally, every program is threatened constantly by software patents.
+States should not allow patents to restrict development and use of
+software on general-purpose computers, but in those that do, we wish to
+avoid the special danger that patents applied to a free program could
+make it effectively proprietary. To prevent this, the GPL assures that
+patents cannot be used to render the program non-free.
+
+ The precise terms and conditions for copying, distribution and
+modification follow.
+
+ TERMS AND CONDITIONS
+
+ 0. Definitions.
+
+ "This License" refers to version 3 of the GNU General Public License.
+
+ "Copyright" also means copyright-like laws that apply to other kinds of
+works, such as semiconductor masks.
+
+ "The Program" refers to any copyrightable work licensed under this
+License. Each licensee is addressed as "you". "Licensees" and
+"recipients" may be individuals or organizations.
+
+ To "modify" a work means to copy from or adapt all or part of the work
+in a fashion requiring copyright permission, other than the making of an
+exact copy. The resulting work is called a "modified version" of the
+earlier work or a work "based on" the earlier work.
+
+ A "covered work" means either the unmodified Program or a work based
+on the Program.
+
+ To "propagate" a work means to do anything with it that, without
+permission, would make you directly or secondarily liable for
+infringement under applicable copyright law, except executing it on a
+computer or modifying a private copy. Propagation includes copying,
+distribution (with or without modification), making available to the
+public, and in some countries other activities as well.
+
+ To "convey" a work means any kind of propagation that enables other
+parties to make or receive copies. Mere interaction with a user through
+a computer network, with no transfer of a copy, is not conveying.
+
+ An interactive user interface displays "Appropriate Legal Notices"
+to the extent that it includes a convenient and prominently visible
+feature that (1) displays an appropriate copyright notice, and (2)
+tells the user that there is no warranty for the work (except to the
+extent that warranties are provided), that licensees may convey the
+work under this License, and how to view a copy of this License. If
+the interface presents a list of user commands or options, such as a
+menu, a prominent item in the list meets this criterion.
+
+ 1. Source Code.
+
+ The "source code" for a work means the preferred form of the work
+for making modifications to it. "Object code" means any non-source
+form of a work.
+
+ A "Standard Interface" means an interface that either is an official
+standard defined by a recognized standards body, or, in the case of
+interfaces specified for a particular programming language, one that
+is widely used among developers working in that language.
+
+ The "System Libraries" of an executable work include anything, other
+than the work as a whole, that (a) is included in the normal form of
+packaging a Major Component, but which is not part of that Major
+Component, and (b) serves only to enable use of the work with that
+Major Component, or to implement a Standard Interface for which an
+implementation is available to the public in source code form. A
+"Major Component", in this context, means a major essential component
+(kernel, window system, and so on) of the specific operating system
+(if any) on which the executable work runs, or a compiler used to
+produce the work, or an object code interpreter used to run it.
+
+ The "Corresponding Source" for a work in object code form means all
+the source code needed to generate, install, and (for an executable
+work) run the object code and to modify the work, including scripts to
+control those activities. However, it does not include the work's
+System Libraries, or general-purpose tools or generally available free
+programs which are used unmodified in performing those activities but
+which are not part of the work. For example, Corresponding Source
+includes interface definition files associated with source files for
+the work, and the source code for shared libraries and dynamically
+linked subprograms that the work is specifically designed to require,
+such as by intimate data communication or control flow between those
+subprograms and other parts of the work.
+
+ The Corresponding Source need not include anything that users
+can regenerate automatically from other parts of the Corresponding
+Source.
+
+ The Corresponding Source for a work in source code form is that
+same work.
+
+ 2. Basic Permissions.
+
+ All rights granted under this License are granted for the term of
+copyright on the Program, and are irrevocable provided the stated
+conditions are met. This License explicitly affirms your unlimited
+permission to run the unmodified Program. The output from running a
+covered work is covered by this License only if the output, given its
+content, constitutes a covered work. This License acknowledges your
+rights of fair use or other equivalent, as provided by copyright law.
+
+ You may make, run and propagate covered works that you do not
+convey, without conditions so long as your license otherwise remains
+in force. You may convey covered works to others for the sole purpose
+of having them make modifications exclusively for you, or provide you
+with facilities for running those works, provided that you comply with
+the terms of this License in conveying all material for which you do
+not control copyright. Those thus making or running the covered works
+for you must do so exclusively on your behalf, under your direction
+and control, on terms that prohibit them from making any copies of
+your copyrighted material outside their relationship with you.
+
+ Conveying under any other circumstances is permitted solely under
+the conditions stated below. Sublicensing is not allowed; section 10
+makes it unnecessary.
+
+ 3. Protecting Users' Legal Rights From Anti-Circumvention Law.
+
+ No covered work shall be deemed part of an effective technological
+measure under any applicable law fulfilling obligations under article
+11 of the WIPO copyright treaty adopted on 20 December 1996, or
+similar laws prohibiting or restricting circumvention of such
+measures.
+
+ When you convey a covered work, you waive any legal power to forbid
+circumvention of technological measures to the extent such circumvention
+is effected by exercising rights under this License with respect to
+the covered work, and you disclaim any intention to limit operation or
+modification of the work as a means of enforcing, against the work's
+users, your or third parties' legal rights to forbid circumvention of
+technological measures.
+
+ 4. Conveying Verbatim Copies.
+
+ You may convey verbatim copies of the Program's source code as you
+receive it, in any medium, provided that you conspicuously and
+appropriately publish on each copy an appropriate copyright notice;
+keep intact all notices stating that this License and any
+non-permissive terms added in accord with section 7 apply to the code;
+keep intact all notices of the absence of any warranty; and give all
+recipients a copy of this License along with the Program.
+
+ You may charge any price or no price for each copy that you convey,
+and you may offer support or warranty protection for a fee.
+
+ 5. Conveying Modified Source Versions.
+
+ You may convey a work based on the Program, or the modifications to
+produce it from the Program, in the form of source code under the
+terms of section 4, provided that you also meet all of these conditions:
+
+ a) The work must carry prominent notices stating that you modified
+ it, and giving a relevant date.
+
+ b) The work must carry prominent notices stating that it is
+ released under this License and any conditions added under section
+ 7. This requirement modifies the requirement in section 4 to
+ "keep intact all notices".
+
+ c) You must license the entire work, as a whole, under this
+ License to anyone who comes into possession of a copy. This
+ License will therefore apply, along with any applicable section 7
+ additional terms, to the whole of the work, and all its parts,
+ regardless of how they are packaged. This License gives no
+ permission to license the work in any other way, but it does not
+ invalidate such permission if you have separately received it.
+
+ d) If the work has interactive user interfaces, each must display
+ Appropriate Legal Notices; however, if the Program has interactive
+ interfaces that do not display Appropriate Legal Notices, your
+ work need not make them do so.
+
+ A compilation of a covered work with other separate and independent
+works, which are not by their nature extensions of the covered work,
+and which are not combined with it such as to form a larger program,
+in or on a volume of a storage or distribution medium, is called an
+"aggregate" if the compilation and its resulting copyright are not
+used to limit the access or legal rights of the compilation's users
+beyond what the individual works permit. Inclusion of a covered work
+in an aggregate does not cause this License to apply to the other
+parts of the aggregate.
+
+ 6. Conveying Non-Source Forms.
+
+ You may convey a covered work in object code form under the terms
+of sections 4 and 5, provided that you also convey the
+machine-readable Corresponding Source under the terms of this License,
+in one of these ways:
+
+ a) Convey the object code in, or embodied in, a physical product
+ (including a physical distribution medium), accompanied by the
+ Corresponding Source fixed on a durable physical medium
+ customarily used for software interchange.
+
+ b) Convey the object code in, or embodied in, a physical product
+ (including a physical distribution medium), accompanied by a
+ written offer, valid for at least three years and valid for as
+ long as you offer spare parts or customer support for that product
+ model, to give anyone who possesses the object code either (1) a
+ copy of the Corresponding Source for all the software in the
+ product that is covered by this License, on a durable physical
+ medium customarily used for software interchange, for a price no
+ more than your reasonable cost of physically performing this
+ conveying of source, or (2) access to copy the
+ Corresponding Source from a network server at no charge.
+
+ c) Convey individual copies of the object code with a copy of the
+ written offer to provide the Corresponding Source. This
+ alternative is allowed only occasionally and noncommercially, and
+ only if you received the object code with such an offer, in accord
+ with subsection 6b.
+
+ d) Convey the object code by offering access from a designated
+ place (gratis or for a charge), and offer equivalent access to the
+ Corresponding Source in the same way through the same place at no
+ further charge. You need not require recipients to copy the
+ Corresponding Source along with the object code. If the place to
+ copy the object code is a network server, the Corresponding Source
+ may be on a different server (operated by you or a third party)
+ that supports equivalent copying facilities, provided you maintain
+ clear directions next to the object code saying where to find the
+ Corresponding Source. Regardless of what server hosts the
+ Corresponding Source, you remain obligated to ensure that it is
+ available for as long as needed to satisfy these requirements.
+
+ e) Convey the object code using peer-to-peer transmission, provided
+ you inform other peers where the object code and Corresponding
+ Source of the work are being offered to the general public at no
+ charge under subsection 6d.
+
+ A separable portion of the object code, whose source code is excluded
+from the Corresponding Source as a System Library, need not be
+included in conveying the object code work.
+
+ A "User Product" is either (1) a "consumer product", which means any
+tangible personal property which is normally used for personal, family,
+or household purposes, or (2) anything designed or sold for incorporation
+into a dwelling. In determining whether a product is a consumer product,
+doubtful cases shall be resolved in favor of coverage. For a particular
+product received by a particular user, "normally used" refers to a
+typical or common use of that class of product, regardless of the status
+of the particular user or of the way in which the particular user
+actually uses, or expects or is expected to use, the product. A product
+is a consumer product regardless of whether the product has substantial
+commercial, industrial or non-consumer uses, unless such uses represent
+the only significant mode of use of the product.
+
+ "Installation Information" for a User Product means any methods,
+procedures, authorization keys, or other information required to install
+and execute modified versions of a covered work in that User Product from
+a modified version of its Corresponding Source. The information must
+suffice to ensure that the continued functioning of the modified object
+code is in no case prevented or interfered with solely because
+modification has been made.
+
+ If you convey an object code work under this section in, or with, or
+specifically for use in, a User Product, and the conveying occurs as
+part of a transaction in which the right of possession and use of the
+User Product is transferred to the recipient in perpetuity or for a
+fixed term (regardless of how the transaction is characterized), the
+Corresponding Source conveyed under this section must be accompanied
+by the Installation Information. But this requirement does not apply
+if neither you nor any third party retains the ability to install
+modified object code on the User Product (for example, the work has
+been installed in ROM).
+
+ The requirement to provide Installation Information does not include a
+requirement to continue to provide support service, warranty, or updates
+for a work that has been modified or installed by the recipient, or for
+the User Product in which it has been modified or installed. Access to a
+network may be denied when the modification itself materially and
+adversely affects the operation of the network or violates the rules and
+protocols for communication across the network.
+
+ Corresponding Source conveyed, and Installation Information provided,
+in accord with this section must be in a format that is publicly
+documented (and with an implementation available to the public in
+source code form), and must require no special password or key for
+unpacking, reading or copying.
+
+ 7. Additional Terms.
+
+ "Additional permissions" are terms that supplement the terms of this
+License by making exceptions from one or more of its conditions.
+Additional permissions that are applicable to the entire Program shall
+be treated as though they were included in this License, to the extent
+that they are valid under applicable law. If additional permissions
+apply only to part of the Program, that part may be used separately
+under those permissions, but the entire Program remains governed by
+this License without regard to the additional permissions.
+
+ When you convey a copy of a covered work, you may at your option
+remove any additional permissions from that copy, or from any part of
+it. (Additional permissions may be written to require their own
+removal in certain cases when you modify the work.) You may place
+additional permissions on material, added by you to a covered work,
+for which you have or can give appropriate copyright permission.
+
+ Notwithstanding any other provision of this License, for material you
+add to a covered work, you may (if authorized by the copyright holders of
+that material) supplement the terms of this License with terms:
+
+ a) Disclaiming warranty or limiting liability differently from the
+ terms of sections 15 and 16 of this License; or
+
+ b) Requiring preservation of specified reasonable legal notices or
+ author attributions in that material or in the Appropriate Legal
+ Notices displayed by works containing it; or
+
+ c) Prohibiting misrepresentation of the origin of that material, or
+ requiring that modified versions of such material be marked in
+ reasonable ways as different from the original version; or
+
+ d) Limiting the use for publicity purposes of names of licensors or
+ authors of the material; or
+
+ e) Declining to grant rights under trademark law for use of some
+ trade names, trademarks, or service marks; or
+
+ f) Requiring indemnification of licensors and authors of that
+ material by anyone who conveys the material (or modified versions of
+ it) with contractual assumptions of liability to the recipient, for
+ any liability that these contractual assumptions directly impose on
+ those licensors and authors.
+
+ All other non-permissive additional terms are considered "further
+restrictions" within the meaning of section 10. If the Program as you
+received it, or any part of it, contains a notice stating that it is
+governed by this License along with a term that is a further
+restriction, you may remove that term. If a license document contains
+a further restriction but permits relicensing or conveying under this
+License, you may add to a covered work material governed by the terms
+of that license document, provided that the further restriction does
+not survive such relicensing or conveying.
+
+ If you add terms to a covered work in accord with this section, you
+must place, in the relevant source files, a statement of the
+additional terms that apply to those files, or a notice indicating
+where to find the applicable terms.
+
+ Additional terms, permissive or non-permissive, may be stated in the
+form of a separately written license, or stated as exceptions;
+the above requirements apply either way.
+
+ 8. Termination.
+
+ You may not propagate or modify a covered work except as expressly
+provided under this License. Any attempt otherwise to propagate or
+modify it is void, and will automatically terminate your rights under
+this License (including any patent licenses granted under the third
+paragraph of section 11).
+
+ However, if you cease all violation of this License, then your
+license from a particular copyright holder is reinstated (a)
+provisionally, unless and until the copyright holder explicitly and
+finally terminates your license, and (b) permanently, if the copyright
+holder fails to notify you of the violation by some reasonable means
+prior to 60 days after the cessation.
+
+ Moreover, your license from a particular copyright holder is
+reinstated permanently if the copyright holder notifies you of the
+violation by some reasonable means, this is the first time you have
+received notice of violation of this License (for any work) from that
+copyright holder, and you cure the violation prior to 30 days after
+your receipt of the notice.
+
+ Termination of your rights under this section does not terminate the
+licenses of parties who have received copies or rights from you under
+this License. If your rights have been terminated and not permanently
+reinstated, you do not qualify to receive new licenses for the same
+material under section 10.
+
+ 9. Acceptance Not Required for Having Copies.
+
+ You are not required to accept this License in order to receive or
+run a copy of the Program. Ancillary propagation of a covered work
+occurring solely as a consequence of using peer-to-peer transmission
+to receive a copy likewise does not require acceptance. However,
+nothing other than this License grants you permission to propagate or
+modify any covered work. These actions infringe copyright if you do
+not accept this License. Therefore, by modifying or propagating a
+covered work, you indicate your acceptance of this License to do so.
+
+ 10. Automatic Licensing of Downstream Recipients.
+
+ Each time you convey a covered work, the recipient automatically
+receives a license from the original licensors, to run, modify and
+propagate that work, subject to this License. You are not responsible
+for enforcing compliance by third parties with this License.
+
+ An "entity transaction" is a transaction transferring control of an
+organization, or substantially all assets of one, or subdividing an
+organization, or merging organizations. If propagation of a covered
+work results from an entity transaction, each party to that
+transaction who receives a copy of the work also receives whatever
+licenses to the work the party's predecessor in interest had or could
+give under the previous paragraph, plus a right to possession of the
+Corresponding Source of the work from the predecessor in interest, if
+the predecessor has it or can get it with reasonable efforts.
+
+ You may not impose any further restrictions on the exercise of the
+rights granted or affirmed under this License. For example, you may
+not impose a license fee, royalty, or other charge for exercise of
+rights granted under this License, and you may not initiate litigation
+(including a cross-claim or counterclaim in a lawsuit) alleging that
+any patent claim is infringed by making, using, selling, offering for
+sale, or importing the Program or any portion of it.
+
+ 11. Patents.
+
+ A "contributor" is a copyright holder who authorizes use under this
+License of the Program or a work on which the Program is based. The
+work thus licensed is called the contributor's "contributor version".
+
+ A contributor's "essential patent claims" are all patent claims
+owned or controlled by the contributor, whether already acquired or
+hereafter acquired, that would be infringed by some manner, permitted
+by this License, of making, using, or selling its contributor version,
+but do not include claims that would be infringed only as a
+consequence of further modification of the contributor version. For
+purposes of this definition, "control" includes the right to grant
+patent sublicenses in a manner consistent with the requirements of
+this License.
+
+ Each contributor grants you a non-exclusive, worldwide, royalty-free
+patent license under the contributor's essential patent claims, to
+make, use, sell, offer for sale, import and otherwise run, modify and
+propagate the contents of its contributor version.
+
+ In the following three paragraphs, a "patent license" is any express
+agreement or commitment, however denominated, not to enforce a patent
+(such as an express permission to practice a patent or covenant not to
+sue for patent infringement). To "grant" such a patent license to a
+party means to make such an agreement or commitment not to enforce a
+patent against the party.
+
+ If you convey a covered work, knowingly relying on a patent license,
+and the Corresponding Source of the work is not available for anyone
+to copy, free of charge and under the terms of this License, through a
+publicly available network server or other readily accessible means,
+then you must either (1) cause the Corresponding Source to be so
+available, or (2) arrange to deprive yourself of the benefit of the
+patent license for this particular work, or (3) arrange, in a manner
+consistent with the requirements of this License, to extend the patent
+license to downstream recipients. "Knowingly relying" means you have
+actual knowledge that, but for the patent license, your conveying the
+covered work in a country, or your recipient's use of the covered work
+in a country, would infringe one or more identifiable patents in that
+country that you have reason to believe are valid.
+
+ If, pursuant to or in connection with a single transaction or
+arrangement, you convey, or propagate by procuring conveyance of, a
+covered work, and grant a patent license to some of the parties
+receiving the covered work authorizing them to use, propagate, modify
+or convey a specific copy of the covered work, then the patent license
+you grant is automatically extended to all recipients of the covered
+work and works based on it.
+
+ A patent license is "discriminatory" if it does not include within
+the scope of its coverage, prohibits the exercise of, or is
+conditioned on the non-exercise of one or more of the rights that are
+specifically granted under this License. You may not convey a covered
+work if you are a party to an arrangement with a third party that is
+in the business of distributing software, under which you make payment
+to the third party based on the extent of your activity of conveying
+the work, and under which the third party grants, to any of the
+parties who would receive the covered work from you, a discriminatory
+patent license (a) in connection with copies of the covered work
+conveyed by you (or copies made from those copies), or (b) primarily
+for and in connection with specific products or compilations that
+contain the covered work, unless you entered into that arrangement,
+or that patent license was granted, prior to 28 March 2007.
+
+ Nothing in this License shall be construed as excluding or limiting
+any implied license or other defenses to infringement that may
+otherwise be available to you under applicable patent law.
+
+ 12. No Surrender of Others' Freedom.
+
+ If conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot convey a
+covered work so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you may
+not convey it at all. For example, if you agree to terms that obligate you
+to collect a royalty for further conveying from those to whom you convey
+the Program, the only way you could satisfy both those terms and this
+License would be to refrain entirely from conveying the Program.
+
+ 13. Use with the GNU Affero General Public License.
+
+ Notwithstanding any other provision of this License, you have
+permission to link or combine any covered work with a work licensed
+under version 3 of the GNU Affero General Public License into a single
+combined work, and to convey the resulting work. The terms of this
+License will continue to apply to the part which is the covered work,
+but the special requirements of the GNU Affero General Public License,
+section 13, concerning interaction through a network will apply to the
+combination as such.
+
+ 14. Revised Versions of this License.
+
+ The Free Software Foundation may publish revised and/or new versions of
+the GNU General Public License from time to time. Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+ Each version is given a distinguishing version number. If the
+Program specifies that a certain numbered version of the GNU General
+Public License "or any later version" applies to it, you have the
+option of following the terms and conditions either of that numbered
+version or of any later version published by the Free Software
+Foundation. If the Program does not specify a version number of the
+GNU General Public License, you may choose any version ever published
+by the Free Software Foundation.
+
+ If the Program specifies that a proxy can decide which future
+versions of the GNU General Public License can be used, that proxy's
+public statement of acceptance of a version permanently authorizes you
+to choose that version for the Program.
+
+ Later license versions may give you additional or different
+permissions. However, no additional obligations are imposed on any
+author or copyright holder as a result of your choosing to follow a
+later version.
+
+ 15. Disclaimer of Warranty.
+
+ THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
+APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
+HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
+OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
+THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
+IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
+ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
+
+ 16. Limitation of Liability.
+
+ IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
+THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
+GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
+USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
+DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
+PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
+EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
+SUCH DAMAGES.
+
+ 17. Interpretation of Sections 15 and 16.
+
+ If the disclaimer of warranty and limitation of liability provided
+above cannot be given local legal effect according to their terms,
+reviewing courts shall apply local law that most closely approximates
+an absolute waiver of all civil liability in connection with the
+Program, unless a warranty or assumption of liability accompanies a
+copy of the Program in return for a fee.
+
+ END OF TERMS AND CONDITIONS
+
+ How to Apply These Terms to Your New Programs
+
+ If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+ To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+state the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+ <one line to give the program's name and a brief idea of what it does.>
+ Copyright (C) <year> <name of author>
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+Also add information on how to contact you by electronic and paper mail.
+
+ If the program does terminal interaction, make it output a short
+notice like this when it starts in an interactive mode:
+
+ <program> Copyright (C) <year> <name of author>
+ This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+ This is free software, and you are welcome to redistribute it
+ under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License. Of course, your program's commands
+might be different; for a GUI interface, you would use an "about box".
+
+ You should also get your employer (if you work as a programmer) or school,
+if any, to sign a "copyright disclaimer" for the program, if necessary.
+For more information on this, and how to apply and follow the GNU GPL, see
+<http://www.gnu.org/licenses/>.
+
+ The GNU General Public License does not permit incorporating your program
+into proprietary programs. If your program is a subroutine library, you
+may consider it more useful to permit linking proprietary applications with
+the library. If this is what you want to do, use the GNU Lesser General
+Public License instead of this License. But first, please read
+<http://www.gnu.org/philosophy/why-not-lgpl.html>.
diff --git a/manual/ExaML.backup.odt b/manual/ExaML.backup.odt
new file mode 100644
index 0000000..3eb065a
Binary files /dev/null and b/manual/ExaML.backup.odt differ
diff --git a/manual/ExaML.odt b/manual/ExaML.odt
new file mode 100644
index 0000000..ec5fcb1
Binary files /dev/null and b/manual/ExaML.odt differ
diff --git a/manual/ExaML.pdf b/manual/ExaML.pdf
new file mode 100644
index 0000000..424f2a9
Binary files /dev/null and b/manual/ExaML.pdf differ
diff --git a/parser/Makefile.SSE3.gcc b/parser/Makefile.SSE3.gcc
new file mode 100644
index 0000000..45b6fa2
--- /dev/null
+++ b/parser/Makefile.SSE3.gcc
@@ -0,0 +1,29 @@
+# Makefile August 2006 by Alexandros Stamatakis
+# Makefile cleanup October 2006, Courtesy of Peter Cordes <peter at cordes.ca>
+
+CC = gcc
+CFLAGS = -fomit-frame-pointer -O2 -D_GNU_SOURCE -msse -funroll-loops #-Wall -Wunused-parameter -Wredundant-decls -Wreturn-type -Wswitch-default -Wunused-value -Wimplicit -Wimplicit-function-declaration -Wimplicit-int -Wimport -Wunused -Wunused-function -Wunused-label -Wno-int-to-pointer-cast -Wbad-function-cast -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wold-style-definition -Wstrict-prototypes -Wpointer-sign -Wextra -Wredundant-decls -Wunused -Wunused-fun [...]
+
+
+LIBRARIES = -lm
+
+RM = rm -f
+
+objs = axml.o parsePartitions.o
+
+all : clean parse-examl
+
+GLOBAL_DEPS = axml.h globalVariables.h ../versionHeader/version.h
+
+parse-examl : $(objs)
+ $(CC) -o parse-examl $(objs) $(LIBRARIES)
+
+
+axml.o : axml.c $(GLOBAL_DEPS)
+parsePartitions.o : parsePartitions.c $(GLOBAL_DEPS)
+
+clean :
+ $(RM) *.o parse-examl
+
+
+dev : parse-examl
\ No newline at end of file
diff --git a/parser/Makefile.check.warnings b/parser/Makefile.check.warnings
new file mode 100644
index 0000000..d3fe76d
--- /dev/null
+++ b/parser/Makefile.check.warnings
@@ -0,0 +1,26 @@
+# Makefile August 2006 by Alexandros Stamatakis
+# Makefile cleanup October 2006, Courtesy of Peter Cordes <peter at cordes.ca>
+
+CC = clang
+CFLAGS = -fomit-frame-pointer -O2 -D_GNU_SOURCE -msse -funroll-loops -Weverything -Wno-padded #-Wall -Wunused-parameter -Wredundant-decls -Wreturn-type -Wswitch-default -Wunused-value -Wimplicit -Wimplicit-function-declaration -Wimplicit-int -Wimport -Wunused -Wunused-function -Wunused-label -Wno-int-to-pointer-cast -Wbad-function-cast -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wold-style-definition -Wstrict-prototypes -Wpointer-sign -Wextra -Wredundant-de [...]
+
+
+LIBRARIES = -lm
+
+RM = rm -f
+
+objs = axml.o parsePartitions.o
+
+all : parse-examl
+
+GLOBAL_DEPS = axml.h globalVariables.h ../versionHeader/version.h
+
+parse-examl : $(objs)
+ $(CC) -o parse-examl $(objs) $(LIBRARIES)
+
+
+axml.o : axml.c $(GLOBAL_DEPS)
+parsePartitions.o : parsePartitions.c $(GLOBAL_DEPS)
+
+clean :
+ $(RM) *.o parse-examl
diff --git a/parser/USAGE b/parser/USAGE
new file mode 100644
index 0000000..308016d
--- /dev/null
+++ b/parser/USAGE
@@ -0,0 +1 @@
+./parser -m DNA -s ../testdata/49 -q ../testdata/49.model -n 49
diff --git a/parser/axml.c b/parser/axml.c
new file mode 100644
index 0000000..2ff2c49
--- /dev/null
+++ b/parser/axml.c
@@ -0,0 +1,2895 @@
+/* RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees
+ * Copyright August 2006 by Alexandros Stamatakis
+ *
+ * Partially derived from
+ * fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *
+ * and
+ *
+ * Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ * This program is free software; you may redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ *
+ * For any other enquiries send an Email to Alexandros Stamatakis
+ * Alexandros.Stamatakis at epfl.ch
+ *
+ * When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ * Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models".
+ * Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#ifdef WIN32
+#include <direct.h>
+#endif
+
+#ifndef WIN32
+#include <sys/times.h>
+#include <sys/types.h>
+#include <sys/time.h>
+#include <unistd.h>
+#endif
+
+#include <math.h>
+#include <time.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+#include <stdarg.h>
+#include <limits.h>
+
+
+#ifdef _FINE_GRAIN_MPI
+#include <mpi.h>
+#endif
+
+
+
+#ifdef _USE_PTHREADS
+#include <pthread.h>
+
+#endif
+
+#if ! (defined(__ppc) || defined(__powerpc__) || defined(PPC))
+#include <xmmintrin.h>
+/*
+ special bug fix: forces denormalized numbers to be flushed to zero;
+ without it the program is a tiny bit faster, though.
+ #include <emmintrin.h>
+ #define MM_DAZ_MASK 0x0040
+ #define MM_DAZ_ON 0x0040
+ #define MM_DAZ_OFF 0x0000
+*/
+#endif
+
+#include "axml.h"
+#include "globalVariables.h"
+
+
+#define _PORTABLE_PTHREADS
+
+
+
+
+/***************** UTILITY FUNCTIONS **************************/
+
+
+void myBinFwrite(const void *ptr, size_t size, size_t nmemb)
+{
+ size_t
+ bytes_written = fwrite(ptr, size, nmemb, byteFile);
+
+ assert(bytes_written == nmemb);
+}
+
+
+
+
+
+void *malloc_aligned(size_t size)
+{
+ void
+ *ptr = (void *)NULL;
+
+ int
+ res;
+
+
+#if defined (__APPLE__)
+ /*
+ presumably malloc on Macs always returns
+ a 16-byte aligned pointer
+ */
+
+ ptr = malloc(size);
+
+ if(ptr == (void*)NULL)
+ assert(0);
+
+#ifdef __AVX
+ assert(0);
+#endif
+
+
+#else
+ res = posix_memalign( &ptr, BYTE_ALIGNMENT, size );
+
+ if(res != 0)
+ assert(0);
+#endif
+
+ return ptr;
+}
+
+
+
+
+
+
+
+
+void printBothOpen(const char* format, ... )
+{
+ FILE *f = myfopen(infoFileName, "ab");
+
+ va_list args;
+ va_start(args, format);
+ vfprintf(f, format, args );
+ va_end(args);
+
+ va_start(args, format);
+ vprintf(format, args );
+ va_end(args);
+
+ fclose(f);
+}
+
+void printBothOpenMPI(const char* format, ... )
+{
+#ifdef _WAYNE_MPI
+ if(processID == 0)
+#endif
+ {
+ FILE *f = myfopen(infoFileName, "ab");
+
+ va_list args;
+ va_start(args, format);
+ vfprintf(f, format, args );
+ va_end(args);
+
+ va_start(args, format);
+ vprintf(format, args );
+ va_end(args);
+
+ fclose(f);
+ }
+}
+
+
+boolean getSmoothFreqs(int dataType)
+{
+ assert(MIN_MODEL < dataType && dataType < MAX_MODEL);
+
+ return pLengths[dataType].smoothFrequencies;
+}
+
+const unsigned int *getBitVector(int dataType)
+{
+ assert(MIN_MODEL < dataType && dataType < MAX_MODEL);
+
+ return pLengths[dataType].bitVector;
+}
+
+
+int getStates(int dataType)
+{
+ assert(MIN_MODEL < dataType && dataType < MAX_MODEL);
+
+ return pLengths[dataType].states;
+}
+
+unsigned char getUndetermined(int dataType)
+{
+ assert(MIN_MODEL < dataType && dataType < MAX_MODEL);
+
+ return pLengths[dataType].undetermined;
+}
+
+
+
+char getInverseMeaning(int dataType, unsigned char state)
+{
+ assert(MIN_MODEL < dataType && dataType < MAX_MODEL);
+
+ return pLengths[dataType].inverseMeaning[state];
+}
+
+partitionLengths *getPartitionLengths(pInfo *p)
+{
+ int
+ dataType = p->dataType,
+ states = p->states,
+ tipLength = p->maxTipStates;
+
+ assert(states != -1 && tipLength != -1);
+
+ assert(MIN_MODEL < dataType && dataType < MAX_MODEL);
+
+ pLength.leftLength = pLength.rightLength = states * states;
+ pLength.eignLength = states;
+ pLength.evLength = states * states;
+ pLength.eiLength = states * states;
+ pLength.substRatesLength = (states * states - states) / 2;
+ pLength.frequenciesLength = states;
+ pLength.tipVectorLength = tipLength * states;
+ pLength.symmetryVectorLength = (states * states - states) / 2;
+ pLength.frequencyGroupingLength = states;
+ pLength.nonGTR = FALSE;
+ pLength.optimizeBaseFrequencies = FALSE;
+
+ return (&pLengths[dataType]);
+}
+
+
+
+
+
+
+
+double gettime(void)
+{
+#ifdef WIN32
+ time_t tp;
+ struct tm localtm;
+ tp = time(NULL);
+ localtm = *localtime(&tp);
+ return 60.0*localtm.tm_min + localtm.tm_sec;
+#else
+ struct timeval ttime;
+ gettimeofday(&ttime , NULL);
+ return ttime.tv_sec + ttime.tv_usec * 0.000001;
+#endif
+}
+
+
+
+double randum (long *seed)
+{
+ long sum, mult0, mult1, seed0, seed1, seed2, newseed0, newseed1, newseed2;
+ double res;
+
+ mult0 = 1549;
+ seed0 = *seed & 4095;
+ sum = mult0 * seed0;
+ newseed0 = sum & 4095;
+ sum >>= 12;
+ seed1 = (*seed >> 12) & 4095;
+ mult1 = 406;
+ sum += mult0 * seed1 + mult1 * seed0;
+ newseed1 = sum & 4095;
+ sum >>= 12;
+ seed2 = (*seed >> 24) & 255;
+ sum += mult0 * seed2 + mult1 * seed1;
+ newseed2 = sum & 255;
+
+ *seed = newseed2 << 24 | newseed1 << 12 | newseed0;
+ res = 0.00390625 * (newseed2 + 0.000244140625 * (newseed1 + 0.000244140625 * newseed0));
+
+ return res;
+}
+
+static int filexists(char *filename)
+{
+ FILE *fp;
+ int res;
+ fp = fopen(filename,"rb");
+
+ if(fp)
+ {
+ res = 1;
+ fclose(fp);
+ }
+ else
+ res = 0;
+
+ return res;
+}
+
+
+FILE *myfopen(const char *path, const char *mode)
+{
+ FILE *fp = fopen(path, mode);
+
+ if(strcmp(mode,"r") == 0 || strcmp(mode,"rb") == 0)
+ {
+ if(fp)
+ return fp;
+ else
+ {
+ if(processID == 0)
+ printf("\n Error: the file %s you want to open for reading does not exist, exiting ...\n\n", path);
+ errorExit(-1);
+ return (FILE *)NULL;
+ }
+ }
+ else
+ {
+ if(fp)
+ return fp;
+ else
+ {
+ if(processID == 0)
+ printf("\n Error: the file %s you want to open for writing or appending cannot be opened [mode: %s], exiting ...\n\n",
+ path, mode);
+ errorExit(-1);
+ return (FILE *)NULL;
+ }
+ }
+
+
+}
+
+
+
+
+
+/********************* END UTILITY FUNCTIONS ********************/
+
+
+/******************************some functions for the likelihood computation ****************************/
+
+
+
+
+
+
+
+
+
+
+/***********************reading and initializing input ******************/
+
+static void getnums (rawdata *rdta)
+{
+ if (fscanf(INFILE, "%d %d", & rdta->numsp, & rdta->sites) != 2)
+ {
+ if(processID == 0)
+ printf("\n Error: problem reading number of species and sites\n\n");
+ errorExit(-1);
+ }
+
+ if (rdta->numsp < 4)
+ {
+ if(processID == 0)
+ printf("\n Error: too few species\n\n");
+ errorExit(-1);
+ }
+
+ if (rdta->sites < 1)
+ {
+ if(processID == 0)
+ printf("\n Error: too few sites\n\n");
+ errorExit(-1);
+ }
+
+ return;
+}
+
+
+
+
+
+boolean whitechar (int ch)
+{
+ return (ch == ' ' || ch == '\n' || ch == '\t' || ch == '\r');
+}
+
+
+static void uppercase (int *chptr)
+{
+ int ch;
+
+ ch = *chptr;
+ if ((ch >= 'a' && ch <= 'i') || (ch >= 'j' && ch <= 'r')
+ || (ch >= 's' && ch <= 'z'))
+ *chptr = ch + 'A' - 'a';
+}
+
+
+
+
+static void getyspace (rawdata *rdta)
+{
+ size_t size = 4 * ((size_t)(rdta->sites / 4 + 1));
+
+
+
+ int i;
+ unsigned char *y0;
+
+ rdta->y = (unsigned char **) malloc(((size_t)rdta->numsp + 1) * sizeof(unsigned char *));
+ assert(rdta->y);
+
+ y0 = (unsigned char *)calloc(((size_t)(rdta->numsp + 1)) * size, sizeof(unsigned char));
+
+ /*
+ printf("Raw alignment data Assigning %Zu bytes\n", ((size_t)(rdta->numsp + 1)) * size * sizeof(unsigned char));
+
+ */
+
+ assert(y0);
+
+ rdta->y0 = y0;
+
+ for (i = 0; i <= rdta->numsp; i++)
+ {
+ rdta->y[i] = y0;
+ y0 += size;
+ }
+
+ return;
+}
+
+
+
+
+static boolean setupTree (tree *tr, analdef *adef)
+{
+ nodeptr
+ p0;
+
+ int
+ tips,
+ inter;
+
+ if(!adef->readTaxaOnly)
+ {
+ /*tr->bigCutoff = FALSE;*/
+
+ tr->patternPosition = (int*)NULL;
+ tr->columnPosition = (int*)NULL;
+
+ /*tr->maxCategories = MAX(4, adef->categories);*/
+
+ /*tr->partitionContributions = (double *)malloc(sizeof(double) * tr->NumberOfModels);
+
+ for(i = 0; i < tr->NumberOfModels; i++)
+ tr->partitionContributions[i] = -1.0;
+
+ tr->perPartitionLH = (double *)malloc(sizeof(double) * tr->NumberOfModels);
+
+
+ for(i = 0; i < tr->NumberOfModels; i++)
+ {
+ tr->perPartitionLH[i] = 0.0;
+ }
+
+ if(adef->grouping)
+ tr->grouped = TRUE;
+ else
+ tr->grouped = FALSE;
+
+ if(adef->constraint)
+ tr->constrained = TRUE;
+ else
+ tr->constrained = FALSE;
+
+ tr->treeID = 0;*/
+ }
+
+ tips = tr->mxtips;
+ inter = tr->mxtips - 1;
+
+ if(!adef->readTaxaOnly)
+ {
+ tr->yVector = (unsigned char **) malloc(((size_t)tr->mxtips + 1) * sizeof(unsigned char *));
+
+ /* tr->fracchanges = (double *)malloc(tr->NumberOfModels * sizeof(double));
+ tr->likelihoods = (double *)malloc(adef->multipleRuns * sizeof(double));*/
+ }
+
+ /*tr->numberOfTrees = -1;
+
+
+
+ tr->treeStringLength = tr->mxtips * (nmlngth+128) + 256 + tr->mxtips * 2;
+
+ tr->tree_string = (char*)calloc(tr->treeStringLength, sizeof(char));
+ tr->tree0 = (char*)calloc(tr->treeStringLength, sizeof(char));
+ tr->tree1 = (char*)calloc(tr->treeStringLength, sizeof(char));*/
+
+
+ /*TODO, must that be so long ?*/
+
+ if(!adef->readTaxaOnly)
+ {
+
+ /*tr->td[0].count = 0;
+ tr->td[0].ti = (traversalInfo *)malloc(sizeof(traversalInfo) * tr->mxtips);
+ tr->td[0].executeModel = (boolean *)malloc(sizeof(boolean) * tr->NumberOfModels);
+ tr->td[0].parameterValues = (double *)malloc(sizeof(double) * tr->NumberOfModels);
+
+ for(i = 0; i < tr->NumberOfModels; i++)
+ tr->fracchanges[i] = -1.0;
+ tr->fracchange = -1.0;
+
+ tr->constraintVector = (int *)malloc((2 * tr->mxtips) * sizeof(int));*/
+
+ tr->nameList = (char **)malloc(sizeof(char *) * ((size_t)tips + 1));
+ }
+
+ if (!(p0 = (nodeptr) malloc(((size_t)tips + 3 * (size_t)inter) * sizeof(node))))
+ {
+ printf("\n Error: unable to obtain sufficient tree memory\n\n");
+ return FALSE;
+ }
+
+
+
+
+
+ tr->vLength = 0;
+
+ tr->h = (hashtable*)NULL;
+
+
+ return TRUE;
+}
+
+
+static void checkTaxonName(char *buffer, int len)
+{
+ int i;
+
+ for(i = 0; i < len - 1; i++)
+ {
+ boolean valid;
+
+ switch(buffer[i])
+ {
+ case '\0':
+ case '\t':
+ case '\n':
+ case '\r':
+ case ' ':
+ case ':':
+ case ',':
+ case '(':
+ case ')':
+ case ';':
+ case '[':
+ case ']':
+ valid = FALSE;
+ break;
+ default:
+ valid = TRUE;
+ }
+
+ if(!valid)
+ {
+ printf("\n Error: Taxon Name \"%s\" is invalid at position %d, it contains illegal character %c\n\n", buffer, i, buffer[i]);
+ printf(" Illegal characters in taxon-names are: tabulators, carriage returns, spaces, \":\", \",\", \")\", \"(\", \";\", \"]\", \"[\"\n");
+ printf(" Exiting\n");
+ exit(-1);
+ }
+
+ }
+ assert(buffer[len - 1] == '\0');
+}
+
+static boolean getdata(analdef *adef, rawdata *rdta, tree *tr)
+{
+ int
+ i,
+ j,
+ basesread,
+ basesnew,
+ ch, my_i, meaning,
+ len,
+ meaningAA[256],
+ meaningDNA[256],
+ meaningBINARY[256],
+ meaningGeneric32[256],
+ meaningGeneric64[256];
+
+ boolean
+ allread,
+ firstpass;
+
+ char
+ buffer[nmlngth + 2];
+
+ unsigned char
+ genericChars32[32] = {'0', '1', '2', '3', '4', '5', '6', '7',
+ '8', '9', 'A', 'B', 'C', 'D', 'E', 'F',
+ 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N',
+ 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V'};
+ unsigned long
+ total = 0,
+ gaps = 0;
+
+ for (i = 0; i < 256; i++)
+ {
+ meaningAA[i] = -1;
+ meaningDNA[i] = -1;
+ meaningBINARY[i] = -1;
+ meaningGeneric32[i] = -1;
+ meaningGeneric64[i] = -1;
+ }
+
+ /* generic 32 data */
+
+ for(i = 0; i < 32; i++)
+ meaningGeneric32[genericChars32[i]] = i;
+ meaningGeneric32['-'] = getUndetermined(GENERIC_32);
+ meaningGeneric32['?'] = getUndetermined(GENERIC_32);
+
+ /* AA data */
+
+ meaningAA['A'] = 0; /* alanine */
+ meaningAA['R'] = 1; /* arginine */
+ meaningAA['N'] = 2; /* asparagine*/
+ meaningAA['D'] = 3; /* aspartic */
+ meaningAA['C'] = 4; /* cysteine */
+ meaningAA['Q'] = 5; /* glutamine */
+ meaningAA['E'] = 6; /* glutamic */
+ meaningAA['G'] = 7; /* glycine */
+ meaningAA['H'] = 8; /* histidine */
+ meaningAA['I'] = 9; /* isoleucine */
+ meaningAA['L'] = 10; /* leucine */
+ meaningAA['K'] = 11; /* lysine */
+ meaningAA['M'] = 12; /* methionine */
+ meaningAA['F'] = 13; /* phenylalanine */
+ meaningAA['P'] = 14; /* proline */
+ meaningAA['S'] = 15; /* serine */
+ meaningAA['T'] = 16; /* threonine */
+ meaningAA['W'] = 17; /* tryptophan */
+ meaningAA['Y'] = 18; /* tyrosine */
+ meaningAA['V'] = 19; /* valine */
+ meaningAA['B'] = 20; /* asparagine, aspartic 2 and 3*/
+ meaningAA['Z'] = 21; /*21 glutamine glutamic 5 and 6*/
+
+ meaningAA['X'] =
+ meaningAA['?'] =
+ meaningAA['*'] =
+ meaningAA['-'] =
+ getUndetermined(AA_DATA);
+
+ /* DNA data */
+
+ meaningDNA['A'] = 1;
+ meaningDNA['B'] = 14;
+ meaningDNA['C'] = 2;
+ meaningDNA['D'] = 13;
+ meaningDNA['G'] = 4;
+ meaningDNA['H'] = 11;
+ meaningDNA['K'] = 12;
+ meaningDNA['M'] = 3;
+ meaningDNA['R'] = 5;
+ meaningDNA['S'] = 6;
+ meaningDNA['T'] = 8;
+ meaningDNA['U'] = 8;
+ meaningDNA['V'] = 7;
+ meaningDNA['W'] = 9;
+ meaningDNA['Y'] = 10;
+
+ meaningDNA['N'] =
+ meaningDNA['O'] =
+ meaningDNA['X'] =
+ meaningDNA['-'] =
+ meaningDNA['?'] =
+ getUndetermined(DNA_DATA);
+
+ /* BINARY DATA */
+
+ meaningBINARY['0'] = 1;
+ meaningBINARY['1'] = 2;
+
+ meaningBINARY['-'] =
+ meaningBINARY['?'] =
+ getUndetermined(BINARY_DATA);
+
+
+ /*******************************************************************/
+
+ basesread = basesnew = 0;
+
+ allread = FALSE;
+ firstpass = TRUE;
+ ch = ' ';
+
+ while (! allread)
+ {
+ for (i = 1; i <= tr->mxtips; i++)
+ {
+ if (firstpass)
+ {
+ ch = getc(INFILE);
+ while(ch == ' ' || ch == '\n' || ch == '\t' || ch == '\r')
+ ch = getc(INFILE);
+
+ my_i = 0;
+
+ do
+ {
+ buffer[my_i] = (char)ch;
+ ch = getc(INFILE);
+ my_i++;
+ if(my_i >= nmlngth)
+ {
+ if(processID == 0)
+ {
+ printf("Taxon Name too long at taxon %d, adapt constant nmlngth in\n", i);
+ printf("axml.h, current setting %d\n", nmlngth);
+ }
+ errorExit(-1);
+ }
+ }
+ while(ch != ' ' && ch != '\n' && ch != '\t' && ch != '\r');
+
+ while(ch == ' ' || ch == '\n' || ch == '\t' || ch == '\r')
+ ch = getc(INFILE);
+
+ ungetc(ch, INFILE);
+
+ buffer[my_i] = '\0';
+ len = (int)strlen(buffer) + 1;
+ checkTaxonName(buffer, len);
+ tr->nameList[i] = (char *)malloc(sizeof(char) * (size_t)len);
+ strcpy(tr->nameList[i], buffer);
+ }
+
+ j = basesread;
+
+ while ((j < rdta->sites) && ((ch = getc(INFILE)) != EOF) && (ch != '\n') && (ch != '\r'))
+ {
+ uppercase(& ch);
+
+ assert(tr->dataVector[j + 1] != -1);
+
+ switch(tr->dataVector[j + 1])
+ {
+ case BINARY_DATA:
+ meaning = meaningBINARY[ch];
+ break;
+ case DNA_DATA:
+ case SECONDARY_DATA:
+ case SECONDARY_DATA_6:
+ case SECONDARY_DATA_7:
+ /*
+ still dealing with DNA/RNA here, hence just act as if they were DNA characters;
+ the corresponding column merging for sec struct models will take place later
+ */
+ meaning = meaningDNA[ch];
+ break;
+ case AA_DATA:
+ meaning = meaningAA[ch];
+ break;
+ case GENERIC_32:
+ meaning = meaningGeneric32[ch];
+ break;
+ case GENERIC_64:
+ meaning = meaningGeneric64[ch];
+ break;
+ default:
+ assert(0);
+ }
+
+ if (meaning != -1)
+ {
+ j++;
+ rdta->y[i][j] = (unsigned char)ch;
+ }
+ else
+ {
+ if(!whitechar(ch))
+ {
+ printf("\n Error: bad base (%c) at site %d of sequence %d\n\n",
+ ch, j + 1, i);
+ return FALSE;
+ }
+ }
+ }
+
+ if (ch == EOF)
+ {
+ printf("\n Error: end-of-file at site %d of sequence %d\n\n", j + 1, i);
+ return FALSE;
+ }
+
+ if (! firstpass && (j == basesread))
+ i--;
+ else
+ {
+ if (i == 1)
+ basesnew = j;
+ else
+ if (j != basesnew)
+ {
+ printf("\n Error: sequences out of alignment\n");
+ printf("%d (instead of %d) residues read in sequence %d %s\n",
+ j - basesread, basesnew - basesread, i, tr->nameList[i]);
+ return FALSE;
+ }
+ }
+ while (ch != '\n' && ch != EOF && ch != '\r') ch = getc(INFILE); /* flush line *//* PC-LINEBREAK*/
+ }
+
+ firstpass = FALSE;
+ basesread = basesnew;
+ allread = (basesread >= rdta->sites);
+ }
+
+ for(j = 1; j <= tr->mxtips; j++)
+ for(i = 1; i <= rdta->sites; i++)
+ {
+ assert(tr->dataVector[i] != -1);
+
+ switch(tr->dataVector[i])
+ {
+ case BINARY_DATA:
+ meaning = meaningBINARY[rdta->y[j][i]];
+ if(meaning == getUndetermined(BINARY_DATA))
+ gaps++;
+ break;
+
+ case SECONDARY_DATA:
+ case SECONDARY_DATA_6:
+ case SECONDARY_DATA_7:
+ assert(tr->secondaryStructurePairs[i - 1] != -1);
+ assert(i - 1 == tr->secondaryStructurePairs[tr->secondaryStructurePairs[i - 1]]);
+ /*
+ don't worry too much about undetermined column count here for sec-struct, just count
+ DNA/RNA gaps here and worry about the rest later-on, falling through to DNA again :-)
+ */
+ case DNA_DATA:
+ meaning = meaningDNA[rdta->y[j][i]];
+ if(meaning == getUndetermined(DNA_DATA))
+ gaps++;
+ break;
+
+ case AA_DATA:
+ meaning = meaningAA[rdta->y[j][i]];
+ if(meaning == getUndetermined(AA_DATA))
+ gaps++;
+ break;
+
+ case GENERIC_32:
+ meaning = meaningGeneric32[rdta->y[j][i]];
+ if(meaning == getUndetermined(GENERIC_32))
+ gaps++;
+ break;
+
+ case GENERIC_64:
+ meaning = meaningGeneric64[rdta->y[j][i]];
+ if(meaning == getUndetermined(GENERIC_64))
+ gaps++;
+ break;
+ default:
+ assert(0);
+ }
+
+ total++;
+ rdta->y[j][i] = (unsigned char)meaning;
+ }
+
+ adef->gapyness = (double)gaps / (double)total;
+
+ /*myBinFwrite(&(adef->gapyness), sizeof(double), 1);*/
+
+ printf("gappyness: %f\n", adef->gapyness);
+
+ /*for(i = 1; i <= tr->mxtips; i++)
+ {
+ int
+ len = strlen(tr->nameList[i]) + 1;
+
+ myBinFwrite(&len, sizeof(int), 1);
+ myBinFwrite(tr->nameList[i], sizeof(char), len);
+
+ printf("%d %s\n", len, tr->nameList[i]);
+ } */
+
+ return TRUE;
+}
+
+
+
+static void inputweights (rawdata *rdta)
+{
+ int i, w, fres;
+ FILE *weightFile;
+ int *wv = (int *)malloc(sizeof(int) * (size_t)rdta->sites);
+
+ weightFile = myfopen(weightFileName, "rb");
+
+ i = 0;
+
+ while((fres = fscanf(weightFile,"%d", &w)) != EOF)
+ {
+ if(!fres)
+ {
+ if(processID == 0)
+ printf("error reading weight file, probably encountered a non-integer weight value\n");
+ errorExit(-1);
+ }
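+ if(i < rdta->sites) /* never write past the end of wv; a count mismatch is reported after the loop */
+ wv[i] = w;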
+ i++;
+ }
+
+ if(i != rdta->sites)
+ {
+ if(processID == 0)
+ printf("number %d of weights not equal to number %d of alignment columns\n", i, rdta->sites);
+ errorExit(-1);
+ }
+
+ for(i = 1; i <= rdta->sites; i++)
+ rdta->wgt[i] = wv[i - 1];
+
+ fclose(weightFile);
+ free(wv);
+}
+
+static hashNumberType hashString(char *p, hashNumberType tableSize)
+{
+ hashNumberType
+ h = 0;
+
+ for(; *p; p++)
+ h = 31 * h + (hashNumberType)*p;
+
+ return (h % tableSize);
+}
+
+static void addword(char *s, stringHashtable *h, int nodeNumber)
+{
+ hashNumberType position = hashString(s, h->tableSize);
+ stringEntry *p = h->table[position];
+
+ for(; p!= NULL; p = p->next)
+ {
+ if(strcmp(s, p->word) == 0)
+ return;
+ }
+
+ p = (stringEntry *)malloc(sizeof(stringEntry));
+
+ assert(p);
+
+ p->nodeNumber = nodeNumber;
+ p->word = (char *)malloc((strlen(s) + 1) * sizeof(char));
+
+ strcpy(p->word, s);
+
+ p->next = h->table[position];
+
+ h->table[position] = p;
+}
+
+
+static stringHashtable *initStringHashTable(hashNumberType n)
+{
+ /*
+ init with primes
+ */
+
+ static const hashNumberType initTable[] = {53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593, 49157, 98317,
+ 196613, 393241, 786433, 1572869, 3145739, 6291469, 12582917, 25165843,
+ 50331653, 100663319, 201326611, 402653189, 805306457, 1610612741};
+
+
+ /* init with powers of two
+
+ static const hashNumberType initTable[] = {64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384,
+ 32768, 65536, 131072, 262144, 524288, 1048576, 2097152,
+ 4194304, 8388608, 16777216, 33554432, 67108864, 134217728,
+ 268435456, 536870912, 1073741824, 2147483648U};
+ */
+
+ stringHashtable *h = (stringHashtable*)malloc(sizeof(stringHashtable));
+
+ hashNumberType
+ tableSize,
+ i,
+ primeTableLength = sizeof(initTable)/sizeof(initTable[0]),
+ maxSize = (hashNumberType)-1;
+
+ assert(n <= maxSize);
+
+ i = 0;
+
+ while(i < primeTableLength && initTable[i] < n) /* check the index before reading initTable[i] */
+ i++;
+
+ assert(i < primeTableLength);
+
+ tableSize = initTable[i];
+
+ h->table = (stringEntry**)calloc(tableSize, sizeof(stringEntry*));
+ h->tableSize = tableSize;
+
+ return h;
+}
+
+
+
+static void getinput(analdef *adef, rawdata *rdta, cruncheddata *cdta, tree *tr)
+{
+ int i;
+
+ INFILE = myfopen(seq_file, "rb");
+
+ getnums(rdta);
+
+
+ /*myBinFwrite(&(rdta->sites), sizeof(int), 1);
+ myBinFwrite(&(rdta->numsp), sizeof(int), 1);
+
+ printf("%d %d\n", rdta->sites, rdta->numsp);*/
+
+
+ tr->mxtips = rdta->numsp;
+
+
+ rdta->wgt = (int *) malloc(((size_t)rdta->sites + 1) * sizeof(int));
+ cdta->alias = (int *) malloc(((size_t)rdta->sites + 1) * sizeof(int));
+ cdta->aliaswgt = (int *) malloc(((size_t)rdta->sites + 1) * sizeof(int));
+ tr->model = (int *) calloc(((size_t)rdta->sites + 1), sizeof(int));
+ tr->initialDataVector = (int *) malloc(((size_t)rdta->sites + 1) * sizeof(int));
+ tr->extendedDataVector = (int *) malloc(((size_t)rdta->sites + 1) * sizeof(int));
+
+ if(!adef->useWeightFile)
+ {
+ for (i = 1; i <= rdta->sites; i++)
+ rdta->wgt[i] = 1;
+ }
+ else
+ {
+ assert(!adef->useSecondaryStructure);
+ inputweights(rdta);
+ }
+
+
+ if(adef->useMultipleModel)
+ {
+ int ref;
+
+ parsePartitions(adef, rdta, tr);
+
+ for(i = 1; i <= rdta->sites; i++)
+ {
+ ref = tr->model[i];
+ tr->initialDataVector[i] = tr->initialPartitionData[ref].dataType;
+ }
+ }
+ else
+ {
+ int dataType = -1;
+
+ tr->initialPartitionData = (pInfo*)malloc(sizeof(pInfo));
+ tr->initialPartitionData->optimizeBaseFrequencies = FALSE;
+
+
+ tr->initialPartitionData[0].partitionName = (char*)malloc(128 * sizeof(char));
+ strcpy(tr->initialPartitionData[0].partitionName, "No Name Provided");
+
+ tr->initialPartitionData[0].protModels = adef->proteinMatrix;
+ tr->initialPartitionData[0].protFreqs = adef->protEmpiricalFreqs;
+
+
+ tr->NumberOfModels = 1;
+
+
+ if(adef->model == M_PROTCAT || adef->model == M_PROTGAMMA)
+ dataType = AA_DATA;
+ if(adef->model == M_GTRCAT || adef->model == M_GTRGAMMA)
+ dataType = DNA_DATA;
+ if(adef->model == M_BINCAT || adef->model == M_BINGAMMA)
+ dataType = BINARY_DATA;
+ if(adef->model == M_32CAT || adef->model == M_32GAMMA)
+ dataType = GENERIC_32;
+ if(adef->model == M_64CAT || adef->model == M_64GAMMA)
+ dataType = GENERIC_64;
+
+
+
+ assert(dataType == BINARY_DATA || dataType == DNA_DATA || dataType == AA_DATA ||
+ dataType == GENERIC_32 || dataType == GENERIC_64);
+
+ tr->initialPartitionData[0].dataType = dataType;
+
+ for(i = 0; i <= rdta->sites; i++)
+ {
+ tr->initialDataVector[i] = dataType;
+ tr->model[i] = 0;
+ }
+ }
+
+ if(adef->useSecondaryStructure)
+ {
+ memcpy(tr->extendedDataVector, tr->initialDataVector, ((size_t)rdta->sites + 1) * sizeof(int));
+
+ tr->extendedPartitionData =(pInfo*)malloc(sizeof(pInfo) * (size_t)tr->NumberOfModels);
+
+ for(i = 0; i < tr->NumberOfModels; i++)
+ {
+ tr->extendedPartitionData[i].partitionName = (char*)malloc((strlen(tr->initialPartitionData[i].partitionName) + 1) * sizeof(char));
+ strcpy(tr->extendedPartitionData[i].partitionName, tr->initialPartitionData[i].partitionName);
+ tr->extendedPartitionData[i].dataType = tr->initialPartitionData[i].dataType;
+
+ tr->extendedPartitionData[i].protModels = tr->initialPartitionData[i].protModels;
+ tr->extendedPartitionData[i].protFreqs = tr->initialPartitionData[i].protFreqs;
+ }
+
+ parseSecondaryStructure(tr, adef, rdta->sites);
+
+ tr->dataVector = tr->extendedDataVector;
+ tr->partitionData = tr->extendedPartitionData;
+ }
+ else
+ {
+ tr->dataVector = tr->initialDataVector;
+ tr->partitionData = tr->initialPartitionData;
+ }
+
+
+
+ getyspace(rdta);
+
+ setupTree(tr, adef);
+
+
+
+ if(!getdata(adef, rdta, tr))
+ {
+ printf("Problem reading alignment file \n");
+ errorExit(1);
+ }
+
+ tr->nameHash = initStringHashTable(10 * (size_t)tr->mxtips);
+ for(i = 1; i <= tr->mxtips; i++)
+ addword(tr->nameList[i], tr->nameHash, i);
+
+ fclose(INFILE);
+}
+
+
+
+static unsigned char buildStates(int secModel, unsigned char v1, unsigned char v2)
+{
+ unsigned char
+ new = 0;
+
+ switch(secModel)
+ {
+ case SECONDARY_DATA:
+ new = v1;
+ new = new << 4;
+ new = new | v2;
+ break;
+ case SECONDARY_DATA_6:
+ {
+ int
+ meaningDNA[256],
+ i;
+
+ const unsigned char
+ allowedStates[6][2] = {{'A','T'}, {'C', 'G'}, {'G', 'C'}, {'G','T'}, {'T', 'A'}, {'T', 'G'}};
+
+ const unsigned char
+ finalBinaryStates[6] = {1, 2, 4, 8, 16, 32};
+
+ unsigned char
+ intermediateBinaryStates[6];
+
+ int length = 6;
+
+ for(i = 0; i < 256; i++)
+ meaningDNA[i] = -1;
+
+ meaningDNA['A'] = 1;
+ meaningDNA['B'] = 14;
+ meaningDNA['C'] = 2;
+ meaningDNA['D'] = 13;
+ meaningDNA['G'] = 4;
+ meaningDNA['H'] = 11;
+ meaningDNA['K'] = 12;
+ meaningDNA['M'] = 3;
+ meaningDNA['N'] = 15;
+ meaningDNA['O'] = 15;
+ meaningDNA['R'] = 5;
+ meaningDNA['S'] = 6;
+ meaningDNA['T'] = 8;
+ meaningDNA['U'] = 8;
+ meaningDNA['V'] = 7;
+ meaningDNA['W'] = 9;
+ meaningDNA['X'] = 15;
+ meaningDNA['Y'] = 10;
+ meaningDNA['-'] = 15;
+ meaningDNA['?'] = 15;
+
+ for(i = 0; i < length; i++)
+ {
+ unsigned char n1 = meaningDNA[allowedStates[i][0]];
+ unsigned char n2 = meaningDNA[allowedStates[i][1]];
+
+ new = n1;
+ new = new << 4;
+ new = new | n2;
+
+ intermediateBinaryStates[i] = new;
+ }
+
+ new = v1;
+ new = new << 4;
+ new = new | v2;
+
+ for(i = 0; i < length; i++)
+ {
+ if(new == intermediateBinaryStates[i])
+ break;
+ }
+ if(i < length)
+ new = finalBinaryStates[i];
+ else
+ {
+ new = 0;
+ for(i = 0; i < length; i++)
+ {
+ if(v1 & meaningDNA[allowedStates[i][0]])
+ {
+ /*printf("Adding %c%c\n", allowedStates[i][0], allowedStates[i][1]);*/
+ new |= finalBinaryStates[i];
+ }
+ if(v2 & meaningDNA[allowedStates[i][1]])
+ {
+ /*printf("Adding %c%c\n", allowedStates[i][0], allowedStates[i][1]);*/
+ new |= finalBinaryStates[i];
+ }
+ }
+ }
+ }
+ break;
+ case SECONDARY_DATA_7:
+ {
+ int
+ meaningDNA[256],
+ i;
+
+ const unsigned char
+ allowedStates[6][2] = {{'A','T'}, {'C', 'G'}, {'G', 'C'}, {'G','T'}, {'T', 'A'}, {'T', 'G'}};
+
+ const unsigned char
+ finalBinaryStates[7] = {1, 2, 4, 8, 16, 32, 64};
+
+ unsigned char
+ intermediateBinaryStates[7];
+
+ for(i = 0; i < 256; i++)
+ meaningDNA[i] = -1;
+
+ meaningDNA['A'] = 1;
+ meaningDNA['B'] = 14;
+ meaningDNA['C'] = 2;
+ meaningDNA['D'] = 13;
+ meaningDNA['G'] = 4;
+ meaningDNA['H'] = 11;
+ meaningDNA['K'] = 12;
+ meaningDNA['M'] = 3;
+ meaningDNA['N'] = 15;
+ meaningDNA['O'] = 15;
+ meaningDNA['R'] = 5;
+ meaningDNA['S'] = 6;
+ meaningDNA['T'] = 8;
+ meaningDNA['U'] = 8;
+ meaningDNA['V'] = 7;
+ meaningDNA['W'] = 9;
+ meaningDNA['X'] = 15;
+ meaningDNA['Y'] = 10;
+ meaningDNA['-'] = 15;
+ meaningDNA['?'] = 15;
+
+
+ for(i = 0; i < 6; i++)
+ {
+ unsigned char n1 = meaningDNA[allowedStates[i][0]];
+ unsigned char n2 = meaningDNA[allowedStates[i][1]];
+
+ new = n1;
+ new = new << 4;
+ new = new | n2;
+
+ intermediateBinaryStates[i] = new;
+ }
+
+ new = v1;
+ new = new << 4;
+ new = new | v2;
+
+ for(i = 0; i < 6; i++)
+ {
+ /* exact match */
+ if(new == intermediateBinaryStates[i])
+ break;
+ }
+ if(i < 6)
+ new = finalBinaryStates[i];
+ else
+ {
+ /* distinguish between exact mismatches and partial mismatches */
+
+ for(i = 0; i < 6; i++)
+ if((v1 & meaningDNA[allowedStates[i][0]]) && (v2 & meaningDNA[allowedStates[i][1]]))
+ break;
+ if(i < 6)
+ {
+ /* printf("partial mismatch\n"); */
+
+ new = 0;
+ for(i = 0; i < 6; i++)
+ {
+ if((v1 & meaningDNA[allowedStates[i][0]]) && (v2 & meaningDNA[allowedStates[i][1]]))
+ {
+ /*printf("Adding %c%c\n", allowedStates[i][0], allowedStates[i][1]);*/
+ new |= finalBinaryStates[i];
+ }
+ else
+ new |= finalBinaryStates[6];
+ }
+ }
+ else
+ new = finalBinaryStates[6];
+ }
+ }
+ break;
+ default:
+ assert(0);
+ }
+
+ return new;
+
+}
+
+
+
+static void adaptRdataToSecondary(tree *tr, rawdata *rdta)
+{
+ int *alias = (int*)calloc((size_t)rdta->sites, sizeof(int));
+ int i, j, realPosition;
+
+ for(i = 0; i < rdta->sites; i++)
+ alias[i] = -1;
+
+ for(i = 0, realPosition = 0; i < rdta->sites; i++)
+ {
+ int partner = tr->secondaryStructurePairs[i];
+ if(partner != -1)
+ {
+ assert(tr->dataVector[i+1] == SECONDARY_DATA || tr->dataVector[i+1] == SECONDARY_DATA_6 || tr->dataVector[i+1] == SECONDARY_DATA_7);
+
+ if(i < partner)
+ {
+ for(j = 1; j <= rdta->numsp; j++)
+ {
+ unsigned char v1 = rdta->y[j][i+1];
+ unsigned char v2 = rdta->y[j][partner+1];
+
+ assert(i+1 < partner+1);
+
+ rdta->y[j][i+1] = buildStates(tr->dataVector[i+1], v1, v2);
+ }
+ alias[realPosition] = i;
+ realPosition++;
+ }
+ }
+ else
+ {
+ alias[realPosition] = i;
+ realPosition++;
+ }
+ }
+
+ assert(rdta->sites - realPosition == tr->numberOfSecondaryColumns / 2);
+
+ rdta->sites = realPosition;
+
+ for(i = 0; i < rdta->sites; i++)
+ {
+ assert(alias[i] != -1);
+ tr->model[i+1] = tr->model[alias[i]+1];
+ tr->dataVector[i+1] = tr->dataVector[alias[i]+1];
+ rdta->wgt[i+1] = rdta->wgt[alias[i]+1];
+
+ for(j = 1; j <= rdta->numsp; j++)
+ rdta->y[j][i+1] = rdta->y[j][alias[i]+1];
+ }
+
+ free(alias);
+}
+
+static void sitesort(rawdata *rdta, cruncheddata *cdta, tree *tr, analdef *adef)
+{
+ int gap, i, j, jj, jg, k, n, nsp;
+ int
+ *index,
+ *category = (int*)NULL;
+
+ boolean flip, tied;
+ unsigned char **data;
+
+ if(adef->useSecondaryStructure)
+ {
+ assert(tr->NumberOfModels > 1 && adef->useMultipleModel);
+
+ adaptRdataToSecondary(tr, rdta);
+ }
+
+ if(adef->useMultipleModel)
+ category = tr->model;
+
+
+ index = cdta->alias;
+ data = rdta->y;
+ n = rdta->sites;
+ nsp = rdta->numsp;
+ index[0] = -1;
+
+
+ if(adef->compressPatterns)
+ {
+ for (gap = n / 2; gap > 0; gap /= 2)
+ {
+ for (i = gap + 1; i <= n; i++)
+ {
+ j = i - gap;
+
+ do
+ {
+ jj = index[j];
+ jg = index[j+gap];
+ if(adef->useMultipleModel)
+ {
+ assert(category[jj] != -1 &&
+ category[jg] != -1);
+
+ flip = (category[jj] > category[jg]);
+ tied = (category[jj] == category[jg]);
+
+ }
+ else
+ {
+ flip = 0;
+ tied = 1;
+ }
+
+ for (k = 1; (k <= nsp) && tied; k++)
+ {
+ flip = (data[k][jj] > data[k][jg]);
+ tied = (data[k][jj] == data[k][jg]);
+ }
+
+ if (flip)
+ {
+ index[j] = jg;
+ index[j+gap] = jj;
+ j -= gap;
+ }
+ }
+ while (flip && (j > 0));
+ }
+ }
+ }
+}
+
+
+static void sitecombcrunch (rawdata *rdta, cruncheddata *cdta, tree *tr, analdef *adef)
+{
+
+ boolean
+ tied;
+
+ int
+ i,
+ sitei,
+ j,
+ sitej,
+ k;
+
+ int
+ *aliasModel = (int*)NULL,
+ *aliasSuperModel = (int*)NULL,
+ undeterminedSites = 0;
+
+ if(adef->useMultipleModel)
+ {
+ aliasSuperModel = (int*)malloc(sizeof(int) * ((size_t)rdta->sites + 1));
+ aliasModel = (int*)malloc(sizeof(int) * ((size_t)rdta->sites + 1));
+ }
+
+ i = 0;
+ cdta->alias[0] = cdta->alias[1];
+ cdta->aliaswgt[0] = 0;
+
+ if(adef->mode == PER_SITE_LL)
+ {
+ assert(0);
+
+ /*
+ tr->patternPosition = (int*)malloc(sizeof(int) * rdta->sites);
+ tr->columnPosition = (int*)malloc(sizeof(int) * rdta->sites);
+
+ for(i = 0; i < rdta->sites; i++)
+ {
+ tr->patternPosition[i] = -1;
+ tr->columnPosition[i] = -1;
+ }
+ */
+ }
+
+
+
+ i = 0;
+ for (j = 1; j <= rdta->sites; j++)
+ {
+ int
+ allGap = TRUE;
+
+ unsigned char
+ undetermined;
+
+ sitei = cdta->alias[i];
+ sitej = cdta->alias[j];
+
+ undetermined = getUndetermined(tr->dataVector[sitej]);
+
+ for(k = 1; k <= rdta->numsp; k++)
+ {
+ if(rdta->y[k][sitej] != undetermined)
+ {
+ allGap = FALSE;
+ break;
+ }
+ }
+
+ if(allGap)
+ undeterminedSites++;
+
+#ifdef _DEBUG_UNDET_REMOVAL
+ if(allGap)
+ printf("Skipping gap site %d\n", sitej);
+#endif
+
+ if(!adef->compressPatterns)
+ tied = 0;
+ else
+ {
+ if(adef->useMultipleModel)
+ {
+ tied = (tr->model[sitei] == tr->model[sitej]);
+ if(tied)
+ assert(tr->dataVector[sitei] == tr->dataVector[sitej]);
+ }
+ else
+ tied = 1;
+ }
+
+ for (k = 1; tied && (k <= rdta->numsp); k++)
+ tied = (rdta->y[k][sitei] == rdta->y[k][sitej]);
+
+ assert(!(tied && allGap));
+
+ if(tied && !allGap)
+ {
+ if(adef->mode == PER_SITE_LL)
+ {
+ tr->patternPosition[j - 1] = i;
+ tr->columnPosition[j - 1] = sitej;
+ /*printf("Pattern %d from column %d also at site %d\n", i, sitei, sitej);*/
+ }
+
+
+ cdta->aliaswgt[i] += rdta->wgt[sitej];
+ if(adef->useMultipleModel)
+ {
+ aliasModel[i] = tr->model[sitej];
+ aliasSuperModel[i] = tr->dataVector[sitej];
+ }
+ }
+ else
+ {
+ if(!allGap)
+ {
+ if(cdta->aliaswgt[i] > 0)
+ i++;
+
+ if(adef->mode == PER_SITE_LL)
+ {
+ tr->patternPosition[j - 1] = i;
+ tr->columnPosition[j - 1] = sitej;
+ /*printf("Pattern %d is from column %d\n", i, sitej);*/
+ }
+
+ cdta->aliaswgt[i] = rdta->wgt[sitej];
+ cdta->alias[i] = sitej;
+ if(adef->useMultipleModel)
+ {
+ aliasModel[i] = tr->model[sitej];
+ aliasSuperModel[i] = tr->dataVector[sitej];
+ }
+ }
+ }
+ }
+
+ cdta->endsite = (size_t)i;
+ if (cdta->aliaswgt[i] > 0)
+ cdta->endsite++;
+
+#ifdef _DEBUG_UNDET_REMOVAL
+ printf("included sites: %d\n", (int)cdta->endsite);
+#endif
+
+ if(adef->mode == PER_SITE_LL)
+ {
+ assert(0);
+
+ for(i = 0; i < rdta->sites; i++)
+ {
+ int p = tr->patternPosition[i];
+ int c = tr->columnPosition[i];
+
+ assert(p >= 0 && (size_t) p < cdta->endsite);
+ assert(c >= 1 && c <= rdta->sites);
+ }
+ }
+
+
+ if(adef->useMultipleModel)
+ {
+ for(i = 0; i <= rdta->sites; i++)
+ {
+ tr->model[i] = aliasModel[i];
+ tr->dataVector[i] = aliasSuperModel[i];
+ }
+ }
+
+ if(adef->useMultipleModel)
+ {
+ free(aliasModel);
+ free(aliasSuperModel);
+ }
+
+ if(undeterminedSites > 0)
+ printBothOpen("\nAlignment has %d completely undetermined sites that will be automatically removed from the binary alignment file\n\n", undeterminedSites);
+}
+
+
+static boolean makeweights (analdef *adef, rawdata *rdta, cruncheddata *cdta, tree *tr)
+{
+ int i;
+
+
+
+ for (i = 1; i <= rdta->sites; i++)
+ cdta->alias[i] = i;
+
+ sitesort(rdta, cdta, tr, adef);
+ sitecombcrunch(rdta, cdta, tr, adef);
+
+ return TRUE;
+}
+
+
+
+static boolean makevalues(rawdata *rdta, cruncheddata *cdta, tree *tr, analdef *adef)
+{
+ int
+ i,
+ model,
+ modelCounter;
+
+ size_t
+ j;
+
+ unsigned char
+ *y = (unsigned char *)malloc(((size_t)rdta->numsp) * ((size_t)cdta->endsite) * sizeof(unsigned char));
+
+
+ /*
+
+ printf("compressed data Assigning %Zu bytes\n", ((size_t)rdta->numsp) * ((size_t)cdta->endsite) * sizeof(unsigned char));
+
+ */
+
+
+ {
+ for (i = 1; i <= rdta->numsp; i++)
+ for (j = 0; j < cdta->endsite; j++)
+ y[(((size_t)(i - 1)) * ((size_t)cdta->endsite)) + j] = rdta->y[i][cdta->alias[j]];
+
+ /*
+ printf("Free on raw data\n");
+ */
+
+ free(rdta->y0);
+ free(rdta->y);
+
+ }
+
+ rdta->y0 = y;
+
+ if(!adef->useMultipleModel)
+ tr->NumberOfModels = 1;
+
+#ifdef _DEBUG_UNDET_REMOVAL
+ for(i = 0; i < cdta->endsite; i++)
+ printf("%d ", tr->model[i]);
+
+ printf("\n");
+#endif
+
+ if(adef->useMultipleModel)
+ {
+ tr->partitionData[0].lower = 0;
+
+ model = tr->model[0];
+ modelCounter = 0;
+
+ i = 1;
+
+ while((size_t) i < cdta->endsite)
+ {
+ if(tr->model[i] != model)
+ {
+ tr->partitionData[modelCounter].upper = (size_t)i;
+ tr->partitionData[modelCounter + 1].lower = (size_t)i;
+
+ model = tr->model[i];
+ modelCounter++;
+ }
+ i++;
+ }
+
+ if(modelCounter < tr->NumberOfModels - 1)
+ {
+ printf("\nYou specified %d partitions, but after parsing and pre-processing ExaML only found %d partitions\n", tr->NumberOfModels, modelCounter + 1);
+ printf("Presumably one or more partitions vanished because they consisted entirely of undetermined characters.\n");
+ printf("Please fix your data!\n\n");
+ exit(-1);
+ }
+
+
+ tr->partitionData[tr->NumberOfModels - 1].upper = (size_t)cdta->endsite;
+
+ for(i = 0; i < tr->NumberOfModels; i++)
+ tr->partitionData[i].width = tr->partitionData[i].upper - tr->partitionData[i].lower;
+
+ model = tr->model[0];
+ modelCounter = 0;
+ tr->model[0] = modelCounter;
+ i = 1;
+
+ while((size_t) i < cdta->endsite)
+ {
+ if(tr->model[i] != model)
+ {
+ model = tr->model[i];
+ modelCounter++;
+ tr->model[i] = modelCounter;
+ }
+ else
+ tr->model[i] = modelCounter;
+ i++;
+ }
+ }
+ else
+ {
+ tr->partitionData[0].lower = 0;
+ tr->partitionData[0].upper = (size_t)cdta->endsite;
+ tr->partitionData[0].width = tr->partitionData[0].upper - tr->partitionData[0].lower;
+ }
+
+ tr->rdta = rdta;
+ tr->cdta = cdta;
+
+ tr->originalCrunchedLength = tr->cdta->endsite;
+
+ for(i = 0; i < rdta->numsp; i++)
+ tr->yVector[i + 1] = &(rdta->y0[(tr->originalCrunchedLength) * ((size_t)i)]);
+
+ return TRUE;
+}
+
+
+
+static void initAdef(analdef *adef)
+{
+ adef->useSecondaryStructure = FALSE;
+ adef->bootstrapBranchLengths = FALSE;
+ adef->model = M_GTRCAT;
+ adef->max_rearrange = 21;
+ adef->stepwidth = 5;
+ adef->initial = adef->bestTrav = 10;
+ adef->initialSet = FALSE;
+ adef->restart = FALSE;
+ adef->mode = BIG_RAPID_MODE;
+ adef->categories = 25;
+ adef->boot = 0;
+ adef->rapidBoot = 0;
+ adef->useWeightFile = FALSE;
+ adef->checkpoints = 0;
+ adef->startingTreeOnly = 0;
+ adef->multipleRuns = 1;
+ adef->useMultipleModel = FALSE;
+ adef->likelihoodEpsilon = 0.1;
+ adef->constraint = FALSE;
+ adef->grouping = FALSE;
+ adef->randomStartingTree = FALSE;
+ adef->parsimonySeed = 0;
+ adef->proteinMatrix = JTT;
+ adef->protEmpiricalFreqs = 0;
+ adef->useInvariant = FALSE;
+ adef->permuteTreeoptimize = FALSE;
+ adef->useInvariant = FALSE;
+ adef->allInOne = FALSE;
+ adef->likelihoodTest = FALSE;
+ adef->perGeneBranchLengths = FALSE;
+ adef->generateBS = FALSE;
+ adef->bootStopping = FALSE;
+ adef->gapyness = 0.0;
+ adef->similarityFilterMode = 0;
+ adef->useExcludeFile = FALSE;
+ adef->userProteinModel = FALSE;
+ adef->externalAAMatrix = (double*)NULL;
+ adef->computeELW = FALSE;
+ adef->computeDistance = FALSE;
+ adef->thoroughInsertion = FALSE;
+ adef->compressPatterns = TRUE;
+ adef->readTaxaOnly = FALSE;
+ adef->meshSearch = 0;
+ adef->useCheckpoint = FALSE;
+ adef->leaveDropMode = FALSE;
+ adef->slidingWindowSize = 100;
+#ifdef _BAYESIAN
+ adef->bayesian = FALSE;
+#endif
+
+}
+
+
+
+
+static int dataExists(char *model, analdef *adef)
+{
+ /********** BINARY ********************/
+
+ if(strcmp(model, "BIN\0") == 0)
+ {
+ adef->model = M_BINGAMMA;
+ return 1;
+ }
+
+ /*********** DNA **********************/
+
+ if(strcmp(model, "DNA\0") == 0)
+ {
+ adef->model = M_GTRGAMMA;
+ return 1;
+ }
+
+ /*************** AA GTR ********************/
+
+ if(strcmp(model, "PROT\0") == 0)
+ {
+ adef->model = M_PROTGAMMA;
+ return 1;
+ }
+
+ return 0;
+}
+
+/*********************************************************************************************/
+
+static void printVersionInfo(void)
+{
+ printf("\n\nThis is the parse-examl version %s released by Alexandros Stamatakis, Andre J. Aberer, and Alexey Kozlov in %s.\n\n", programVersion, programDate);
+}
+
+static void printREADME(void)
+{
+ printVersionInfo();
+ printf("\n");
+ printf("\nTo report bugs, use the RAxML Google group\n");
+ printf("Please send us all input files, the exact invocation, details of the HW and operating system,\n");
+ printf("as well as all error messages printed to screen.\n\n\n");
+
+ printf("parse-examl\n");
+ printf(" -s sequenceFileName\n");
+ printf(" -n outputFileName\n");
+ printf(" -m substitutionModel\n");
+ printf(" [-c]\n");
+ printf(" [-q]\n");
+ printf(" [-h]\n");
+ printf("\n");
+ printf(" -m Model of Nucleotide or Amino Acid Substitution:\n");
+ printf("\n");
+ printf(" For Binary data use: BIN\n");
+ printf(" For DNA data use: DNA\n");
+ printf(" For AA data use: PROT\n");
+ printf("\n");
+ printf(" -c disable site pattern compression\n");
+ printf("\n");
+ printf(" -q Specify the file name which contains the assignment of models to alignment\n");
+ printf(" partitions for multiple models of substitution. For the syntax of this file\n");
+ printf(" please consult the manual.\n");
+ printf("\n");
+ printf(" -h Display this help message.\n");
+ printf("\n");
+ printf("\n\n\n\n");
+
+}
+
+static int mygetopt(int argc, char **argv, char *opts, int *optind, char **optarg)
+{
+ static int sp = 1;
+ register int c;
+ register char *cp;
+
+ if(sp == 1)
+ {
+ if(*optind >= argc || argv[*optind][0] != '-' || argv[*optind][1] == '\0')
+ {
+ return -1;
+ }
+ }
+ else
+ {
+ if(strcmp(argv[*optind], "--") == 0)
+ {
+ *optind = *optind + 1;
+ return -1;
+ }
+ }
+
+ c = argv[*optind][sp];
+ if(c == ':' || (cp=strchr(opts, c)) == 0)
+ {
+ if(argv[*optind][++sp] == '\0')
+ {
+ *optind = *optind + 1;
+ sp = 1;
+ }
+ printf("\n Error: illegal option -- %c\n\n", c);
+ return('?');
+ }
+ if(*++cp == ':')
+ {
+ if(argv[*optind][sp+1] != '\0')
+ {
+ *optarg = &argv[*optind][sp+1];
+ *optind = *optind + 1;
+ }
+ else
+ {
+ *optind = *optind + 1;
+ if(*optind >= argc)
+ {
+ if ( c != 'h')
+ {
+ sp = 1;
+ printf("\n Error: option -- %c requires an argument\n\n", c);
+ return('?');
+ }
+ else
+ return ( c );
+ }
+ else
+ {
+ *optarg = argv[*optind];
+ *optind = *optind + 1;
+ }
+ }
+ sp = 1;
+ }
+ else
+ {
+ if(argv[*optind][++sp] == '\0')
+ {
+ sp = 1;
+ *optind = *optind + 1;
+ }
+ *optarg = 0;
+ }
+ return(c);
+ }
+
+
+/*********************************************************************************************/
+
+
+
+
+
+
+
+
+
+static void analyzeRunId(char id[128])
+{
+ int i = 0;
+
+ while(id[i] != '\0')
+ {
+ if(i >= 128)
+ {
+ printf("\n Error: run id after \"-n\" is too long, it has %d characters please use a shorter one\n\n", i);
+ assert(0);
+ }
+
+ if(id[i] == '/')
+ {
+ printf("\n Error: character %c not allowed in run ID\n\n", id[i]);
+ assert(0);
+ }
+
+
+ i++;
+ }
+
+ if(i == 0)
+ {
+ printf("\n Error: please provide a string for the run id after \"-n\" \n\n");
+ assert(0);
+ }
+
+}
+
+
+static void get_args(int argc, char *argv[], analdef *adef, tree *tr)
+{
+ boolean
+ bad_opt =FALSE;
+
+ char
+ *optarg = (char*)NULL,
+ model[2048] = "";
+
+ int
+ optind = 1,
+ c,
+ nameSet = 0,
+ alignmentSet = 0,
+ modelSet = 0;
+
+
+ run_id[0] = 0;
+ seq_file[0] = 0;
+ model[0] = 0;
+ weightFileName[0] = 0;
+ modelFileName[0] = 0;
+
+ /*********** tr inits **************/
+
+#ifdef _USE_PTHREADS
+ NumberOfThreads = 0;
+#endif
+
+
+ tr->bootStopCriterion = -1;
+ tr->wcThreshold = 0.03;
+ tr->doCutoff = TRUE;
+ tr->secondaryStructureModel = SEC_16; /* default setting */
+ tr->searchConvergenceCriterion = FALSE;
+ tr->catOnly = FALSE;
+
+ tr->multiStateModel = GTR_MULTI_STATE;
+ tr->useGappedImplementation = FALSE;
+ tr->saveMemory = FALSE;
+
+
+
+
+ /********* tr inits end*************/
+
+
+ while( !bad_opt && ( ( c = mygetopt(argc,argv,"q:s:n:m:hc", &optind, &optarg ) ) != -1 ) )
+ {
+ switch(c)
+ {
+ case 'c':
+ adef->compressPatterns = FALSE;
+ break;
+ case 'h':
+ printREADME();
+ errorExit(-1);
+ break;
+ case 'q':
+ strcpy(modelFileName,optarg);
+ adef->useMultipleModel = TRUE;
+ break;
+ case 'n':
+ strcpy(run_id,optarg);
+ analyzeRunId(run_id);
+ nameSet = 1;
+ break;
+ case 's':
+ strcpy(seq_file, optarg);
+ alignmentSet = 1;
+ break;
+ case 'm':
+ strcpy(model,optarg);
+ if(dataExists(model, adef) == 0)
+ {
+ printf("\n Error: model %s does not exist\n\n", model);
+ errorExit(-1);
+ }
+ else
+ modelSet = 1;
+ break;
+ default:
+ errorExit(-1);
+ }
+ }
+
+ if(!adef->useMultipleModel && !modelSet)
+ {
+ if(processID == 0)
+ {
+ printREADME();
+ printf("\n Error, you must specify a data type for unpartitioned alignment with the \"-m\" option\n\n");
+ }
+ errorExit(-1);
+ }
+
+ if(!nameSet)
+ {
+ if(processID == 0)
+ {
+ printREADME();
+ printf("\n Error: please specify a name for this run with -n\n\n");
+ }
+ errorExit(-1);
+ }
+
+
+ if(!alignmentSet)
+ {
+ if(processID == 0)
+ {
+ printREADME();
+ printf("\n Error: please specify an alignment for this run with -s\n\n");
+ }
+ errorExit(-1);
+ }
+
+
+ strcat(infoFileName, "RAxML_info.");
+ strcat(infoFileName, run_id);
+
+ if(processID == 0)
+ {
+ int infoFileExists = 0;
+
+ infoFileExists = filexists(infoFileName);
+
+ if(infoFileExists)
+ {
+ printf("\n Error: output files with the run ID <%s> already exist... exiting\n\n", run_id);
+ exit(-1);
+ }
+ }
+
+ strcat(byteFileName, run_id);
+ strcat(byteFileName, ".binary");
+
+ if(filexists(byteFileName))
+ {
+ printf("\n Error: binary compressed file %s you want to generate already exists... exiting\n\n", byteFileName);
+ exit(-1);
+ }
+
+ byteFile = fopen(byteFileName, "wb");
+
+ if ( !byteFile )
+ {
+ printf("\n Error: could not open binary output file %s for writing\n\n", byteFileName);
+ errorExit(-1);
+ }
+
+ return;
+}
+
+
+
+
+void errorExit(int e)
+{
+
+#ifdef _WAYNE_MPI
+ MPI_Finalize();
+#endif
+
+ exit(e);
+
+}
+
+
+
+
+
+
+
+
+
+
+/***********************reading and initializing input ******************/
+
+
+/********************PRINTING various INFO **************************************/
+
+
+
+
+
+void getDataTypeString(tree *tr, int model, char typeOfData[1024])
+{
+ switch(tr->partitionData[model].dataType)
+ {
+ case AA_DATA:
+ strcpy(typeOfData,"AA");
+ break;
+ case DNA_DATA:
+ strcpy(typeOfData,"DNA");
+ break;
+ case BINARY_DATA:
+ strcpy(typeOfData,"BINARY/MORPHOLOGICAL");
+ break;
+ case SECONDARY_DATA:
+ strcpy(typeOfData,"SECONDARY 16 STATE MODEL USING ");
+ strcat(typeOfData, secondaryModelList[tr->secondaryStructureModel]);
+ break;
+ case SECONDARY_DATA_6:
+ strcpy(typeOfData,"SECONDARY 6 STATE MODEL USING ");
+ strcat(typeOfData, secondaryModelList[tr->secondaryStructureModel]);
+ break;
+ case SECONDARY_DATA_7:
+ strcpy(typeOfData,"SECONDARY 7 STATE MODEL USING ");
+ strcat(typeOfData, secondaryModelList[tr->secondaryStructureModel]);
+ break;
+ case GENERIC_32:
+ strcpy(typeOfData,"Multi-State");
+ break;
+ case GENERIC_64:
+ strcpy(typeOfData,"Codon");
+ break;
+ default:
+ assert(0);
+ }
+}
+
+
+
+
+
+/************************************************************************************/
+
+
+
+
+
+
+
+
+
+static int iterated_bitcount(unsigned int n)
+{
+ int
+ count=0;
+
+ while(n)
+ {
+ count += n & 0x1u ;
+ n >>= 1 ;
+ }
+
+ return count;
+}
+
+static char bits_in_16bits [0x1u << 16];
+
+static void compute_bits_in_16bits(void)
+{
+ unsigned int i;
+
+ for (i = 0; i < (0x1u<<16); i++)
+ bits_in_16bits[i] = iterated_bitcount(i);
+
+ return ;
+}
+
+unsigned int precomputed16_bitcount (unsigned int n)
+{
+ /* works only for 32-bit int*/
+
+ return bits_in_16bits [n & 0xffffu]
+ + bits_in_16bits [(n >> 16) & 0xffffu] ;
+}
+
+
+
+
+
+
+
+static void smoothFreqs(const int n, double *pfreqs, double *dst, pInfo *partitionData)
+{
+ int
+ countScale = 0,
+ l,
+ loopCounter = 0;
+
+
+
+ for(l = 0; l < n; l++)
+ if(pfreqs[l] < FREQ_MIN)
+ countScale++;
+
+
+ /* for(l = 0; l < n; l++)
+ if(pfreqs[l] == 0.0)
+ countScale++;*/
+
+ if(countScale > 0)
+ {
+ while(countScale > 0)
+ {
+ double correction = 0.0;
+ double factor = 1.0;
+
+ for(l = 0; l < n; l++)
+ {
+ if(pfreqs[l] == 0.0)
+ correction += FREQ_MIN;
+ else
+ if(pfreqs[l] < FREQ_MIN)
+ {
+ correction += (FREQ_MIN - pfreqs[l]);
+ factor -= (FREQ_MIN - pfreqs[l]);
+ }
+ }
+
+ countScale = 0;
+
+ for(l = 0; l < n; l++)
+ {
+ if(pfreqs[l] >= FREQ_MIN)
+ pfreqs[l] = pfreqs[l] - (pfreqs[l] * correction * factor);
+ else
+ pfreqs[l] = FREQ_MIN;
+
+ if(pfreqs[l] < FREQ_MIN)
+ countScale++;
+ }
+ assert(loopCounter < 100);
+ loopCounter++;
+ }
+ }
+
+ for(l = 0; l < n; l++)
+ dst[l] = pfreqs[l];
+
+
+ if(partitionData->nonGTR)
+ {
+ int k;
+
+ assert(partitionData->dataType == SECONDARY_DATA_7 || partitionData->dataType == SECONDARY_DATA_6 || partitionData->dataType == SECONDARY_DATA);
+
+ for(l = 0; l < n; l++)
+ {
+ int count = 1;
+
+ for(k = 0; k < n; k++)
+ {
+ if(k != l && partitionData->frequencyGrouping[l] == partitionData->frequencyGrouping[k])
+ {
+ count++;
+ dst[l] += pfreqs[k];
+ }
+ }
+ dst[l] /= ((double)count);
+ }
+ }
+}
+
+
+static void genericBaseFrequencies(tree *tr, const int numFreqs, rawdata *rdta, cruncheddata *cdta, int lower, int upper, int model, boolean smoothFrequencies,
+ const unsigned int *bitMask)
+{
+ double
+ wj,
+ acc,
+ pfreqs[64],
+ sumf[64],
+ temp[64];
+
+ int
+ countStatesPresent = 0,
+ statesPresent[64],
+ i,
+ j,
+ k,
+ l;
+
+ unsigned char
+ *yptr;
+
+ for(l = 0; l < numFreqs; l++)
+ {
+ pfreqs[l] = 1.0 / ((double)numFreqs);
+ statesPresent[l] = 0;
+ }
+
+#ifdef _DEBUG_UNDET_REMOVAL
+ printf("bounds %d %d\n", lower, upper);
+
+ for(j = lower; j < upper; j++)
+ {
+ for(i = 0; i < rdta->numsp; i++)
+ {
+ unsigned int
+ code;
+
+ yptr = &(rdta->y0[((size_t)i) * (tr->originalCrunchedLength)]);
+
+ code = yptr[j];
+
+ printf("%c", inverseMeaningDNA[code]);
+ }
+ printf("\n");
+ }
+
+ printf("\n\n");
+#endif
+
+ for(i = 0; i < rdta->numsp; i++)
+ {
+ yptr = &(rdta->y0[((size_t)i) * (tr->originalCrunchedLength)]);
+
+ for(j = lower; j < upper; j++)
+ {
+ unsigned int
+ code = bitMask[yptr[j]];
+
+ switch(numFreqs)
+ {
+ case 2:
+ switch(code)
+ {
+ case 1:
+ statesPresent[0] = 1;
+ break;
+ case 2:
+ statesPresent[1] = 1;
+ break;
+ default:
+ ;
+ }
+ break;
+ case 4:
+ switch(code)
+ {
+ case 1:
+ statesPresent[0] = 1;
+ break;
+ case 2:
+ statesPresent[1] = 1;
+ break;
+ case 4:
+ statesPresent[2] = 1;
+ break;
+ case 8:
+ statesPresent[3] = 1;
+ break;
+ default:
+ ;
+ }
+ break;
+ case 20:
+ if(yptr[j] >= 0 && yptr[j] < 20)
+ statesPresent[yptr[j]] = 1;
+ break;
+ default:
+ assert(0);
+ }
+ }
+ }
+
+ for(i = 0, countStatesPresent = 0; i < numFreqs; i++)
+ if(statesPresent[i] == 1)
+ countStatesPresent++;
+
+ for (k = 1; k <= 8; k++)
+ {
+ for(l = 0; l < numFreqs; l++)
+ sumf[l] = 0.0;
+
+ for(i = 0; i < rdta->numsp; i++)
+ {
+ yptr = &(rdta->y0[((size_t)i) * (tr->originalCrunchedLength)]);
+
+ for(j = lower; j < upper; j++)
+ {
+ unsigned int
+ code = bitMask[yptr[j]];
+
+ assert(code >= 1);
+
+ for(l = 0; l < numFreqs; l++)
+ {
+ if((code >> l) & 1)
+ temp[l] = pfreqs[l];
+ else
+ temp[l] = 0.0;
+ }
+
+ for(l = 0, acc = 0.0; l < numFreqs; l++)
+ {
+ if(temp[l] != 0.0)
+ acc += temp[l];
+ }
+
+ wj = ((double)cdta->aliaswgt[j]) / acc;
+
+ for(l = 0; l < numFreqs; l++)
+ {
+ if(temp[l] != 0.0)
+ sumf[l] += wj * temp[l];
+ }
+ }
+ }
+
+ for(l = 0, acc = 0.0; l < numFreqs; l++)
+ {
+ if(sumf[l] != 0.0)
+ acc += sumf[l];
+ }
+
+ for(l = 0; l < numFreqs; l++)
+ pfreqs[l] = sumf[l] / acc;
+ }
+
+ if(countStatesPresent < numFreqs)
+ {
+ printf("Partition %s number %d has a problem, the number of expected states is %d the number of states that are present is %d.\n",
+ tr->partitionData[model].partitionName, model, numFreqs, countStatesPresent);
+ printf("Please go and fix your data!\n\n");
+ }
+
+ if(smoothFrequencies)
+ {
+ smoothFreqs(numFreqs, pfreqs, tr->partitionData[model].frequencies, &(tr->partitionData[model]));
+ }
+ else
+ {
+ boolean
+ zeroFreq = FALSE;
+
+ char
+ typeOfData[1024];
+
+ getDataTypeString(tr, model, typeOfData);
+
+ for(l = 0; l < numFreqs; l++)
+ {
+ if(pfreqs[l] == 0.0)
+ {
+ printBothOpen("Empirical base frequency for state number %d is equal to zero in %s data partition %s\n", l, typeOfData, tr->partitionData[model].partitionName);
+ printBothOpen("Since this is probably not what you want to do, RAxML will soon exit.\n\n");
+ zeroFreq = TRUE;
+ }
+ }
+
+ if(zeroFreq)
+ exit(-1);
+
+ for(l = 0; l < numFreqs; l++)
+ {
+ assert(pfreqs[l] > 0.0);
+ tr->partitionData[model].frequencies[l] = pfreqs[l];
+ }
+ }
+}
+
+
+
+
+
+
+
+static void baseFrequenciesGTR(rawdata *rdta, cruncheddata *cdta, tree *tr)
+{
+ int
+ model;
+
+ size_t
+ lower,
+ upper;
+
+ int
+ states;
+
+ for(model = 0; model < tr->NumberOfModels; model++)
+ {
+ lower = tr->partitionData[model].lower;
+ upper = tr->partitionData[model].upper;
+ states = tr->partitionData[model].states;
+
+ switch(tr->partitionData[model].dataType)
+ {
+ case GENERIC_32:
+ switch(tr->multiStateModel)
+ {
+ case ORDERED_MULTI_STATE:
+ case MK_MULTI_STATE:
+ {
+ int
+ i;
+ double
+ freq = 1.0 / (double)states,
+ acc = 0.0;
+
+ for(i = 0; i < states; i++)
+ {
+ acc += freq;
+ tr->partitionData[model].frequencies[i] = freq;
+ /*printf("%f \n", freq);*/
+ }
+ /*printf("Frequency Deviation: %1.60f\n", acc);*/
+ }
+ break;
+ case GTR_MULTI_STATE:
+ genericBaseFrequencies(tr, states, rdta, cdta, lower, upper, model, TRUE,
+ bitVector32);
+ break;
+ default:
+ assert(0);
+ }
+ break;
+ case GENERIC_64:
+ assert(0);
+ break;
+ case SECONDARY_DATA_6:
+ case SECONDARY_DATA_7:
+ case SECONDARY_DATA:
+ case AA_DATA:
+ case DNA_DATA:
+ case BINARY_DATA:
+ genericBaseFrequencies(tr, states, rdta, cdta, lower, upper, model,
+ getSmoothFreqs(tr->partitionData[model].dataType),
+ getBitVector(tr->partitionData[model].dataType));
+ break;
+ default:
+ assert(0);
+ }
+ }
+
+ return;
+}
+
+ // #define OLD_LAYOUT
+
+int main (int argc, char *argv[])
+{
+ int model;
+
+ rawdata *rdta;
+ cruncheddata *cdta;
+ tree *tr;
+ analdef *adef;
+
+ /* get the start time */
+
+ masterTime = gettime();
+
+ /* get some memory for the basic data structures */
+
+ adef = (analdef *)malloc(sizeof(analdef));
+ rdta = (rawdata *)malloc(sizeof(rawdata));
+ cdta = (cruncheddata *)malloc(sizeof(cruncheddata));
+ tr = (tree *)malloc(sizeof(tree));
+
+
+ /* the initialization below is required for the hash tables that are used */
+
+ compute_bits_in_16bits();
+
+ /* initialize the analysis parameters in struct adef to default values */
+
+ initAdef(adef);
+
+ /* parse command line arguments: this has a side effect on tr struct and adef struct variables */
+
+ get_args(argc,argv, adef, tr);
+
+ /* parse the phylip file: this should probably be re-done, perhaps using the relatively flexible parser
+ written in C++ by Marc Holder */
+
+ getinput(adef, rdta, cdta, tr);
+
+ printBothOpen("Pattern compression: %s\n", (adef->compressPatterns)?"ON":"OFF");
+
+ makeweights(adef, rdta, cdta, tr);
+
+ makevalues(rdta, cdta, tr, adef);
+
+
+ for(model = 0; model < tr->NumberOfModels; model++)
+ {
+ tr->partitionData[model].states = getStates(tr->partitionData[model].dataType);
+ tr->partitionData[model].maxTipStates = getUndetermined(tr->partitionData[model].dataType) + 1;
+ tr->partitionData[model].nonGTR = FALSE;
+
+ partitionLengths
+ *pl = getPartitionLengths(&(tr->partitionData[model]));
+
+ tr->partitionData[model].frequencies = (double*)malloc(pl->frequenciesLength * sizeof(double));
+ }
+
+ baseFrequenciesGTR(tr->rdta, tr->cdta, tr);
+
+
+
+
+
+ {
+ int
+ sizeOfSizeT = sizeof(size_t),
+ version = (int)programVersionInt,
+ magicNumber = 6517718;
+
+ size_t
+ i,
+ model;
+
+ /* NEW: first write how many bytes size_t comprises */
+
+ myBinFwrite(&(sizeOfSizeT), sizeof(sizeOfSizeT), 1);
+
+ //error checking for parser!
+ myBinFwrite(&version, sizeof(int), 1);
+ myBinFwrite(&magicNumber, sizeof(int), 1);
+ //error checking for correct parser end
+
+ myBinFwrite(&(tr->mxtips), sizeof(int), 1);
+ myBinFwrite(&(tr->originalCrunchedLength), sizeof(size_t), 1);
+ myBinFwrite(&(tr->NumberOfModels), sizeof(int), 1);
+ myBinFwrite(&(adef->gapyness), sizeof(double), 1);
+
+ myBinFwrite(tr->cdta->aliaswgt, sizeof(int), tr->originalCrunchedLength);
+
+ for(i = 1; i <= (size_t)tr->mxtips; i++)
+ {
+ int len = strlen(tr->nameList[i]) + 1;
+ myBinFwrite(&len, sizeof(int), 1);
+ myBinFwrite(tr->nameList[i], sizeof(char), len);
+ }
+
+ for(model = 0; model < (size_t)tr->NumberOfModels; model++)
+ {
+ int
+ len;
+
+ pInfo
+ *p = &(tr->partitionData[model]);
+
+
+ myBinFwrite(&(p->states), sizeof(int), 1);
+ myBinFwrite(&(p->maxTipStates), sizeof(int), 1);
+ myBinFwrite(&(p->lower), sizeof(size_t), 1);
+ myBinFwrite(&(p->upper), sizeof(size_t), 1);
+ myBinFwrite(&(p->width), sizeof(size_t), 1);
+ myBinFwrite(&(p->dataType), sizeof(int), 1);
+ myBinFwrite(&(p->protModels), sizeof(int), 1);
+ myBinFwrite(&(p->protFreqs), sizeof(int), 1);
+ myBinFwrite(&(p->nonGTR), sizeof(boolean), 1);
+ myBinFwrite(&(p->optimizeBaseFrequencies), sizeof(boolean), 1);
+
+
+
+ /* later on if adding secondary structure data
+
+ int *symmetryVector;
+ int *frequencyGrouping;
+ */
+
+ len = strlen(p->partitionName) + 1;
+ myBinFwrite(&len, sizeof(int), 1);
+ myBinFwrite(p->partitionName, sizeof(char), len);
+ myBinFwrite(tr->partitionData[model].frequencies, sizeof(double), tr->partitionData[model].states);
+
+
+
+
+ }
+
+#ifdef OLD_LAYOUT
+ myBinFwrite(rdta->y0, sizeof(unsigned char), (tr->originalCrunchedLength) * ((size_t)tr->mxtips));
+#else
+ /*
+ Write each partition, taxon by taxon. Thus, if unpartitioned,
+ nothing changes.
+ */
+
+ size_t
+ mem_reqs_cat = 0,
+ mem_reqs_gamma = 0,
+ unique_patterns = 0;
+
+ for(model = 0; model < (size_t) tr->NumberOfModels; ++model )
+ {
+ pInfo
+ *p = &(tr->partitionData[model]);
+
+ size_t
+ width = p->upper - p->lower;
+
+ unique_patterns += width;
+
+ //multiply partition width by the number of states we need to store in each CLV entry
+
+ mem_reqs_cat += (size_t)tr->partitionData[model].states * width;
+
+ for(i = 0; i < (size_t)tr->mxtips; ++i)
+ {
+ myBinFwrite(rdta->y0
+ + sizeof(unsigned char) * ( (i * tr->originalCrunchedLength) + p->lower )
+ , sizeof(unsigned char), width);
+ }
+ }
+
+ printBothOpen("\n\nYour alignment has %zu %s\n", unique_patterns, (adef->compressPatterns == TRUE)?"unique patterns":"sites");
+
+ //multiply CLV vector length by the number of tips and 8, since 8 bytes (one double) are needed per entry of an inner conditional probability vector
+ mem_reqs_cat *= (size_t)tr->mxtips * sizeof(double);
+
+ //mem reqs for gamma are 4 times higher than for CAT
+ mem_reqs_gamma = mem_reqs_cat * 4;
+
+ //now add the space for storing the tips:
+
+ mem_reqs_cat += (size_t)tr->mxtips * unique_patterns * sizeof(unsigned char);
+ mem_reqs_gamma += (size_t)tr->mxtips * unique_patterns * sizeof(unsigned char);
+
+ printBothOpen("\n\nUnder CAT the memory required by ExaML for storing CLVs and tip vectors will be\n%zu bytes\n%zu kiloBytes\n%zu MegaBytes\n%zu GigaBytes\n",
+ mem_reqs_cat,
+ mem_reqs_cat / 1024 ,
+ mem_reqs_cat / (1024 * 1024),
+ mem_reqs_cat / (1024 * 1024 * 1024));
+
+ printBothOpen("\n\nUnder GAMMA the memory required by ExaML for storing CLVs and tip vectors will be\n%zu bytes\n%zu kiloBytes\n%zu MegaBytes\n%zu GigaBytes\n",
+ mem_reqs_gamma,
+ mem_reqs_gamma / 1024 ,
+ mem_reqs_gamma / (1024 * 1024),
+ mem_reqs_gamma / (1024 * 1024 * 1024));
+
+ printBothOpen("\nPlease note that these are just the memory requirements for doing likelihood calculations!\n");
+ printBothOpen("To be on the safe side, we recommend that you execute ExaML on a system with twice that memory.\n");
+
+#endif
+ }
+
+ fclose(byteFile);
+
+ printBothOpen("\n\nBinary and compressed alignment file written to file %s\n\n", byteFileName);
+ printBothOpen("Parsing completed, exiting now ... \n\n");
+
+ return 0;
+}
diff --git a/parser/axml.h b/parser/axml.h
new file mode 100644
index 0000000..7a97136
--- /dev/null
+++ b/parser/axml.h
@@ -0,0 +1,1295 @@
+/* RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees
+ * Copyright August 2006 by Alexandros Stamatakis
+ *
+ * Partially derived from
+ * fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *
+ * and
+ *
+ * Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ * This program is free software; you may redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ *
+ * For any other enquiries send an Email to Alexandros Stamatakis
+ * Alexandros.Stamatakis at epfl.ch
+ *
+ * When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ * Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses
+ * with thousands of taxa and mixed models".
+ * Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+#include <assert.h>
+#include <stdint.h>
+#include "../versionHeader/version.h"
+
+
+#ifdef __AVX
+#define BYTE_ALIGNMENT 32
+#else
+#define BYTE_ALIGNMENT 16
+#endif
+
+
+
+
+
+#define MAX_TIP_EV 0.999999999 /* max tip vector value, sum of EVs needs to be smaller than 1.0, otherwise the numerics break down */
+#define smoothings 32 /* maximum smoothing passes through tree */
+#define iterations 10 /* maximum iterations of makenewz per insert */
+#define newzpercycle 1 /* iterations of makenewz per tree traversal */
+#define nmlngth 256 /* number of characters in species name */
+#define deltaz 0.00001 /* test of net branch length change in update */
+#define defaultz 0.9 /* value of z assigned as starting point */
+#define unlikely -1.0E300 /* low likelihood for initialization */
+
+
+#define SUMMARIZE_LENGTH -3
+#define SUMMARIZE_LH -2
+#define NO_BRANCHES -1
+
+#define MASK_LENGTH 32
+#define GET_BITVECTOR_LENGTH(x) (((x) % MASK_LENGTH) ? ((x) / MASK_LENGTH + 1) : ((x) / MASK_LENGTH))
+
+#define zmin 1.0E-15 /* max branch prop. to -log(zmin) (= 34) */
+#define zmax (1.0 - 1.0E-6) /* min branch prop. to 1.0-zmax (= 1.0E-6) */
+
+#define twotothe256 \
+ 115792089237316195423570985008687907853269984665640564039457584007913129639936.0
+ /* 2**256 (exactly) */
+
+#define minlikelihood (1.0/twotothe256)
+#define minusminlikelihood -minlikelihood
+
+
+
+
+/* 18446744073709551616.0 */
+
+/*4294967296.0*/
+
+/* 18446744073709551616.0 */
+
+/* 2**64 (exactly) */
+/* 4294967296 2**32 */
+
+#define badRear -1
+
+//#define NUM_BRANCHES 1
+
+#define TRUE 1
+#define FALSE 0
+
+
+
+#define LIKELIHOOD_EPSILON 0.0000001
+
+#define AA_SCALE 10.0
+#define AA_SCALE_PLUS_EPSILON 10.001
+
+/* ALPHA_MIN is critical -> numerical instability, e.g. for 4 discrete rate cats */
+/* and alpha = 0.01 the lowest rate r_0 is */
+/* 0.00000000000000000000000000000000000000000000000000000000000034878079110511010487 */
+/* which leads to numerical problems Table for alpha settings below: */
+/* */
+/* 0.010000 0.00000000000000000000000000000000000000000000000000000000000034878079110511010487 */
+/* 0.010000 yielded nasty numerical bugs in at least one case ! */
+/* 0.020000 0.00000000000000000000000000000044136090435925743185910935350715027016962154188875 */
+/* 0.030000 0.00000000000000000000476844846859006690412039180149775802624789852441798419292220 */
+/* 0.040000 0.00000000000000049522423236954066431210260930029681736928018820007024736185030633 */
+/* 0.050000 0.00000000000050625351310359203371872643495343928538368616365517027588794007897377 */
+/* 0.060000 0.00000000005134625283884191118711474021861409372524676086868566926568746566772461 */
+/* 0.070000 0.00000000139080650074206434685544624965062437960128249869740102440118789672851562 */
+/* 0.080000 0.00000001650681201563587066858709818343436959153791576682124286890029907226562500 */
+/* 0.090000 0.00000011301977332931251259273962858978301859735893231118097901344299316406250000 */
+/* 0.100000 0.00000052651925834844387815526344648331402709118265192955732345581054687500000000 */
+
+
+#define ALPHA_MIN 0.02
+#define ALPHA_MAX 1000.0
+
+#define RATE_MIN 0.0000001
+#define RATE_MAX 1000000.0
+
+#define INVAR_MIN 0.0001
+#define INVAR_MAX 0.9999
+
+#define TT_MIN 0.0000001
+#define TT_MAX 1000000.0
+
+#define FREQ_MIN 0.001
+
+/*
+ previous values between 0.001 and 0.000001
+
+ TO AVOID NUMERICAL PROBLEMS WHEN FREQ == 0 IN PARTITIONED MODELS, ESPECIALLY WITH AA
+ previous value of FREQ_MIN was: 0.000001, but this seemed to cause problems with some
+ of the 7-state secondary structure models with some rather exotic small toy test datasets,
+ on the other hand 0.001 caused problems with some of the 16-state secondary structure models
+
+ For some reason the frequency settings seem to be repeatedly causing numerical problems
+
+*/
+
+#define ITMAX 100
+
+
+
+#define SHFT(a,b,c,d) (a)=(b);(b)=(c);(c)=(d);
+#define SIGN(a,b) ((b) > 0.0 ? fabs(a) : -fabs(a))
+
+#define ABS(x) (((x)<0) ? (-(x)) : (x))
+#define MIN(x,y) (((x)<(y)) ? (x) : (y))
+#define MAX(x,y) (((x)>(y)) ? (x) : (y))
+#define NINT(x) ((int) ((x)>0 ? ((x)+0.5) : ((x)-0.5)))
+
+#ifdef _USE_FPGA_LOG
+extern double log_approx (double input);
+#define LOG(x) log_approx(x)
+#else
+#define LOG(x) log(x)
+#endif
+
+
+#ifdef _USE_FPGA_EXP
+extern double exp_approx (double x);
+#define EXP(x) exp_approx(x)
+#else
+#define EXP(x) exp(x)
+#endif
+
+
+#define LOGF(x) logf(x)
+
+
+#define PointGamma(prob,alpha,beta) PointChi2(prob,2.0*(alpha))/(2.0*(beta))
+
+//#define programName "the phylip file parser for ExaML"
+//#define programVersion "2.0.1"
+//#define programDate "June 3 2014"
+
+
+#define TREE_EVALUATION 0
+#define BIG_RAPID_MODE 1
+#define CALC_BIPARTITIONS 3
+#define SPLIT_MULTI_GENE 4
+#define CHECK_ALIGNMENT 5
+#define PER_SITE_LL 6
+#define PARSIMONY_ADDITION 7
+#define CLASSIFY_ML 9
+#define DISTANCE_MODE 11
+#define GENERATE_BS 12
+#define COMPUTE_ELW 13
+#define BOOTSTOP_ONLY 14
+#define COMPUTE_LHS 17
+#define COMPUTE_BIPARTITION_CORRELATION 18
+#define THOROUGH_PARSIMONY 19
+#define COMPUTE_RF_DISTANCE 20
+#define MORPH_CALIBRATOR 21
+#define CONSENSUS_ONLY 22
+#define MESH_TREE_SEARCH 23
+#define FAST_SEARCH 24
+#define MORPH_CALIBRATOR_PARSIMONY 25
+#define SH_LIKE_SUPPORTS 28
+
+#define M_GTRCAT 1
+#define M_GTRGAMMA 2
+#define M_BINCAT 3
+#define M_BINGAMMA 4
+#define M_PROTCAT 5
+#define M_PROTGAMMA 6
+#define M_32CAT 7
+#define M_32GAMMA 8
+#define M_64CAT 9
+#define M_64GAMMA 10
+
+
+#define DAYHOFF 0
+#define DCMUT 1
+#define JTT 2
+#define MTREV 3
+#define WAG 4
+#define RTREV 5
+#define CPREV 6
+#define VT 7
+#define BLOSUM62 8
+#define MTMAM 9
+#define LG 10
+#define MTART 11
+#define MTZOA 12
+#define PMB 13
+#define HIVB 14
+#define HIVW 15
+#define JTTDCMUT 16
+#define FLU 17
+#define STMTREV 18
+#define AUTO 19
+#define LG4M 20
+#define LG4X 21
+#define GTR 22 /* GTR always needs to be the last one */
+
+#define NUM_PROT_MODELS 23
+
+/* bipartition stuff */
+
+#define BIPARTITIONS_ALL 0
+#define GET_BIPARTITIONS_BEST 1
+#define DRAW_BIPARTITIONS_BEST 2
+#define BIPARTITIONS_BOOTSTOP 3
+#define BIPARTITIONS_RF 4
+
+
+
+/* bootstopping stuff */
+
+#define BOOTSTOP_PERMUTATIONS 100
+#define START_BSTOP_TEST 10
+
+#define FC_THRESHOLD 99
+#define FC_SPACING 50
+#define FC_LOWER 0.99
+#define FC_INIT 20
+
+#define FREQUENCY_STOP 0
+#define MR_STOP 1
+#define MRE_STOP 2
+#define MRE_IGN_STOP 3
+
+#define MR_CONSENSUS 0
+#define MRE_CONSENSUS 1
+#define STRICT_CONSENSUS 2
+
+
+
+/* bootstopping stuff end */
+
+
+#define TIP_TIP 0
+#define TIP_INNER 1
+#define INNER_INNER 2
+
+#define MIN_MODEL -1
+#define BINARY_DATA 0
+#define DNA_DATA 1
+#define AA_DATA 2
+#define SECONDARY_DATA 3
+#define SECONDARY_DATA_6 4
+#define SECONDARY_DATA_7 5
+#define GENERIC_32 6
+#define GENERIC_64 7
+#define MAX_MODEL 8
+
+#define SEC_6_A 0
+#define SEC_6_B 1
+#define SEC_6_C 2
+#define SEC_6_D 3
+#define SEC_6_E 4
+
+#define SEC_7_A 5
+#define SEC_7_B 6
+#define SEC_7_C 7
+#define SEC_7_D 8
+#define SEC_7_E 9
+#define SEC_7_F 10
+
+#define SEC_16 11
+#define SEC_16_A 12
+#define SEC_16_B 13
+#define SEC_16_C 14
+#define SEC_16_D 15
+#define SEC_16_E 16
+#define SEC_16_F 17
+#define SEC_16_I 18
+#define SEC_16_J 19
+#define SEC_16_K 20
+
+#define ORDERED_MULTI_STATE 0
+#define MK_MULTI_STATE 1
+#define GTR_MULTI_STATE 2
+
+
+
+
+
+#define CAT 0
+#define GAMMA 1
+#define GAMMA_I 2
+
+
+
+typedef int boolean;
+
+
+typedef struct {
+ double lh;
+ int tree;
+ double weight;
+} elw;
+
+struct ent
+{
+ unsigned int *bitVector;
+ unsigned int *treeVector;
+ unsigned int amountTips;
+ int *supportVector;
+ unsigned int bipNumber;
+ unsigned int bipNumber2;
+ unsigned int supportFromTreeset[2];
+ struct ent *next;
+};
+
+typedef struct ent entry;
+
+typedef unsigned int hashNumberType;
+
+typedef unsigned int parsimonyNumber;
+
+/*typedef uint_fast32_t parsimonyNumber;*/
+
+#define PCF 32
+
+/*
+ typedef uint64_t parsimonyNumber;
+
+ #define PCF 16
+
+
+typedef unsigned char parsimonyNumber;
+
+#define PCF 2
+*/
+
+typedef struct
+{
+ hashNumberType tableSize;
+ entry **table;
+ hashNumberType entryCount;
+}
+ hashtable;
+
+
+struct stringEnt
+{
+ int nodeNumber;
+ char *word;
+ struct stringEnt *next;
+};
+
+typedef struct stringEnt stringEntry;
+
+typedef struct
+{
+ hashNumberType tableSize;
+ stringEntry **table;
+}
+ stringHashtable;
+
+
+typedef struct
+{
+ unsigned int parsimonyScore;
+ unsigned int parsimonyState;
+}
+ parsimonyVector;
+
+
+typedef struct ratec
+{
+ double accumulatedSiteLikelihood;
+ double rate;
+}
+ rateCategorize;
+
+
+typedef struct
+{
+ int tipCase;
+ int pNumber;
+ int qNumber;
+ int rNumber;
+ //double qz[NUM_BRANCHES];
+ //double rz[NUM_BRANCHES];
+} traversalInfo;
+
+typedef struct
+{
+ traversalInfo *ti;
+ int count;
+ int functionType;
+ boolean traversalHasChanged;
+ boolean *executeModel;
+ double *parameterValues;
+} traversalData;
+
+
+struct noderec;
+
+typedef struct epBrData
+{
+ int *countThem;
+ int *executeThem;
+ unsigned int *parsimonyScores;
+ double *branches;
+ double *bootstrapBranches;
+ double *likelihoods;
+ double originalBranchLength;
+ char branchLabel[64];
+ int leftNodeNumber;
+ int rightNodeNumber;
+ int *leftScaling;
+ int *rightScaling;
+ parsimonyVector *leftParsimony;
+ parsimonyVector *rightParsimony;
+ //double branchLengths[NUM_BRANCHES];
+ double *left;
+ double *right;
+ int branchNumber;
+} epaBranchData;
+
+typedef struct
+{
+ epaBranchData *epa;
+
+ unsigned int *vector;
+ int support;
+ struct noderec *oP;
+ struct noderec *oQ;
+} branchInfo;
+
+
+
+
+
+
+
+
+typedef struct
+{
+ boolean valid;
+ int partitions;
+ int *partitionList;
+}
+ linkageData;
+
+typedef struct
+{
+ int entries;
+ linkageData* ld;
+}
+ linkageList;
+
+
+typedef struct noderec
+{
+
+ branchInfo *bInf;
+ // double z[NUM_BRANCHES];
+#ifdef _BAYESIAN
+ //double z_tmp[NUM_BRANCHES];
+#endif
+ struct noderec *next;
+ struct noderec *back;
+ hashNumberType hash;
+ int support;
+ int number;
+ char x;
+}
+ node, *nodeptr;
+
+typedef struct
+ {
+ double lh;
+ int number;
+ }
+ info;
+
+typedef struct bInf {
+ double likelihood;
+ nodeptr node;
+} bestInfo;
+
+typedef struct iL {
+ bestInfo *list;
+ int n;
+ int valid;
+} infoList;
+
+
+
+
+typedef struct
+{
+ int numsp;
+ int sites;
+ unsigned char **y;
+ unsigned char *y0;
+ int *wgt;
+} rawdata;
+
+typedef struct {
+ int *alias; /* site representing a pattern */
+ int *aliaswgt; /* weight by pattern */
+ int *rateCategory;
+ size_t endsite; /* # of sequence patterns */
+ double *patrat; /* rates per pattern */
+ double *patratStored;
+} cruncheddata;
+
+
+
+
+typedef struct {
+ int states;
+ int maxTipStates;
+ size_t lower;
+ size_t upper;
+ size_t width;
+ int dataType;
+ int protModels;
+ int autoProtModels;
+ int protFreqs;
+ int **expVector;
+ double **xVector;
+ size_t *xSpaceVector;
+
+ unsigned char **yVector;
+ char *partitionName;
+ double *sumBuffer;
+
+ double *gammaRates;
+
+ double *EIGN;
+ double *EV;
+
+
+
+ double *EI;
+
+
+
+
+
+ double *left;
+ double *right;
+
+
+
+
+ double *frequencies;
+ double *tipVector;
+ double *substRates;
+
+
+ double *perSiteRates;
+
+ double *wr;
+ double *wr2;
+
+
+
+ unsigned int *globalScaler;
+ double *globalScalerDouble;
+ int *wgt;
+
+ int *rateCategory;
+ int *symmetryVector;
+ int *frequencyGrouping;
+ boolean nonGTR;
+ boolean optimizeBaseFrequencies;
+ double alpha;
+
+
+ int gapVectorLength;
+ unsigned int *gapVector;
+ double *gapColumn;
+
+ int numberOfCategories;
+} pInfo;
+
+
+
+typedef struct
+{
+ int left;
+ int right;
+ double likelihood;
+} lhEntry;
+
+
+typedef struct
+{
+ int count;
+ int size;
+ lhEntry *entries;
+} lhList;
+
+
+typedef struct List_{
+ void *value;
+ struct List_ *next;
+} List;
+
+
+#define REARR_SETTING 1
+#define FAST_SPRS 2
+#define SLOW_SPRS 3
+
+typedef struct {
+
+ int state;
+
+ unsigned int vLength;
+
+ int rearrangementsMax;
+ int rearrangementsMin;
+ int thoroughIterations;
+ int fastIterations;
+ int treeVectorLength;
+ int mintrav;
+ int maxtrav;
+ int bestTrav;
+ int Thorough;
+ int optimizeRateCategoryInvocations;
+
+ double accumulatedTime;
+
+ double startLH;
+ double lh;
+ double previousLh;
+ double difference;
+ double epsilon;
+
+ boolean impr;
+ boolean cutoff;
+
+ double tr_startLH;
+ double tr_endLH;
+ double tr_likelihood;
+ double tr_bestOfNode;
+
+ double tr_lhCutoff;
+ double tr_lhAVG;
+ double tr_lhDEC;
+ int tr_NumberOfCategories;
+ int tr_itCount;
+ int tr_doCutoff;
+
+
+} checkPointState;
+
+
+typedef struct {
+ double EIGN[19] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+ double EV[400] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+ double EI[380] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+ double substRates[190];
+ double frequencies[20] ;
+ double tipVector[460] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+ double fracchange[1];
+ double left[1600] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+ double right[1600] __attribute__ ((aligned (BYTE_ALIGNMENT)));
+} siteAAModels;
+
+typedef struct {
+ boolean useGappedImplementation;
+ boolean saveMemory;
+
+ siteAAModels siteProtModel[2 * (NUM_PROT_MODELS - 2)];
+
+ boolean estimatePerSiteAA;
+
+ int *resample;
+
+ int numberOfBranches;
+ int numberOfTipsForInsertion;
+ int *inserts;
+ int branchCounter;
+
+
+
+
+
+
+
+ parsimonyNumber **parsimonyState_A;
+ parsimonyNumber **parsimonyState_C;
+ parsimonyNumber **parsimonyState_G;
+ parsimonyNumber **parsimonyState_T;
+ unsigned int *parsimonyScore;
+ int *ti;
+ unsigned int compressedWidth;
+
+ int numberOfTrees;
+
+ stringHashtable *nameHash;
+
+ pInfo *partitionData;
+ pInfo *initialPartitionData;
+ pInfo *extendedPartitionData;
+
+ int *dataVector;
+ int *initialDataVector;
+ int *extendedDataVector;
+
+ int *patternPosition;
+ int *columnPosition;
+
+ char *secondaryStructureInput;
+
+ boolean *executeModel;
+
+ double *perPartitionLH;
+
+ traversalData td[1];
+
+ int maxCategories;
+
+ double *wr;
+ double *wr2;
+
+ // double coreLZ[NUM_BRANCHES];
+ int modelNumber;
+ int numBranches;
+ int bootStopCriterion;
+ int consensusType;
+ double wcThreshold;
+
+
+
+
+
+
+
+
+ branchInfo *bInf;
+
+ int multiStateModel;
+
+
+ // boolean curvatOK[NUM_BRANCHES];
+ /* the stuff below is shared among DNA and AA, span does
+ not change depending on datatype */
+
+
+ double *fracchanges;
+
+ /* model stuff end */
+
+ unsigned char **yVector;
+ int secondaryStructureModel;
+ size_t originalCrunchedLength;
+ int fullSites;
+ int *originalModel;
+ int *originalDataVector;
+ int *originalWeights;
+ int *secondaryStructurePairs;
+
+
+ double *partitionContributions;
+ double fracchange;
+ double lhCutoff;
+ double lhAVG;
+ unsigned long lhDEC;
+ unsigned long itCount;
+ int numberOfInvariableColumns;
+ int weightOfInvariableColumns;
+ int rateHetModel;
+
+ double startLH;
+ double endLH;
+ double likelihood;
+ double *likelihoods;
+
+ node **nodep;
+ nodeptr nodeBaseAddress;
+ node *start;
+ int mxtips;
+ int *model;
+
+ int *constraintVector;
+ int numberOfSecondaryColumns;
+ boolean searchConvergenceCriterion;
+ int ntips;
+ int nextnode;
+ int NumberOfModels;
+ int parsimonyLength;
+
+ int checkPointCounter;
+ int treeID;
+ boolean bigCutoff;
+ // boolean partitionSmoothed[NUM_BRANCHES];
+ // boolean partitionConverged[NUM_BRANCHES];
+ boolean rooted;
+ boolean grouped;
+ boolean constrained;
+ boolean doCutoff;
+ boolean catOnly;
+ rawdata *rdta;
+ cruncheddata *cdta;
+
+ char **nameList;
+ char *tree_string;
+ char *tree0;
+ char *tree1;
+ int treeStringLength;
+ unsigned int bestParsimony;
+ double bestOfNode;
+ nodeptr removeNode;
+ nodeptr insertNode;
+
+ /*
+ double zqr[NUM_BRANCHES];
+ double currentZQR[NUM_BRANCHES];
+
+ double currentLZR[NUM_BRANCHES];
+ double currentLZQ[NUM_BRANCHES];
+ double currentLZS[NUM_BRANCHES];
+ double currentLZI[NUM_BRANCHES];
+ double lzs[NUM_BRANCHES];
+ double lzq[NUM_BRANCHES];
+ double lzr[NUM_BRANCHES];
+ double lzi[NUM_BRANCHES];
+ */
+
+ int mr_thresh;
+
+
+ unsigned int **bitVectors;
+
+ unsigned int vLength;
+
+ hashtable *h;
+
+
+} tree;
+
+
+/***************************************************************/
+
+typedef struct {
+ int partitionNumber;
+ int partitionLength;
+} partitionType;
+
+typedef struct
+{
+ // double z[NUM_BRANCHES];
+ nodeptr p, q;
+ int cp, cq;
+}
+ connectRELL, *connptrRELL;
+
+typedef struct
+{
+ connectRELL *connect;
+ int start;
+ double likelihood;
+}
+ topolRELL;
+
+
+typedef struct
+{
+ int max;
+ topolRELL **t;
+}
+ topolRELL_LIST;
+
+
+/**************************************************************/
+
+
+
+typedef struct conntyp {
+ // double z[NUM_BRANCHES]; /* branch length */
+ node *p, *q; /* parent and child sectors */
+ void *valptr; /* pointer to value of subtree */
+ int descend; /* pointer to first connect of child */
+ int sibling; /* next connect from same parent */
+ } connect, *connptr;
+
+typedef struct {
+ double likelihood;
+ int initialTreeNumber;
+ connect *links; /* pointer to first connect (start) */
+ node *start;
+ int nextlink; /* index of next available connect */
+ /* tr->start = tpl->links->p */
+ int ntips;
+ int nextnode;
+ int scrNum; /* position in sorted list of scores */
+ int tplNum; /* position in sorted list of trees */
+
+ } topol;
+
+typedef struct {
+ double best; /* highest score saved */
+ double worst; /* lowest score saved */
+ topol *start; /* starting tree for optimization */
+ topol **byScore;
+ topol **byTopol;
+ int nkeep; /* maximum topologies to save */
+ int nvalid; /* number of topologies saved */
+ int ninit; /* number of topologies initialized */
+ int numtrees; /* number of alternatives tested */
+ boolean improved;
+ } bestlist;
+
+typedef struct {
+ int categories;
+ int model;
+ int bestTrav;
+ int max_rearrange;
+ int stepwidth;
+ int initial;
+ boolean initialSet;
+ int mode;
+ long boot;
+ long rapidBoot;
+ boolean bootstrapBranchLengths;
+ boolean restart;
+ boolean useWeightFile;
+ boolean useMultipleModel;
+ boolean constraint;
+ boolean grouping;
+ boolean randomStartingTree;
+ boolean useInvariant;
+ int protEmpiricalFreqs;
+ int proteinMatrix;
+ int checkpoints;
+ int startingTreeOnly;
+ int multipleRuns;
+ long parsimonySeed;
+ boolean perGeneBranchLengths;
+ boolean likelihoodTest;
+ boolean permuteTreeoptimize;
+ boolean allInOne;
+ boolean generateBS;
+ boolean bootStopping;
+ boolean useExcludeFile;
+ boolean userProteinModel;
+ boolean computeELW;
+ boolean computeDistance;
+ boolean thoroughInsertion;
+ boolean compressPatterns;
+ boolean useSecondaryStructure;
+ double likelihoodEpsilon;
+ double gapyness;
+ int similarityFilterMode;
+ double *externalAAMatrix;
+ boolean readTaxaOnly;
+ int meshSearch;
+ boolean veryFast;
+ boolean useCheckpoint;
+ boolean leaveDropMode;
+ int slidingWindowSize;
+ boolean writeBinaryFile;
+ boolean readBinaryFile;
+#ifdef _BAYESIAN
+ boolean bayesian;
+#endif
+} analdef;
+
+typedef struct
+{
+ int leftLength;
+ int rightLength;
+ int eignLength;
+ int evLength;
+ int eiLength;
+ int substRatesLength;
+ int frequenciesLength;
+ int tipVectorLength;
+ int symmetryVectorLength;
+ int frequencyGroupingLength;
+
+ boolean nonGTR;
+ boolean optimizeBaseFrequencies;
+
+ unsigned char undetermined;
+
+ const char *inverseMeaning;
+
+ int states;
+
+ boolean smoothFrequencies;
+
+ const unsigned int *bitVector;
+
+} partitionLengths;
+
+/****************************** FUNCTIONS ****************************************************/
+
+
+
+extern void computePlacementBias(tree *tr, analdef *adef);
+
+extern int lookupWord(char *s, stringHashtable *h);
+
+extern void getDataTypeString(tree *tr, int model, char typeOfData[1024]);
+
+extern unsigned int genericBitCount(unsigned int* bitVector, unsigned int bitVectorLength);
+extern int countTips(nodeptr p, int numsp);
+extern entry *initEntry(void);
+extern void computeRogueTaxa(tree *tr, char* treeSetFileName, analdef *adef);
+extern unsigned int precomputed16_bitcount(unsigned int n);
+
+
+
+
+
+extern size_t discreteRateCategories(int rateHetModel);
+
+extern partitionLengths * getPartitionLengths(pInfo *p);
+extern boolean getSmoothFreqs(int dataType);
+extern const unsigned int *getBitVector(int dataType);
+extern unsigned char getUndetermined(int dataType);
+extern int getStates(int dataType);
+extern char getInverseMeaning(int dataType, unsigned char state);
+extern double gettime ( void );
+extern int gettimeSrand ( void );
+extern double randum ( long *seed );
+
+extern void getxnode ( nodeptr p );
+extern void hookup ( nodeptr p, nodeptr q, double *z, int numBranches);
+extern void hookupDefault ( nodeptr p, nodeptr q, int numBranches);
+extern boolean whitechar ( int ch );
+extern void errorExit ( int e );
+extern void printResult ( tree *tr, analdef *adef, boolean finalPrint );
+extern void printBootstrapResult ( tree *tr, analdef *adef, boolean finalPrint );
+extern void printBipartitionResult ( tree *tr, analdef *adef, boolean finalPrint );
+extern void printLog ( tree *tr, analdef *adef, boolean finalPrint );
+extern void printStartingTree ( tree *tr, analdef *adef, boolean finalPrint );
+extern void writeInfoFile ( analdef *adef, tree *tr, double t );
+extern int main ( int argc, char *argv[] );
+extern void calcBipartitions ( tree *tr, analdef *adef, char *bestTreeFileName, char *bootStrapFileName );
+extern void initReversibleGTR (tree *tr, int model);
+extern double LnGamma ( double alpha );
+extern double IncompleteGamma ( double x, double alpha, double ln_gamma_alpha );
+extern double PointNormal ( double prob );
+extern double PointChi2 ( double prob, double v );
+extern void makeGammaCats (double alpha, double *gammaRates, int K);
+extern void initModel ( tree *tr, rawdata *rdta, cruncheddata *cdta, analdef *adef );
+extern void doAllInOne ( tree *tr, analdef *adef );
+
+extern void classifyML(tree *tr, analdef *adef);
+extern void doBootstrap ( tree *tr, analdef *adef, rawdata *rdta, cruncheddata *cdta );
+extern void doInference ( tree *tr, analdef *adef, rawdata *rdta, cruncheddata *cdta );
+extern void resetBranches ( tree *tr );
+extern void modOpt ( tree *tr, analdef *adef , double likelihoodEpsilon);
+
+
+extern void parsePartitions ( analdef *adef, rawdata *rdta, tree *tr);
+extern void computeBOOTRAPID (tree *tr, analdef *adef, long *radiusSeed);
+extern void optimizeRAPID ( tree *tr, analdef *adef );
+extern void thoroughOptimization ( tree *tr, analdef *adef, topolRELL_LIST *rl, int index );
+extern int treeOptimizeThorough ( tree *tr, int mintrav, int maxtrav);
+
+extern int checker ( tree *tr, nodeptr p );
+extern int randomInt ( int n );
+extern void makePermutation ( int *perm, int n, analdef *adef );
+extern boolean tipHomogeneityChecker ( tree *tr, nodeptr p, int grouping );
+extern void makeRandomTree ( tree *tr, analdef *adef );
+extern void nodeRectifier ( tree *tr );
+extern void makeParsimonyTreeThorough(tree *tr, analdef *adef);
+extern void makeParsimonyTree ( tree *tr, analdef *adef );
+extern void makeParsimonyTreeFastDNA(tree *tr, analdef *adef);
+extern void makeParsimonyTreeIncomplete ( tree *tr, analdef *adef );
+extern void makeParsimonyInsertions(tree *tr, nodeptr startNodeQ, nodeptr startNodeR);
+
+
+
+extern FILE *myfopen(const char *path, const char *mode);
+
+
+extern boolean initrav ( tree *tr, nodeptr p );
+extern void initravPartition ( tree *tr, nodeptr p, int model );
+extern boolean update ( tree *tr, nodeptr p );
+extern boolean smooth ( tree *tr, nodeptr p );
+extern boolean smoothTree ( tree *tr, int maxtimes );
+extern boolean localSmooth ( tree *tr, nodeptr p, int maxtimes );
+extern boolean localSmoothMulti(tree *tr, nodeptr p, int maxtimes, int model);
+extern void initInfoList ( int n );
+extern void freeInfoList ( void );
+extern void insertInfoList ( nodeptr node, double likelihood );
+extern boolean smoothRegion ( tree *tr, nodeptr p, int region );
+extern boolean regionalSmooth ( tree *tr, nodeptr p, int maxtimes, int region );
+extern nodeptr removeNodeBIG ( tree *tr, nodeptr p, int numBranches);
+extern nodeptr removeNodeRestoreBIG ( tree *tr, nodeptr p );
+extern boolean insertBIG ( tree *tr, nodeptr p, nodeptr q, int numBranches);
+extern boolean insertRestoreBIG ( tree *tr, nodeptr p, nodeptr q );
+extern boolean testInsertBIG ( tree *tr, nodeptr p, nodeptr q );
+extern void addTraverseBIG ( tree *tr, nodeptr p, nodeptr q, int mintrav, int maxtrav );
+extern int rearrangeBIG ( tree *tr, nodeptr p, int mintrav, int maxtrav );
+extern void traversalOrder ( nodeptr p, int *count, nodeptr *nodeArray );
+extern double treeOptimizeRapid ( tree *tr, int mintrav, int maxtrav, analdef *adef, bestlist *bt);
+extern boolean testInsertRestoreBIG ( tree *tr, nodeptr p, nodeptr q );
+extern void restoreTreeFast ( tree *tr );
+extern int determineRearrangementSetting ( tree *tr, analdef *adef, bestlist *bestT, bestlist *bt );
+extern void computeBIGRAPID ( tree *tr, analdef *adef, boolean estimateModel);
+extern boolean treeEvaluate ( tree *tr, double smoothFactor );
+extern boolean treeEvaluatePartition ( tree *tr, double smoothFactor, int model );
+
+extern void meshTreeSearch(tree *tr, analdef *adef, int thorough);
+
+extern void initTL ( topolRELL_LIST *rl, tree *tr, int n );
+extern void freeTL ( topolRELL_LIST *rl);
+extern void restoreTL ( topolRELL_LIST *rl, tree *tr, int n );
+extern void resetTL ( topolRELL_LIST *rl );
+extern void saveTL ( topolRELL_LIST *rl, tree *tr, int index );
+
+extern int saveBestTree (bestlist *bt, tree *tr);
+extern int recallBestTree (bestlist *bt, int rank, tree *tr);
+extern int initBestTree ( bestlist *bt, int newkeep, int numsp );
+extern void resetBestTree ( bestlist *bt );
+extern boolean freeBestTree ( bestlist *bt );
+
+
+extern char *Tree2String ( char *treestr, tree *tr, nodeptr p, boolean printBranchLengths, boolean printNames, boolean printLikelihood,
+ boolean rellTree, boolean finalPrint, int perGene, boolean branchLabelSupport, boolean printSHSupport);
+extern void printTreePerGene(tree *tr, analdef *adef, char *fileName, char *permission);
+
+
+
+extern int treeReadLen (FILE *fp, tree *tr, boolean readBranches, boolean readNodeLabels, boolean topologyOnly);
+extern void treeReadTopologyString(char *treeString, tree *tr);
+extern boolean treeReadLenMULT ( FILE *fp, tree *tr, analdef *adef );
+
+extern void getStartingTree ( tree *tr);
+extern double treeLength(tree *tr, int model);
+
+extern void computeBootStopOnly(tree *tr, char *bootStrapFileName, analdef *adef);
+extern boolean bootStop(tree *tr, hashtable *h, int numberOfTrees, double *pearsonAverage, unsigned int **bitVectors, int treeVectorLength, unsigned int vectorLength);
+extern void computeConsensusOnly(tree *tr, char* treeSetFileName, analdef *adef);
+extern double evaluatePartialGeneric (tree *, int i, double ki, int _model);
+extern void evaluateGeneric (tree *tr, nodeptr p, boolean fullTraversal);
+extern void newviewGeneric (tree *tr, nodeptr p, boolean masked);
+extern void newviewGenericMulti (tree *tr, nodeptr p, int model);
+extern void makenewzGeneric(tree *tr, nodeptr p, nodeptr q, double *z0, int maxiter, double *result, boolean mask);
+extern void makenewzGenericDistance(tree *tr, int maxiter, double *z0, double *result, int taxon1, int taxon2);
+extern double evaluatePartitionGeneric (tree *tr, nodeptr p, int model);
+extern void newviewPartitionGeneric (tree *tr, nodeptr p, int model);
+extern double evaluateGenericVector (tree *tr, nodeptr p);
+extern void categorizeGeneric (tree *tr, nodeptr p);
+extern double makenewzPartitionGeneric(tree *tr, nodeptr p, nodeptr q, double z0, int maxiter, int model);
+extern boolean isTip(int number, int maxTips);
+extern void computeTraversalInfo(nodeptr p, traversalInfo *ti, int *counter, int maxTips, int numBranches, boolean partialTraversal);
+
+
+
+extern void newviewIterative(tree *tr, int startIndex);
+
+extern void evaluateIterative(tree *);
+
+extern void *malloc_aligned( size_t size);
+
+extern void storeExecuteMaskInTraversalDescriptor(tree *tr);
+extern void storeValuesInTraversalDescriptor(tree *tr, double *value);
+extern void myBinFwrite(const void *ptr, size_t size, size_t nmemb);
+extern void myBinFread(void *ptr, size_t size, size_t nmemb);
+
+
+
+extern void makenewzIterative(tree *);
+extern void execCore(tree *, volatile double *dlnLdlz, volatile double *d2lnLdlz2);
+
+
+
+extern void determineFullTraversal(nodeptr p, tree *tr);
+/*extern void optRateCat(tree *, int i, double lower_spacing, double upper_spacing, double *lhs);*/
+
+extern unsigned int evaluateParsimonyIterative(tree *);
+extern void newviewParsimonyIterative(tree *);
+
+extern unsigned int evaluateParsimonyIterativeFast(tree *);
+extern void newviewParsimonyIterativeFast(tree *);
+
+extern unsigned int evaluatePerSiteParsimony(tree *tr, nodeptr p, unsigned int *siteParsimony);
+extern void initravParsimonyNormal(tree *tr, nodeptr p);
+
+extern double evaluateGenericInitravPartition(tree *tr, nodeptr p, int model);
+extern void evaluateGenericVectorIterative(tree *, int startIndex, int endIndex);
+extern void categorizeIterative(tree *, int startIndex, int endIndex);
+
+extern void fixModelIndices(tree *tr, int endsite, boolean fixRates);
+extern void calculateModelOffsets(tree *tr);
+extern void gammaToCat(tree *tr);
+extern void catToGamma(tree *tr, analdef *adef);
+extern void handleExcludeFile(tree *tr, analdef *adef, rawdata *rdta);
+
+extern nodeptr findAnyTip(nodeptr p, int numsp);
+
+extern void parseProteinModel(analdef *adef);
+
+
+
+extern void computeNextReplicate(tree *tr, long *seed, int *originalRateCategories, int *originalInvariant, boolean isRapid, boolean fixRates);
+/*extern void computeNextReplicate(tree *tr, analdef *adef, int *originalRateCategories, int *originalInvariant);*/
+
+extern void putWAG(double *ext_initialRates);
+
+extern void reductionCleanup(tree *tr, int *originalRateCategories, int *originalInvariant);
+extern void parseSecondaryStructure(tree *tr, analdef *adef, int sites);
+extern void printPartitions(tree *tr);
+extern void compareBips(tree *tr, char *bootStrapFileName, analdef *adef);
+extern void computeRF(tree *tr, char *bootStrapFileName, analdef *adef);
+
+
+extern unsigned int **initBitVector(tree *tr, unsigned int *vectorLength);
+extern hashtable *copyHashTable(hashtable *src, unsigned int vectorLength);
+extern hashtable *initHashTable(unsigned int n);
+extern void cleanupHashTable(hashtable *h, int state);
+extern double convergenceCriterion(hashtable *h, int mxtips);
+extern void freeBitVectors(unsigned int **v, int n);
+extern void freeHashTable(hashtable *h);
+
+
+
+extern void printBothOpen(const char* format, ... );
+extern void printBothOpenMPI(const char* format, ... );
+extern void initRateMatrix(tree *tr);
+
+extern void bitVectorInitravSpecial(unsigned int **bitVectors, nodeptr p, int numsp, unsigned int vectorLength, hashtable *h, int treeNumber, int function, branchInfo *bInf,
+ int *countBranches, int treeVectorLength, boolean traverseOnly, boolean computeWRF);
+
+extern int getIncrement(tree *tr, int model);
+
+extern void fastSearch(tree *tr, analdef *adef, rawdata *rdta, cruncheddata *cdta);
+extern void shSupports(tree *tr, analdef *adef, rawdata *rdta, cruncheddata *cdta);
+
+extern FILE *getNumberOfTrees(tree *tr, char *fileName, analdef *adef);
+
+extern void writeBinaryModel(tree *tr);
+extern void readBinaryModel(tree *tr);
+extern void treeEvaluateRandom (tree *tr, double smoothFactor);
+extern void treeEvaluateProgressive(tree *tr);
+
+extern void testGapped(tree *tr);
+
+extern boolean issubset(unsigned int* bipA, unsigned int* bipB, unsigned int vectorLen);
+extern boolean compatible(entry* e1, entry* e2, unsigned int bvlen);
+
+
+
+extern int *permutationSH(tree *tr, int nBootstrap, long _randomSeed);
+
+extern void updatePerSiteRates(tree *tr, boolean scaleRates);
+
+extern void restart(tree *tr);
+
+
+
+
+
+
+
diff --git a/parser/globalVariables.h b/parser/globalVariables.h
new file mode 100644
index 0000000..2e407a6
--- /dev/null
+++ b/parser/globalVariables.h
@@ -0,0 +1,195 @@
+/* RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees
+ * Copyright August 2006 by Alexandros Stamatakis
+ *
+ * Partially derived from
+ * fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *
+ * and
+ *
+ * Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ * This program is free software; you may redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ *
+ * For any other enquiries send an Email to Alexandros Stamatakis
+ * Alexandros.Stamatakis at epfl.ch
+ *
+ * When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ * Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models".
+ * Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+
+
+
+#ifdef _USE_ZLIB
+
+#include <zlib.h>
+
+#endif
+
+
+
+#ifdef _FINE_GRAIN_MPI
+int processes;
+double *globalResult;
+#endif
+
+int processID;
+infoList iList;
+FILE *INFILE;
+
+#ifdef _USE_ZLIB
+gzFile byteFile;
+#else
+FILE *byteFile;
+#endif
+
+
+char run_id[128] = "",
+ seq_file[1024] = "",
+ weightFileName[1024] = "",
+ modelFileName[1024] = "",
+ byteFileName[1024] = "",
+ infoFileName[1024] = "",
+ secondaryStructureFileName[1024] = "",
+ excludeFileName[1024],
+ proteinModelFileName[1024];
+
+char *protModels[NUM_PROT_MODELS] = {"DAYHOFF", "DCMUT", "JTT", "MTREV", "WAG", "RTREV", "CPREV", "VT", "BLOSUM62", "MTMAM", "LG", "MTART", "MTZOA", "PMB",
+ "HIVB", "HIVW", "JTTDCMUT", "FLU", "STMTREV", "AUTO", "LG4M", "LG4X", "GTR"};
+
+const char inverseMeaningBINARY[4] = {'_', '0', '1', '-'};
+const char inverseMeaningDNA[16] = {'_', 'A', 'C', 'M', 'G', 'R', 'S', 'V', 'T', 'W', 'Y', 'H', 'K', 'D', 'B', '-'};
+const char inverseMeaningPROT[23] = {'A','R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 'P', 'S',
+ 'T', 'W', 'Y', 'V', 'B', 'Z', '-'};
+const char inverseMeaningGeneric32[33] = {'0', '1', '2', '3', '4', '5', '6', '7',
+ '8', '9', 'A', 'B', 'C', 'D', 'E', 'F',
+ 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N',
+ 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V',
+ '-'};
+const char inverseMeaningGeneric64[33] = {'0', '1', '2', '3', '4', '5', '6', '7',
+ '8', '9', 'A', 'B', 'C', 'D', 'E', 'F',
+ 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N',
+ 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V',
+ '-'};
+
+const unsigned int bitVectorIdentity[256] = {0 ,1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9 ,10 ,11 ,12 ,13 ,14 ,15 ,16 ,17 ,18 ,19 ,20 ,21 ,22 ,23 ,24 ,25 ,26 ,
+ 27 ,28 ,29 ,30 ,31 ,32 ,33 ,34 ,35 ,36 ,37 ,38 ,39 ,40 ,41 ,42 ,43 ,44 ,45 ,46 ,47 ,48 ,49 ,50 ,51 ,
+ 52 ,53 ,54 ,55 ,56 ,57 ,58 ,59 ,60 ,61 ,62 ,63 ,64 ,65 ,66 ,67 ,68 ,69 ,70 ,71 ,72 ,73 ,74 ,75 ,76 ,
+ 77 ,78 ,79 ,80 ,81 ,82 ,83 ,84 ,85 ,86 ,87 ,88 ,89 ,90 ,91 ,92 ,93 ,94 ,95 ,96 ,97 ,98 ,99 ,100 ,101 ,
+ 102 ,103 ,104 ,105 ,106 ,107 ,108 ,109 ,110 ,111 ,112 ,113 ,114 ,115 ,116 ,117 ,118 ,119 ,120 ,121 ,122 ,
+ 123 ,124 ,125 ,126 ,127 ,128 ,129 ,130 ,131 ,132 ,133 ,134 ,135 ,136 ,137 ,138 ,139 ,140 ,141 ,142 ,143 ,
+ 144 ,145 ,146 ,147 ,148 ,149 ,150 ,151 ,152 ,153 ,154 ,155 ,156 ,157 ,158 ,159 ,160 ,161 ,162 ,163 ,164 ,
+ 165 ,166 ,167 ,168 ,169 ,170 ,171 ,172 ,173 ,174 ,175 ,176 ,177 ,178 ,179 ,180 ,181 ,182 ,183 ,184 ,185 ,
+ 186 ,187 ,188 ,189 ,190 ,191 ,192 ,193 ,194 ,195 ,196 ,197 ,198 ,199 ,200 ,201 ,202 ,203 ,204 ,205 ,206 ,
+ 207 ,208 ,209 ,210 ,211 ,212 ,213 ,214 ,215 ,216 ,217 ,218 ,219 ,220 ,221 ,222 ,223 ,224 ,225 ,226 ,227 ,
+ 228 ,229 ,230 ,231 ,232 ,233 ,234 ,235 ,236 ,237 ,238 ,239 ,240 ,241 ,242 ,243 ,244 ,245 ,246 ,247 ,248 ,
+ 249 ,250 ,251 ,252 ,253 ,254 ,255};
+
+
+
+const unsigned int bitVectorAA[23] = {1, 2, 4, 8, 16, 32, 64, 128,
+ 256, 512, 1024, 2048, 4096,
+ 8192, 16384, 32768, 65536, 131072, 262144,
+ 524288, 12 /* N | D */, 96 /*Q | E*/, 1048575 /* - */};
+
+const unsigned int bitVectorSecondary[256] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
+ 10, 11, 12, 13, 14, 15, 0, 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192,
+ 208, 224, 240, 0, 17, 34, 51, 68, 85, 102, 119, 136, 153, 170, 187, 204, 221, 238,
+ 255, 0, 256, 512, 768, 1024, 1280, 1536, 1792, 2048, 2304, 2560, 2816, 3072, 3328,
+ 3584, 3840, 0, 257, 514, 771, 1028, 1285, 1542, 1799, 2056, 2313, 2570, 2827, 3084,
+ 3341, 3598, 3855, 0, 272, 544, 816, 1088, 1360, 1632, 1904, 2176, 2448, 2720, 2992,
+ 3264, 3536, 3808, 4080, 0, 273, 546, 819, 1092, 1365, 1638, 1911, 2184, 2457, 2730,
+ 3003, 3276, 3549, 3822, 4095, 0, 4096, 8192, 12288, 16384, 20480, 24576, 28672, 32768,
+ 36864, 40960, 45056, 49152, 53248, 57344, 61440, 0, 4097, 8194, 12291, 16388, 20485, 24582,
+ 28679, 32776, 36873, 40970, 45067, 49164, 53261, 57358, 61455, 0, 4112, 8224, 12336, 16448,
+ 20560, 24672, 28784, 32896, 37008, 41120, 45232, 49344, 53456, 57568, 61680, 0, 4113, 8226,
+ 12339, 16452, 20565, 24678, 28791, 32904, 37017, 41130, 45243, 49356, 53469, 57582, 61695,
+ 0, 4352, 8704, 13056, 17408, 21760, 26112, 30464, 34816, 39168, 43520, 47872, 52224, 56576,
+ 60928, 65280, 0, 4353, 8706, 13059, 17412, 21765, 26118, 30471, 34824, 39177, 43530, 47883,
+ 52236, 56589, 60942, 65295, 0, 4368, 8736, 13104, 17472, 21840, 26208, 30576, 34944, 39312,
+ 43680, 48048, 52416, 56784, 61152, 65520, 0, 4369, 8738, 13107, 17476, 21845, 26214, 30583,
+ 34952, 39321, 43690, 48059, 52428, 56797, 61166, 65535};
+
+const unsigned int bitVector32[33] = {1, 2, 4, 8, 16, 32, 64, 128,
+ 256, 512, 1024, 2048, 4096, 8192, 16384, 32768,
+ 65536, 131072, 262144, 524288, 1048576, 2097152, 4194304, 8388608,
+ 16777216, 33554432, 67108864, 134217728, 268435456, 536870912, 1073741824, 2147483648u,
+ 4294967295u};
+
+/*const unsigned int bitVector64[65] = {};*/
+
+const unsigned int mask32[32] = {1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072,
+ 262144, 524288, 1048576, 2097152, 4194304, 8388608, 16777216, 33554432, 67108864, 134217728,
+ 268435456, 536870912, 1073741824, 2147483648U};
+
+const char *secondaryModelList[21] = { "S6A (GTR)", "S6B", "S6C", "S6D", "S6E", "S7A (GTR)", "S7B", "S7C", "S7D", "S7E", "S7F", "S16 (GTR)", "S16A", "S16B", "S16C",
+ "S16D", "S16E", "S16F", "S16I", "S16J", "S16K"};
+
+double masterTime;
+double accumulatedTime;
+int partCount = 0;
+int optimizeRateCategoryInvocations = 1;
+
+
+
+
+
+partitionLengths pLengths[MAX_MODEL] = {
+
+ /* BINARY */
+
+ // {4, 4, 2, 4, 2, 1, 2, 8, 2, 2, FALSE, FALSE, 3, inverseMeaningBINARY, 2, FALSE, bitVectorIdentity},
+ //eiLength changed from 2 -> 4
+ {4, 4, 2, 4, 4, 1, 2, 8, 2, 2, FALSE, FALSE, 3, inverseMeaningBINARY, 2, FALSE, bitVectorIdentity},
+ /* DNA */
+ {16, 16, 4, 16, 16, 6, 4, 64, 6, 4, FALSE, FALSE, 15, inverseMeaningDNA, 4, FALSE, bitVectorIdentity},
+
+ /* AA */
+ {400, 400, 20, 400, 400, 190, 20, 460, 190, 20, FALSE, FALSE, 22, inverseMeaningPROT, 20, TRUE, bitVectorAA},
+
+ /* SECONDARY_DATA */
+
+ {256, 256, 16, 256, 256, 120, 16, 4096, 120, 16, FALSE, FALSE, 255, (char*)NULL, 16, TRUE, bitVectorSecondary},
+
+
+ /* SECONDARY_DATA_6 */
+ {36, 36, 6, 36, 36, 15, 6, 384, 15, 6, FALSE, FALSE, 63, (char*)NULL, 6, TRUE, bitVectorIdentity},
+
+
+ /* SECONDARY_DATA_7 */
+ {49, 49, 7, 49, 49, 21, 7, 896, 21, 7, FALSE, FALSE, 127, (char*)NULL, 7, TRUE, bitVectorIdentity},
+
+ /* 32 states */
+ {1024, 1024, 32, 1024, 1024, 496, 32, 1056, 496, 32, FALSE, FALSE, 32, inverseMeaningGeneric32, 32, TRUE, bitVector32},
+
+ /* 64 states */
+ {4096, 4096, 64, 4096, 4096, 2016, 64, 4160, 2016, 64, FALSE, FALSE, 64, (char*)NULL, 64, TRUE, (unsigned int*)NULL}
+};
+
+partitionLengths pLength;
+
+
+
+
+
+
+#ifdef _USE_PTHREADS
+volatile int NumberOfJobs;
+volatile int jobCycle = 0;
+volatile int threadJob = 0;
+volatile int NumberOfThreads;
+volatile double *reductionBuffer;
+volatile double *reductionBufferTwo;
+volatile char *barrierBuffer;
+#endif
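
Note on the pLengths table in parser/globalVariables.h above: several of the per-datatype sizes appear to follow directly from the number of states and the code used for the undetermined character. The sketch below reproduces four of the fields from those two inputs and checks them against the values in the table. The relations are inferred from the numbers in the table, not taken from upstream documentation, and the names derivedLengths, deriveLengths, matchesRow and checkTableRows are hypothetical helpers that do not exist in ExaML.

```c
#include <assert.h>

/* Hypothetical helper type (not part of ExaML): the subset of
   partitionLengths fields that seem derivable from two inputs. */
typedef struct
{
  int leftLength;        /* states * states                     */
  int substRatesLength;  /* states * (states - 1) / 2           */
  int frequenciesLength; /* states                              */
  int tipVectorLength;   /* (undetermined + 1) * states         */
} derivedLengths;

/* Assumed relations, inferred from the pLengths rows. */
derivedLengths deriveLengths(int states, int undetermined)
{
  derivedLengths d;

  d.leftLength        = states * states;
  d.substRatesLength  = states * (states - 1) / 2;
  d.frequenciesLength = states;
  d.tipVectorLength   = (undetermined + 1) * states;

  return d;
}

/* Returns 1 if the derived values equal the four table entries given. */
int matchesRow(int states, int undetermined,
               int leftLength, int substRatesLength,
               int frequenciesLength, int tipVectorLength)
{
  derivedLengths d = deriveLengths(states, undetermined);

  return d.leftLength        == leftLength        &&
         d.substRatesLength  == substRatesLength  &&
         d.frequenciesLength == frequenciesLength &&
         d.tipVectorLength   == tipVectorLength;
}

/* Values copied from the pLengths initializer, one call per row. */
void checkTableRows(void)
{
  assert(matchesRow( 2,   3,    4,    1,  2,    8)); /* BINARY           */
  assert(matchesRow( 4,  15,   16,    6,  4,   64)); /* DNA              */
  assert(matchesRow(20,  22,  400,  190, 20,  460)); /* AA               */
  assert(matchesRow(16, 255,  256,  120, 16, 4096)); /* SECONDARY_DATA   */
  assert(matchesRow( 6,  63,   36,   15,  6,  384)); /* SECONDARY_DATA_6 */
  assert(matchesRow( 7, 127,   49,   21,  7,  896)); /* SECONDARY_DATA_7 */
  assert(matchesRow(32,  32, 1024,  496, 32, 1056)); /* 32 states        */
  assert(matchesRow(64,  64, 4096, 2016, 64, 4160)); /* 64 states        */
}
```

Calling checkTableRows() from a small test program is enough to exercise the sketch. A check of this kind could help catch transcription errors if a new data type is ever added to the table by hand.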
diff --git a/parser/parsePartitions.c b/parser/parsePartitions.c
new file mode 100644
index 0000000..23cef3e
--- /dev/null
+++ b/parser/parsePartitions.c
@@ -0,0 +1,1427 @@
+/* RAxML-VI-HPC (version 2.2) a program for sequential and parallel estimation of phylogenetic trees
+ * Copyright August 2006 by Alexandros Stamatakis
+ *
+ * Partially derived from
+ * fastDNAml, a program for estimation of phylogenetic trees from sequences by Gary J. Olsen
+ *
+ * and
+ *
+ * Programs of the PHYLIP package by Joe Felsenstein.
+ *
+ * This program is free software; you may redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ *
+ * For any other enquiries send an Email to Alexandros Stamatakis
+ * Alexandros.Stamatakis at epfl.ch
+ *
+ * When publishing work that is based on the results from RAxML-VI-HPC please cite:
+ *
+ * Alexandros Stamatakis:"RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models".
+ * Bioinformatics 2006; doi: 10.1093/bioinformatics/btl446
+ */
+
+
+#ifndef WIN32
+#include <sys/times.h>
+#include <sys/types.h>
+#include <sys/time.h>
+#include <unistd.h>
+#endif
+
+#include <math.h>
+#include <time.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+#include <strings.h>
+
+
+
+
+#include "axml.h"
+
+/*****************************FUNCTIONS FOR READING MULTIPLE MODEL SPECIFICATIONS************************************************/
+
+
+extern char modelFileName[1024];
+extern char excludeFileName[1024];
+extern char proteinModelFileName[1024];
+extern char secondaryStructureFileName[1024];
+
+
+extern char seq_file[1024];
+
+extern char *protModels[NUM_PROT_MODELS];
+
+static boolean lineContainsOnlyWhiteChars(char *line)
+{
+ int i, n = strlen(line);
+
+ if(n == 0)
+ return TRUE;
+
+ for(i = 0; i < n; i++)
+ {
+ if(!whitechar(line[i]))
+ return FALSE;
+ }
+ return TRUE;
+}
+
+
+static int isNum(char c)
+{
+
+ return (c == '0' || c == '1' || c == '2' || c == '3' || c == '4' ||
+ c == '5' || c == '6' || c == '7' || c == '8' || c == '9');
+}
+
+
+static void skipWhites(char **ch)
+{
+ while(**ch == ' ' || **ch == '\t')
+ *ch = *ch + 1;
+}
+
+static void analyzeIdentifier(char **ch, int modelNumber, tree *tr)
+{
+ char
+ *start = *ch,
+ ident[2048] = "";
+ char model[128] = "";
+ char thisModel[1024];
+ int i = 0, n, j;
+ int containsComma = 0;
+
+ while(**ch != '=')
+ {
+ if(**ch == '\n' || **ch == '\r' || **ch == '\0')
+ {
+ printf("\nPartition file parsing error!\n");
+ printf("Each line must contain a \"=\" character\n");
+ printf("Offending line: %s\n", start);
+ printf("ExaML will exit now.\n\n");
+ exit(-1);
+ }
+
+ if(**ch != ' ' && **ch != '\t')
+ {
+ ident[i] = **ch;
+ i++;
+ }
+ *ch = *ch + 1;
+ }
+
+ n = i;
+ i = 0;
+
+ for(i = 0; i < n; i++)
+ if(ident[i] == ',')
+ containsComma = 1;
+
+ if(!containsComma)
+ {
+ printf("Error, model file must have format: DNA or AA model, then a comma, and then the partition name\n");
+ exit(-1);
+ }
+ else
+ {
+ boolean found = FALSE;
+ i = 0;
+ while(ident[i] != ',')
+ {
+ model[i] = ident[i];
+ i++;
+ }
+
+ /* AA */
+
+ for(i = 0; i < NUM_PROT_MODELS && !found; i++)
+ {
+ strcpy(thisModel, protModels[i]);
+
+ if(strcasecmp(model, thisModel) == 0)
+ {
+ tr->initialPartitionData[modelNumber].protModels = i;
+ tr->initialPartitionData[modelNumber].protFreqs = 0;
+ tr->initialPartitionData[modelNumber].dataType = AA_DATA;
+ found = TRUE;
+ }
+
+ strcpy(thisModel, protModels[i]);
+ strcat(thisModel, "F");
+
+ if(strcasecmp(model, thisModel) == 0)
+ {
+ tr->initialPartitionData[modelNumber].protModels = i;
+ tr->initialPartitionData[modelNumber].protFreqs = 1;
+ tr->initialPartitionData[modelNumber].dataType = AA_DATA;
+ found = TRUE;
+
+ if(tr->initialPartitionData[modelNumber].protModels == AUTO)
+ {
+ printf("\nError: Option AUTOF has been deprecated, exiting\n\n");
+ errorExit(-1);
+ }
+
+ if(tr->initialPartitionData[modelNumber].protModels == LG4M || tr->initialPartitionData[modelNumber].protModels == LG4X)
+ {
+ printf("\nError: Options LG4MF and LG4XF have been deprecated.\n");
+ printf("They shall only be used with the given base frequencies of the model, exiting\n\n");
+ errorExit(-1);
+ }
+ }
+
+ strcpy(thisModel, protModels[i]);
+ strcat(thisModel, "X");
+
+ if(strcasecmp(model, thisModel) == 0)
+ {
+ tr->initialPartitionData[modelNumber].protModels = i;
+ tr->initialPartitionData[modelNumber].protFreqs = 0;
+ tr->initialPartitionData[modelNumber].optimizeBaseFrequencies = TRUE;
+ tr->initialPartitionData[modelNumber].dataType = AA_DATA;
+ found = TRUE;
+
+ if(tr->initialPartitionData[modelNumber].protModels == AUTO)
+ {
+ printf("\nError: Option AUTOX has been deprecated, exiting\n\n");
+ errorExit(-1);
+ }
+
+ if(tr->initialPartitionData[modelNumber].protModels == LG4M || tr->initialPartitionData[modelNumber].protModels == LG4X)
+ {
+ printf("\nError: Options LG4MX and LG4XX have been deprecated.\n");
+ printf("They shall only be used with the given base frequencies of the model, exiting\n\n");
+ errorExit(-1);
+ }
+
+ }
+
+ /*if(found)
+ printf("%s %d\n", model, i);*/
+ }
+
+ if(!found)
+ {
+ if(strcasecmp(model, "DNA") == 0)
+ {
+ tr->initialPartitionData[modelNumber].protModels = -1;
+ tr->initialPartitionData[modelNumber].protFreqs = -1;
+ tr->initialPartitionData[modelNumber].dataType = DNA_DATA;
+ tr->initialPartitionData[modelNumber].optimizeBaseFrequencies = FALSE;
+ found = TRUE;
+ }
+ else
+ {
+ if(strcasecmp(model, "DNAX") == 0)
+ {
+ tr->initialPartitionData[modelNumber].protModels = -1;
+ tr->initialPartitionData[modelNumber].protFreqs = -1;
+ tr->initialPartitionData[modelNumber].dataType = DNA_DATA;
+ tr->initialPartitionData[modelNumber].optimizeBaseFrequencies = TRUE;
+ found = TRUE;
+ }
+ else
+ {
+ if(strcasecmp(model, "BIN") == 0)
+ {
+ tr->initialPartitionData[modelNumber].protModels = -1;
+ tr->initialPartitionData[modelNumber].protFreqs = -1;
+ tr->initialPartitionData[modelNumber].dataType = BINARY_DATA;
+ tr->initialPartitionData[modelNumber].optimizeBaseFrequencies = FALSE;
+ found = TRUE;
+ }
+ else
+ {
+ if(strcasecmp(model, "BINX") == 0)
+ {
+ tr->initialPartitionData[modelNumber].protModels = -1;
+ tr->initialPartitionData[modelNumber].protFreqs = -1;
+ tr->initialPartitionData[modelNumber].dataType = BINARY_DATA;
+ tr->initialPartitionData[modelNumber].optimizeBaseFrequencies = TRUE;
+ found = TRUE;
+ }
+ else
+ {
+ if(strcasecmp(model, "MULTI") == 0)
+ {
+ tr->initialPartitionData[modelNumber].protModels = -1;
+ tr->initialPartitionData[modelNumber].protFreqs = -1;
+ tr->initialPartitionData[modelNumber].dataType = GENERIC_32;
+
+ found = TRUE;
+ }
+ else
+ {
+ if(strcasecmp(model, "CODON") == 0)
+ {
+ tr->initialPartitionData[modelNumber].protModels = -1;
+ tr->initialPartitionData[modelNumber].protFreqs = -1;
+ tr->initialPartitionData[modelNumber].dataType = GENERIC_64;
+
+ found = TRUE;
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+
+ if(!found)
+ {
+ printf("ERROR: you specified the unknown model %s for partition %d\n", model, modelNumber);
+ exit(-1);
+ }
+
+
+ i = 0;
+ while(ident[i++] != ',');
+
+ tr->initialPartitionData[modelNumber].partitionName = (char*)malloc((n - i + 1) * sizeof(char));
+
+ j = 0;
+ while(i < n)
+ tr->initialPartitionData[modelNumber].partitionName[j++] = ident[i++];
+
+ tr->initialPartitionData[modelNumber].partitionName[j] = '\0';
+ }
+}
+
+
+
+static void setModel(int model, int position, int *a)
+{
+ if(a[position] == -1)
+ a[position] = model;
+ else
+ {
+ printf("ERROR trying to assign model %d to position %d \n", model, position);
+ printf("while model %d has already been assigned to this position\n", a[position]);
+ exit(-1);
+ }
+}
+
+
+static int myGetline(char **lineptr, int *n, FILE *stream)
+{
+ char *line, *p;
+ int size, copy, len;
+ int chunkSize = 256 * sizeof(char);
+
+ if (*lineptr == NULL || *n < 2)
+ {
+ line = (char *)realloc(*lineptr, chunkSize);
+ if (line == NULL)
+ return -1;
+ *lineptr = line;
+ *n = chunkSize;
+ }
+
+ line = *lineptr;
+ size = *n;
+
+ copy = size;
+ p = line;
+
+ while(1)
+ {
+ while (--copy > 0)
+ {
+ register int c = getc(stream);
+ if (c == EOF)
+ goto lose;
+ else
+ {
+ *p++ = c;
+ if(c == '\n' || c == '\r')
+ goto win;
+ }
+ }
+
+ /* Need to enlarge the line buffer. */
+ len = p - line;
+ size *= 2;
+ line = realloc (line, size);
+ if (line == NULL)
+ goto lose;
+ *lineptr = line;
+ *n = size;
+ p = line + len;
+ copy = size - len;
+ }
+
+ lose:
+ if (p == *lineptr)
+ return -1;
+ /* Return a partial line since we got an error in the middle. */
+ win:
+ *p = '\0';
+ return p - *lineptr;
+}
+
+
+
+void parsePartitions(analdef *adef, rawdata *rdta, tree *tr)
+{
+ FILE *f;
+ int numberOfModels = 0;
+ int nbytes = 0;
+ char *ch;
+ char *cc = (char *)NULL;
+ char **p_names;
+ int n, i, l;
+ int lower, upper, modulo;
+ char buf[256];
+ int **partitions;
+ int pairsCount;
+ int as, j;
+ int k;
+
+ f = myfopen(modelFileName, "rb");
+
+
+ while(myGetline(&cc, &nbytes, f) > -1)
+ {
+ if(!lineContainsOnlyWhiteChars(cc))
+ {
+ numberOfModels++;
+ }
+ if(cc)
+ free(cc);
+ cc = (char *)NULL;
+ }
+
+ rewind(f);
+
+ p_names = (char **)malloc(sizeof(char *) * numberOfModels);
+ partitions = (int **)malloc(sizeof(int *) * numberOfModels);
+
+
+
+ tr->initialPartitionData = (pInfo*)malloc(sizeof(pInfo) * numberOfModels);
+
+
+ for(i = 0; i < numberOfModels; i++)
+ {
+ tr->initialPartitionData[i].protModels = adef->proteinMatrix;
+ tr->initialPartitionData[i].protFreqs = adef->protEmpiricalFreqs;
+ tr->initialPartitionData[i].dataType = -1;
+ }
+
+ for(i = 0; i < numberOfModels; i++)
+ partitions[i] = (int *)NULL;
+
+ i = 0;
+ while(myGetline(&cc, &nbytes, f) > -1)
+ {
+ if(!lineContainsOnlyWhiteChars(cc))
+ {
+ n = strlen(cc);
+ p_names[i] = (char *)malloc(sizeof(char) * (n + 1));
+ strcpy(&(p_names[i][0]), cc);
+ i++;
+ }
+ if(cc)
+ free(cc);
+ cc = (char *)NULL;
+ }
+
+ for(i = 0; i < numberOfModels; i++)
+ {
+ ch = p_names[i];
+ pairsCount = 0;
+ skipWhites(&ch);
+
+ if(*ch == '=')
+ {
+ printf("Identifier missing prior to '=' in %s\n", p_names[i]);
+ exit(-1);
+ }
+
+ analyzeIdentifier(&ch, i, tr);
+ ch++;
+
+ numberPairs:
+ pairsCount++;
+ partitions[i] = (int *)realloc((void *)partitions[i], (1 + 3 * pairsCount) * sizeof(int));
+ partitions[i][0] = pairsCount;
+ partitions[i][3 + 3 * (pairsCount - 1)] = -1;
+
+ skipWhites(&ch);
+
+ if(!isNum(*ch))
+ {
+ printf("%c Number expected in %s\n", *ch, p_names[i]);
+ exit(-1);
+ }
+
+ l = 0;
+ while(isNum(*ch))
+ {
+ /*printf("%c", *ch);*/
+ buf[l] = *ch;
+ ch++;
+ l++;
+ }
+ buf[l] = '\0';
+ lower = atoi(buf);
+ partitions[i][1 + 3 * (pairsCount - 1)] = lower;
+
+ skipWhites(&ch);
+
+ /* NEW */
+
+ if((*ch != '-') && (*ch != ','))
+ {
+ if(*ch == '\0' || *ch == '\n' || *ch == '\r')
+ {
+ upper = lower;
+ goto SINGLE_NUMBER;
+ }
+ else
+ {
+ printf("'-' or ',' expected in %s\n", p_names[i]);
+ exit(-1);
+ }
+ }
+
+ if(*ch == ',')
+ {
+ upper = lower;
+ goto SINGLE_NUMBER;
+ }
+
+ /* END NEW */
+
+ ch++;
+
+ skipWhites(&ch);
+
+ if(!isNum(*ch))
+ {
+ printf("%c Number expected in %s\n", *ch, p_names[i]);
+ exit(-1);
+ }
+
+ l = 0;
+ while(isNum(*ch))
+ {
+ buf[l] = *ch;
+ ch++;
+ l++;
+ }
+ buf[l] = '\0';
+ upper = atoi(buf);
+ SINGLE_NUMBER:
+ partitions[i][2 + 3 * (pairsCount - 1)] = upper;
+
+ if(upper < lower)
+ {
+ printf("Upper bound %d smaller than lower bound %d for this partition: %s\n", upper, lower, p_names[i]);
+ exit(-1);
+ }
+
+ skipWhites(&ch);
+
+ if(*ch == '\0' || *ch == '\n' || *ch == '\r') /* PC-LINEBREAK*/
+ {
+ goto parsed;
+ }
+
+ if(*ch == ',')
+ {
+ ch++;
+ goto numberPairs;
+ }
+
+ if(*ch == '\\')
+ {
+ ch++;
+ skipWhites(&ch);
+
+ if(!isNum(*ch))
+ {
+ printf("%c Number expected in %s\n", *ch, p_names[i]);
+ exit(-1);
+ }
+
+ if(adef->compressPatterns == FALSE)
+ {
+ printf("\nError: You are not allowed to use interleaved partitions, that is, assign non-contiguous sites\n");
+ printf("to the same partition model, when pattern compression is disabled via the -c flag!\n\n");
+ exit(-1);
+ }
+
+ l = 0;
+ while(isNum(*ch))
+ {
+ buf[l] = *ch;
+ ch++;
+ l++;
+ }
+ buf[l] = '\0';
+ modulo = atoi(buf);
+ partitions[i][3 + 3 * (pairsCount - 1)] = modulo;
+
+ skipWhites(&ch);
+ if(*ch == '\0' || *ch == '\n' || *ch == '\r')
+ {
+ goto parsed;
+ }
+ if(*ch == ',')
+ {
+ ch++;
+ goto numberPairs;
+ }
+ }
+
+
+ printf("\nError: You may be using \"/\" for specifying interleaved partitions in the model file, while it should be \"\\\" !\n\n");
+ assert(0);
+
+ parsed:
+ i = i;
+ }
+
+ fclose(f);
+
+ /*********************************************************************************************************************/
+
+ for(i = 0; i <= rdta->sites; i++)
+ tr->model[i] = -1;
+
+ for(i = 0; i < numberOfModels; i++)
+ {
+ as = partitions[i][0];
+
+ for(j = 0; j < as; j++)
+ {
+ lower = partitions[i][1 + j * 3];
+ upper = partitions[i][2 + j * 3];
+ modulo = partitions[i][3 + j * 3];
+
+ if(modulo == -1)
+ {
+ for(k = lower; k <= upper; k++)
+ setModel(i, k, tr->model);
+ }
+ else
+ {
+ for(k = lower; k <= upper; k += modulo)
+ {
+ if(k <= rdta->sites)
+ setModel(i, k, tr->model);
+ }
+ }
+ }
+ }
+
+
+ for(i = 1; i < rdta->sites + 1; i++)
+ {
+
+ if(tr->model[i] == -1)
+ {
+ printf("ERROR: Alignment Position %d has not been assigned any model\n", i);
+ exit(-1);
+ }
+ }
+
+ for(i = 0; i < numberOfModels; i++)
+ {
+ free(partitions[i]);
+ free(p_names[i]);
+ }
+
+ free(partitions);
+ free(p_names);
+
+ tr->NumberOfModels = numberOfModels;
+
+
+}
+
+/*******************************************************************************************************************************/
+
+void handleExcludeFile(tree *tr, analdef *adef, rawdata *rdta)
+{
+ FILE *f;
+ char buf[256];
+ int
+ ch,
+ j, value, i,
+ state = 0,
+ numberOfModels = 0,
+ l = -1,
+ excludeRegion = 0,
+ excludedColumns = 0,
+ modelCounter = 1;
+ int
+ *excludeArray, *countArray, *modelList;
+ int
+ **partitions;
+
+ printf("\n\n");
+
+ f = myfopen(excludeFileName, "rb");
+
+ while((ch = getc(f)) != EOF)
+ {
+ if(ch == '-')
+ numberOfModels++;
+ }
+
+ excludeArray = (int*)malloc(sizeof(int) * (rdta->sites + 1));
+ countArray = (int*)malloc(sizeof(int) * (rdta->sites + 1));
+ modelList = (int *)malloc((rdta->sites + 1)* sizeof(int));
+
+ partitions = (int **)malloc(sizeof(int *) * numberOfModels);
+ for(i = 0; i < numberOfModels; i++)
+ partitions[i] = (int *)malloc(sizeof(int) * 2);
+
+ rewind(f);
+
+ while((ch = getc(f)) != EOF)
+ {
+ switch(state)
+ {
+ case 0: /* get first number */
+ if(!whitechar(ch))
+ {
+ if(!isNum(ch))
+ {
+ printf("exclude file must have format: number-number [number-number]*\n");
+ exit(-1);
+ }
+ l = 0;
+ buf[l++] = ch;
+ state = 1;
+ }
+ break;
+ case 1: /*get the number or detect - */
+ if(!isNum(ch) && ch != '-')
+ {
+ printf("exclude file must have format: number-number [number-number]*\n");
+ exit(-1);
+ }
+ if(isNum(ch))
+ {
+ buf[l++] = ch;
+ }
+ else
+ {
+ buf[l++] = '\0';
+ value = atoi(buf);
+ partitions[excludeRegion][0] = value;
+ state = 2;
+ }
+ break;
+ case 2: /*get second number */
+ if(!isNum(ch))
+ {
+ printf("exclude file must have format: number-number [number-number]*\n");
+ exit(-1);
+ }
+ l = 0;
+ buf[l++] = ch;
+ state = 3;
+ break;
+ case 3: /* continue second number or find end */
+ if(!isNum(ch) && !whitechar(ch))
+ {
+ printf("exclude file must have format: number-number [number-number]*\n");
+ exit(-1);
+ }
+ if(isNum(ch))
+ {
+ buf[l++] = ch;
+ }
+ else
+ {
+ buf[l++] = '\0';
+ value = atoi(buf);
+ partitions[excludeRegion][1] = value;
+ excludeRegion++;
+ state = 0;
+ }
+ break;
+ default:
+ assert(0);
+ }
+ }
+
+ if(state == 3)
+ {
+ buf[l++] = '\0';
+ value = atoi(buf);
+ partitions[excludeRegion][1] = value;
+ excludeRegion++;
+ }
+
+ assert(excludeRegion == numberOfModels);
+
+ for(i = 0; i <= rdta->sites; i++)
+ {
+ excludeArray[i] = -1;
+ countArray[i] = 0;
+ modelList[i] = -1;
+ }
+
+ for(i = 0; i < numberOfModels; i++)
+ {
+ int lower = partitions[i][0];
+ int upper = partitions[i][1];
+
+ if(lower > upper)
+ {
+ printf("Misspecified exclude region %d\n", i);
+ printf("lower bound %d is greater than upper bound %d\n", lower, upper);
+ exit(-1);
+ }
+
+ if(lower == 0)
+ {
+ printf("Misspecified exclude region %d\n", i);
+ printf("lower bound must be greater than 0\n");
+ exit(-1);
+ }
+
+ if(upper > rdta->sites)
+ {
+ printf("Misspecified exclude region %d\n", i);
+ printf("upper bound %d must be smaller than %d\n", upper, (rdta->sites + 1));
+ exit(-1);
+ }
+ for(j = lower; j <= upper; j++)
+ {
+ if(excludeArray[j] != -1)
+ {
+ printf("WARNING: Exclude regions %d and %d overlap at position %d (already excluded %d times)\n",
+ excludeArray[j], i, j, countArray[j]);
+ }
+ excludeArray[j] = i;
+ countArray[j] = countArray[j] + 1;
+ }
+ }
+
+ for(i = 1; i <= rdta->sites; i++)
+ {
+ if(excludeArray[i] != -1)
+ excludedColumns++;
+ else
+ {
+ modelList[modelCounter] = tr->model[i];
+ modelCounter++;
+ }
+ }
+
+ printf("You have excluded %d out of %d columns\n", excludedColumns, rdta->sites);
+
+ if(excludedColumns == rdta->sites)
+ {
+ printf("Error: You have excluded all sites\n");
+ exit(-1);
+ }
+
+ if(adef->useSecondaryStructure && (excludedColumns > 0))
+ {
+ char mfn[2048];
+ int countColumns;
+ FILE *newFile;
+
+ assert(adef->useMultipleModel);
+
+ strcpy(mfn, secondaryStructureFileName);
+ strcat(mfn, ".");
+ strcat(mfn, excludeFileName);
+
+ newFile = myfopen(mfn, "wb");
+
+ printBothOpen("\nA secondary structure file with analogous structure assignments for non-excluded columns is printed to file %s\n", mfn);
+
+ for(i = 1, countColumns = 0; i <= rdta->sites; i++)
+ {
+ if(excludeArray[i] == -1)
+ fprintf(newFile, "%c", tr->secondaryStructureInput[i - 1]);
+ else
+ countColumns++;
+ }
+
+ assert(countColumns == excludedColumns);
+
+ fprintf(newFile,"\n");
+
+ fclose(newFile);
+ }
+
+
+ if(adef->useMultipleModel && (excludedColumns > 0))
+ {
+ char mfn[2048];
+ FILE *newFile;
+
+ strcpy(mfn, modelFileName);
+ strcat(mfn, ".");
+ strcat(mfn, excludeFileName);
+
+ newFile = myfopen(mfn, "wb");
+
+ printf("\nA partition file with analogous model assignments for non-excluded columns is printed to file %s\n", mfn);
+
+ for(i = 0; i < tr->NumberOfModels; i++)
+ {
+ boolean modelStillExists = FALSE;
+
+ for(j = 1; (j <= rdta->sites) && (!modelStillExists); j++)
+ {
+ if(modelList[j] == i)
+ modelStillExists = TRUE;
+ }
+
+ if(modelStillExists)
+ {
+ int k = 1;
+ int lower, upper;
+ int parts = 0;
+
+ switch(tr->partitionData[i].dataType)
+ {
+ case AA_DATA:
+ {
+ char AAmodel[1024];
+
+ strcpy(AAmodel, protModels[tr->partitionData[i].protModels]);
+ if(tr->partitionData[i].protFreqs)
+ strcat(AAmodel, "F");
+
+ fprintf(newFile, "%s, ", AAmodel);
+ }
+ break;
+ case DNA_DATA:
+ fprintf(newFile, "DNA, ");
+ break;
+ case BINARY_DATA:
+ fprintf(newFile, "BIN, ");
+ break;
+ case GENERIC_32:
+ fprintf(newFile, "MULTI, ");
+ break;
+ case GENERIC_64:
+ fprintf(newFile, "CODON, ");
+ break;
+ default:
+ assert(0);
+ }
+
+ fprintf(newFile, "%s = ", tr->partitionData[i].partitionName);
+
+ while(k <= rdta->sites)
+ {
+ if(modelList[k] == i)
+ {
+ lower = k;
+ while((k < rdta->sites) && (modelList[k + 1] == i))
+ k++;
+ upper = k;
+
+ if(lower == upper)
+ {
+ if(parts == 0)
+ fprintf(newFile, "%d", lower);
+ else
+ fprintf(newFile, ",%d", lower);
+ }
+ else
+ {
+ if(parts == 0)
+ fprintf(newFile, "%d-%d", lower, upper);
+ else
+ fprintf(newFile, ",%d-%d", lower, upper);
+ }
+ parts++;
+ }
+ k++;
+ }
+ fprintf(newFile, "\n");
+ }
+ }
+ fclose(newFile);
+ }
+
+
+ {
+ FILE *newFile;
+ char mfn[2048];
+
+
+ strcpy(mfn, seq_file);
+ strcat(mfn, ".");
+ strcat(mfn, excludeFileName);
+
+ newFile = myfopen(mfn, "wb");
+
+ printf("\nAn alignment file with excluded columns is printed to file %s\n\n\n", mfn);
+
+ fprintf(newFile, "%d %d\n", tr->mxtips, rdta->sites - excludedColumns);
+
+ for(i = 1; i <= tr->mxtips; i++)
+ {
+ unsigned char *tipI = &(rdta->y[i][1]);
+ fprintf(newFile, "%s ", tr->nameList[i]);
+
+ for(j = 0; j < rdta->sites; j++)
+ {
+ if(excludeArray[j + 1] == -1)
+ fprintf(newFile, "%c", getInverseMeaning(tr->dataVector[j + 1], tipI[j]));
+ }
+
+ fprintf(newFile, "\n");
+ }
+
+ fclose(newFile);
+ }
+
+
+ fclose(f);
+ for(i = 0; i < numberOfModels; i++)
+ free(partitions[i]);
+ free(partitions);
+ free(excludeArray);
+ free(countArray);
+ free(modelList);
+}
+
+
+void parseProteinModel(analdef *adef)
+{
+ FILE *f;
+ int doublesRead = 0;
+ int result = 0;
+ int i, j;
+ double acc = 0.0;
+
+ assert(adef->userProteinModel);
+ printf("User-defined prot mod %s\n", proteinModelFileName);
+
+ adef->externalAAMatrix = (double*)malloc(420 * sizeof(double));
+
+ f = myfopen(proteinModelFileName, "rb");
+
+
+
+ while(doublesRead < 420)
+ {
+ result = fscanf(f, "%lf", &(adef->externalAAMatrix[doublesRead++]));
+
+ if(result != 1) /* EOF (result == EOF) or a non-numeric token (result == 0) */
+ {
+ printf("Error protein model file must consist of exactly 420 entries \n");
+ printf("The first 400 entries are for the rates of the AA matrix, while the\n");
+ printf("last 20 should contain the empirical base frequencies\n");
+ printf("Reached End of File or a non-numeric entry after %d entries\n", (doublesRead - 1));
+ exit(-1);
+ }
+ }
+
+ fclose(f);
+
+ /* CHECKS */
+ for(i = 0; i < 20; i++)
+ for(j = 0; j < 20; j++)
+ {
+ if(i != j)
+ {
+ if(adef->externalAAMatrix[i * 20 + j] != adef->externalAAMatrix[j * 20 + i])
+ {
+ printf("Error user-defined Protein model matrix must be symmetric\n");
+ printf("Entry P[%d][%d]=%f at position %d is not equal to P[%d][%d]=%f at position %d\n",
+ i, j, adef->externalAAMatrix[i * 20 + j], (i * 20 + j),
+ j, i, adef->externalAAMatrix[j * 20 + i], (j * 20 + i));
+ exit(-1);
+ }
+ }
+ }
+
+ acc = 0.0;
+
+ for(i = 400; i < 420; i++)
+ acc += adef->externalAAMatrix[i];
+
+ if((acc > 1.0 + 1.0E-6) || (acc < 1.0 - 1.0E-6))
+ {
+ printf("Base frequencies in user-defined AA substitution matrix do not sum to 1.0\n");
+ printf("the sum is %1.80f\n", acc);
+ exit(-1);
+ }
+
+}
+
+
+
+
+void parseSecondaryStructure(tree *tr, analdef *adef, int sites)
+{
+ if(adef->useSecondaryStructure)
+ {
+ FILE *f = myfopen(secondaryStructureFileName, "rb");
+
+ int
+ i,
+ k,
+ countCharacters = 0,
+ ch,
+ *characters,
+ **brackets,
+ opening,
+ closing,
+ depth,
+ numberOfSymbols,
+ numSecondaryColumns;
+
+ unsigned char bracketTypes[4][2] = {{'(', ')'}, {'<', '>'}, {'[', ']'}, {'{', '}'}};
+
+ numberOfSymbols = 4;
+
+ tr->secondaryStructureInput = (char*)malloc(sizeof(char) * sites);
+
+ while((ch = fgetc(f)) != EOF)
+ {
+ if(ch == '(' || ch == ')' || ch == '<' || ch == '>' || ch == '[' || ch == ']' || ch == '{' || ch == '}' || ch == '.')
+ countCharacters++;
+ else
+ {
+ if(!whitechar(ch))
+ {
+ printf("Secondary Structure file %s contains character %c at position %d\n", secondaryStructureFileName, ch, countCharacters + 1);
+ printf("Allowed Characters are \"( ) < > [ ] { } \" and \".\" \n");
+ errorExit(-1);
+ }
+ }
+ }
+
+ if(countCharacters != sites)
+ {
+ printf("Error: Alignment length is: %d, secondary structure file has length %d\n", sites, countCharacters);
+ errorExit(-1);
+ }
+
+ characters = (int*)malloc(sizeof(int) * countCharacters);
+
+ brackets = (int **)malloc(sizeof(int*) * numberOfSymbols);
+
+ for(k = 0; k < numberOfSymbols; k++)
+ brackets[k] = (int*)calloc(countCharacters, sizeof(int));
+
+ rewind(f);
+
+ countCharacters = 0;
+ while((ch = fgetc(f)) != EOF)
+ {
+ if(!whitechar(ch))
+ {
+ tr->secondaryStructureInput[countCharacters] = ch;
+ characters[countCharacters++] = ch;
+ }
+ }
+
+ assert(countCharacters == sites);
+
+ for(k = 0; k < numberOfSymbols; k++)
+ {
+ for(i = 0, opening = 0, closing = 0, depth = 0; i < countCharacters; i++)
+ {
+ if((characters[i] == bracketTypes[k][0] || characters[i] == bracketTypes[k][1]) &&
+ (tr->extendedDataVector[i+1] == AA_DATA || tr->extendedDataVector[i+1] == BINARY_DATA ||
+ tr->extendedDataVector[i+1] == GENERIC_32 || tr->extendedDataVector[i+1] == GENERIC_64))
+ {
+ printf("Secondary Structure only for DNA character positions \n");
+ printf("I am at position %d of the secondary structure file and this is not part of a DNA partition\n", i+1);
+ errorExit(-1);
+ }
+
+ if(characters[i] == bracketTypes[k][0])
+ {
+ depth++;
+ /*printf("%d %d\n", depth, i);*/
+ brackets[k][i] = depth;
+ opening++;
+ }
+ if(characters[i] == bracketTypes[k][1])
+ {
+ brackets[k][i] = depth;
+ /*printf("%d %d\n", depth, i); */
+ depth--;
+
+ closing++;
+ }
+
+ if(closing > opening)
+ {
+ printf("at position %d there is one closing bracket too many\n", i+1);
+ errorExit(-1);
+ }
+ }
+
+ assert(depth == 0 && countCharacters == sites);
+
+
+ if(closing != opening)
+ {
+ printf("Number of opening brackets %d should be equal to number of closing brackets %d\n", opening, closing);
+ errorExit(-1);
+ }
+ }
+
+ for(i = 0, numSecondaryColumns = 0; i < countCharacters; i++)
+ {
+ int checkSum = 0;
+
+ for(k = 0; k < numberOfSymbols; k++)
+ {
+ if(brackets[k][i] > 0)
+ {
+ checkSum++;
+
+ switch(tr->secondaryStructureModel)
+ {
+ case SEC_16:
+ case SEC_16_A:
+ case SEC_16_B:
+ case SEC_16_C:
+ case SEC_16_D:
+ case SEC_16_E:
+ case SEC_16_F:
+ case SEC_16_I:
+ case SEC_16_J:
+ case SEC_16_K:
+ tr->extendedDataVector[i+1] = SECONDARY_DATA;
+ break;
+ case SEC_6_A:
+ case SEC_6_B:
+ case SEC_6_C:
+ case SEC_6_D:
+ case SEC_6_E:
+ tr->extendedDataVector[i+1] = SECONDARY_DATA_6;
+ break;
+ case SEC_7_A:
+ case SEC_7_B:
+ case SEC_7_C:
+ case SEC_7_D:
+ case SEC_7_E:
+ case SEC_7_F:
+ tr->extendedDataVector[i+1] = SECONDARY_DATA_7;
+ break;
+ default:
+ assert(0);
+ }
+
+ numSecondaryColumns++;
+ }
+ }
+ assert(checkSum <= 1);
+ }
+
+ assert(numSecondaryColumns % 2 == 0);
+
+ /*printf("Number of secondary columns: %d merged columns: %d\n", numSecondaryColumns, numSecondaryColumns / 2);*/
+
+ tr->numberOfSecondaryColumns = numSecondaryColumns;
+ if(numSecondaryColumns > 0)
+ {
+ int model = tr->NumberOfModels;
+ int countPairs;
+ pInfo *partBuffer = (pInfo*)malloc(sizeof(pInfo) * tr->NumberOfModels);
+
+ for(i = 1; i <= sites; i++)
+ {
+ for(k = 0; k < numberOfSymbols; k++)
+ {
+ if(brackets[k][i-1] > 0)
+ tr->model[i] = model;
+ }
+
+ }
+
+ /* now make a copy of partition data */
+
+
+ for(i = 0; i < tr->NumberOfModels; i++)
+ {
+ partBuffer[i].partitionName = (char*)malloc((strlen(tr->extendedPartitionData[i].partitionName) + 1) * sizeof(char));
+ strcpy(partBuffer[i].partitionName, tr->extendedPartitionData[i].partitionName);
+ partBuffer[i].dataType = tr->extendedPartitionData[i].dataType;
+ partBuffer[i].protModels= tr->extendedPartitionData[i].protModels;
+ partBuffer[i].protFreqs= tr->extendedPartitionData[i].protFreqs;
+ }
+
+ for(i = 0; i < tr->NumberOfModels; i++)
+ free(tr->extendedPartitionData[i].partitionName);
+ free(tr->extendedPartitionData);
+
+ tr->extendedPartitionData = (pInfo*)malloc(sizeof(pInfo) * (tr->NumberOfModels + 1));
+
+ for(i = 0; i < tr->NumberOfModels; i++)
+ {
+ tr->extendedPartitionData[i].partitionName = (char*)malloc((strlen(partBuffer[i].partitionName) + 1) * sizeof(char));
+ strcpy(tr->extendedPartitionData[i].partitionName, partBuffer[i].partitionName);
+ tr->extendedPartitionData[i].dataType = partBuffer[i].dataType;
+ tr->extendedPartitionData[i].protModels= partBuffer[i].protModels;
+ tr->extendedPartitionData[i].protFreqs= partBuffer[i].protFreqs;
+ free(partBuffer[i].partitionName);
+ }
+ free(partBuffer);
+
+ tr->extendedPartitionData[i].partitionName = (char*)malloc(64 * sizeof(char));
+
+ switch(tr->secondaryStructureModel)
+ {
+ case SEC_16:
+ case SEC_16_A:
+ case SEC_16_B:
+ case SEC_16_C:
+ case SEC_16_D:
+ case SEC_16_E:
+ case SEC_16_F:
+ case SEC_16_I:
+ case SEC_16_J:
+ case SEC_16_K:
+ strcpy(tr->extendedPartitionData[i].partitionName, "SECONDARY STRUCTURE 16 STATE MODEL");
+ tr->extendedPartitionData[i].dataType = SECONDARY_DATA;
+ break;
+ case SEC_6_A:
+ case SEC_6_B:
+ case SEC_6_C:
+ case SEC_6_D:
+ case SEC_6_E:
+ strcpy(tr->extendedPartitionData[i].partitionName, "SECONDARY STRUCTURE 6 STATE MODEL");
+ tr->extendedPartitionData[i].dataType = SECONDARY_DATA_6;
+ break;
+ case SEC_7_A:
+ case SEC_7_B:
+ case SEC_7_C:
+ case SEC_7_D:
+ case SEC_7_E:
+ case SEC_7_F:
+ strcpy(tr->extendedPartitionData[i].partitionName, "SECONDARY STRUCTURE 7 STATE MODEL");
+ tr->extendedPartitionData[i].dataType = SECONDARY_DATA_7;
+ break;
+ default:
+ assert(0);
+ }
+
+ tr->extendedPartitionData[i].protModels= -1;
+ tr->extendedPartitionData[i].protFreqs= -1;
+
+ tr->NumberOfModels++;
+
+ if(adef->perGeneBranchLengths)
+ {
+ /*if(tr->NumberOfModels > NUM_BRANCHES)
+ {
+ printf("You are trying to use %d partitioned models for an individual per-gene branch length estimate.\n", tr->NumberOfModels);
+ printf("Currently only %d are allowed to improve efficiency.\n", NUM_BRANCHES);
+ printf("Note that the number of partitions has automatically been incremented by one to accommodate secondary structure models\n");
+ printf("\n");
+ printf("In order to change this please replace the line \"#define NUM_BRANCHES %d\" in file \"axml.h\" \n", NUM_BRANCHES);
+ printf("by \"#define NUM_BRANCHES %d\" and then re-compile RAxML.\n", tr->NumberOfModels);
+ exit(-1);
+ }
+ else*/
+ {
+ tr->numBranches = tr->NumberOfModels;
+ }
+ }
+
+ assert(countCharacters == sites);
+
+ tr->secondaryStructurePairs = (int*)malloc(sizeof(int) * countCharacters);
+ for(i = 0; i < countCharacters; i++)
+ tr->secondaryStructurePairs[i] = -1;
+ /*
+ for(i = 0; i < countCharacters; i++)
+ printf("%d", brackets[i]);
+ printf("\n");
+ */
+ countPairs = 0;
+
+ for(k = 0; k < numberOfSymbols; k++)
+ {
+ i = 0;
+
+
+ while(i < countCharacters)
+ {
+ int
+ j = i,
+ bracket = 0,
+ openBracket,
+ closeBracket;
+
+ while(j < countCharacters && ((bracket = brackets[k][j]) == 0))
+ {
+ i++;
+ j++;
+ }
+
+ if(j == countCharacters)
+ {
+ assert(bracket == 0);
+ break;
+ }
+
+ openBracket = j;
+ j++;
+
+ while(j < countCharacters && bracket != brackets[k][j])
+ j++;
+ assert(j < countCharacters);
+ closeBracket = j;
+
+ assert(closeBracket < countCharacters && openBracket < countCharacters);
+
+ assert(brackets[k][closeBracket] > 0 && brackets[k][openBracket] > 0);
+
+ /*printf("%d %d %d\n", openBracket, closeBracket, bracket);*/
+ brackets[k][closeBracket] = 0;
+ brackets[k][openBracket] = 0;
+ countPairs++;
+
+ tr->secondaryStructurePairs[closeBracket] = openBracket;
+ tr->secondaryStructurePairs[openBracket] = closeBracket;
+ }
+
+ assert(i == countCharacters);
+ }
+
+ assert(countPairs == numSecondaryColumns / 2);
+
+
+ /*for(i = 0; i < countCharacters; i++)
+ printf("%d ", tr->secondaryStructurePairs[i]);
+ printf("\n");*/
+
+
+ adef->useMultipleModel = TRUE;
+
+ }
+
+
+ for(k = 0; k < numberOfSymbols; k++)
+ free(brackets[k]);
+ free(brackets);
+ free(characters);
+
+ fclose(f);
+ }
+}
diff --git a/testData/140 b/testData/140
new file mode 100644
index 0000000..3b370ce
--- /dev/null
+++ b/testData/140
@@ -0,0 +1,142 @@
+ 140 1104
+Seq1 MDFIDTESSSESDSELHRQLLL-----QQRDETDTVLRELVQGGCNRKRRASSTCTVPQTVPKAMRLYSSNFFCLTSKNRKLSALAAFKKMYNASYYEVAREYKSDKTQSYEWVLGCSPERVSALNCLKGVTEFILYD-YSALFYLEFYCSKNREGVRRLLNVDTDSILLLNPPNKRSVLAALFYQKLVLAHGD--FPDWCR-D-ILSNFELSQMIQWALDNKHHDEGSIAYHYAIHAEQDNNAKLWLQSNQQAKYVRDAATMVRHFVKGRLHSITMSEHIAAQDGWKKILVFLTFQHINFKEFISILCMWLKGPKKSCITIAGVPDSGKSMFAYSLIKFLNGSVLSFANSKSHFWLQPLTECKAALIDDVTLPCWDYVDTFLRNALDGNAICDCKHRAPVQTKCPPLLLTSNYDPRLHYLNSRIQFLLFNRVIPLYG-TQPRFYIEPADWRSFFQKYSEDLQLYDGEGELLQKLLQLQERESQLLE [...]
+Seq2 MDFITIESSQCEEEESHVQLL-------QDFEIIPKR--VAGHYTRKRRRTRSPGDPKKTVPKTIRQYNRPHKCLVSKNKTLTALAVFKELYTASFTEVTRTFKSDKTQSYEWVLGCSHIALEAVKVLIHNTEHVILD-HLGVYYVGFTVSKSREGLLRFLNIFTENVVLSNPPNKRSVLSALFFDKLVQVSGD--KPQWMI-D-IITSFELSKMIQWALDNNMYDEGAIAYNYALLADTDLNAQLWLKHNSQAKYVRDAATMCRHYRRGQMQAIGVMEHLATRG-WKRIIVFLRYQHVDHHTFINDLKYWIVNPKRSTIAIVGIPDSGKSMFGMSLIQFLDGRVLSFSNHKSHFWLQPLSETRYALVDDVTWPAWDYMDVYMRNALDGNPICDCKHRAPIQTKCPPLLLTSNYDPRERYLLSRITFMSFNRSIPCIG-GQPRFLISPADWRSFMLKFRKELDI---TGELRESLERLQRREAEILE [...]
+Seq3 MTDPNSKGFGDWCLLDISDLLDQGN-SLELFHQQECEQSEEQLQKLKRKY-LSPSPRLESIKSKRRLFDSGLELLRSSNKKATLMAKFKESFGVGFNELTRQFKSHKTCCKDWVYAVHDDLFESSKLLQQHCDYIWVR--MSLYLLCFKAGKNRGTVHKLILNVHEQQILSEPPKLRNTAAALFWYKGCMGSGAGPYPDWIAQQTILGHFDFSAMVQWAFHNHLLDEADIAYQYARLAPEDANAVAWLAHNNQAKFVRECAYMVRFYKKGQMRDMSISEWIYTKGHWSDIVKFIRYQNINFIVFLTALKEFLHSPKKNCILIYGPPNSGKSSFAMSLIRVLKGRVLSFVNSKSQFWLQPLSECKIALLDDVTDPCWIYMDTYLRNGLDGHYVSDCKYRAPTQMKFPPLLLTSNINVHGEYLHTTIKGFEFPNPFPMKADNTPQFELTDQSWKSFFTRLWTQLDLEEEFQCLSERFNALQDQLMNIYE [...]
+Seq4 MADPKGSGFGDWCILDISDLIDQGN-SLELFHQQECKQSEEQLQKLKRKC-LSPSPRLQSIKSKRRLFDSGVELLRSSNKKATLMAKFKAAFGVGFNELTRQFKSHKTCCNHWVYAVHDDLFESSKLLQQHCDYLWVR--MSLYLLCFKAGKNRGTVHKLMLNVHEQQILSEPPKLRNTAAALFWYKGCMGSGVGPYPDWIAQQTILGHFDFSQMVQWAFDNQLVDEGDIAYRYARLAPEDANAVAWLAHNSQAKFVRECAAMVRFYKKGQMRDMSMSEWIYTKGHWSDIVKFLRYQEVNFIMFLAAFKDFLHSPKKNCILIHGPPNSGKSSFAMSLIRVLKGRVLSFVNSKSQFWLQPLSECKIALIDDVTDPCWLYMDNYLRNGLDGHYVSDCKYKAPMQTKFPPLLLTSNINVHEEYLHSRIKGFAFPNPFPMKSDDTPQFELTDQSWKSFFERLWTQLELEDEFQCLSERFNALQDLLMNIYE [...]
+Seq5 MADSKGSGFGDWCILDISDLLDQGN-SRELFHQQECKQSEEQLQKLKRKY-LSPSPRLESIKSKRRLFDSGLELLRSSNQKATLLAKFKQAFGVGFNELTRQFKSYKTCCNHWVYAVHDDLFESSKLLQQHCDYIWVR--MSLYLLCFKAGKNRGTVHKLILNVHEQQILSEPPKLRNTAAALFWYKGCMGPGVGPYPEWIAQLTILGHFDLSVMVQWAFDNNLFEEADIAYGYARLAPEDSNAVAWLAHNNQAKYVRECAMMVRYYKKGQMRDMSMSEWIYTRGQWSSIVKFLRYQEINFISFLAALKDLLHSPKRNCILFHGPPNTGKSSFGMSLIKVLRGRVLSFVNSKSQFWLQPLGECKIALLDDVTDPCWVYMDQYLRNGLDGHFVSDCKYRAPMQTKFPPLILTSNINVHAEYLHSRIKGFEFKNPFPMKADNTPQFELTDQSWKSFFTRLWTHLDLEDEFQCLSERFNALQEQLMNIYE [...]
+Seq6 MADHKGSGLSEWCILDISDLLDQGN-SLELFHQQECEQSEEQLQKLKRKY-LSPSPRLQSIKSKRRLFDSGVELLRSSNTKATLMAKFKEAFGDGFNELTRQFKSYKTCCNYWVYAVHD-VYESSKLLQQHCDYIWVR--ITLYLLSFKAGKNRGTVHKLMLNVQEQQILSEPPKLRHTAAALFWYKGGMGTGTGSYPDWIAHQTILGHFDFSVMVQWAFDNNHFEEADIAYGYAKLAPEDANAVAWLAHNSQAKFVRECAAMVRFYKRGQMREMTMSEWIYTRGHWSSIVKFVRYQGINFITFLAALKDFLHSPKRNCLLIYGPPNTGKSTFAMSLIQVLKGRVLSYVNSKSQFWLQPLGDCKIALLDDVTDPCWLYMDTFLRNGLDGHVVSDCKYKAPMQIKFPPLLLTSNINLHEEYLHSRVRGFEFPNPFPMKPDNTPEFELTDQSWKSFFARLWTQLELEDEFQCLSERFNVLQDQLMNIYE [...]
+Seq7 MADSKGSGLSDWCILDVSDLLDQGN-SLELFHQQECEQSEEQLQILKRKY-LSPSPRLESIKSKRRLFDSGLELLRSSNIKATLMAKFKESFGVGFNELTRQFKSYKTCCNDWVYAVHDDLFESSKLLQQHCDYIWVR--MTLYLLCFKAGKNRGTVHKLMLNVQEQQILSEPPKLRNTAAALFWYKGGMGSGAGTYPDWIAHQTILGHFDFSAMVQWAFDNNYLEEPDIAYQYAKLAPEDSNAVAWLAHNQQAKFVRECAAMVRFYKKGQMKEMSMSEWIHTKGHWSDIVKFLRYQDVNFITFLAAFKNFLHAPKHNCILIYGPPNSGKSSFAMSLIKVLKGRVLSFVNSKSQFWLQPLGESKIALLDDVTDPCWVYIDTYLRNGLDGHFVSDCKYKAPVQIKFPPLLLTSNINVHGEYLHSRIKGFEFPHPFPMKPDNTPQFQLTDQSWKSFFERLWTQLDLEEEFQCLSERFNVLQDQLMNIYE [...]
+Seq8 MADPKGSGLGDWCILDVSDLLDQGN-SLELFHQQECKQSEEQLQILKRKY-LSPSPRLELMKSKRRLFDSGLELLRSSNRKATLMAKFKDAFGVGFNELTRQFKSYKTCCNHRVYAVHDDLFESSKLLQQHCDYIWVR--MTLYLLCFKAGKNRGTVHKLLLNVQEQQILSEPPKLRNTAAALFWYKGGMGSGAGKYPDWIAQQTVLGHFDFSVMVQWAFDNNHVDEADIAYQYARLAPEDSNAVAWLAHNSQAKFVRDCAAMVRFYKNLQMREMSMSEWIYTRGHWSSIVKFLGYQGVNFIMFLAALKNFLHAPKQNCILIHGPPNSGKSSFAMSLIKVLKGRVLSFVNSRSQFWLQPLGECKIALIDDVTDPCWLYMDTYLRNGLDGHFVSDCKYKAPVQTKFLPLLLTSNINVHEEYLHSRIKGFEFPNPFPMKSDNTPQFELTDQSWKSFFERLWTQLELEEEFQCLSERFNVLQDQLMNIYE [...]
+Seq9 MADPKGSGLDDWCIVDISELLDQGN-SRELFHQQESKESEEHLQKLKRKY-LSPSPRLESIKSKRRLFDSGLELLRASNNKAILMAKFKEAFGVGFNDLTRQFKSYKTCCNHWVYAVHDDLLESSKLLQQHCDYVWIR--MSLFLLCFKVGKNRGTVHKLMLNVHEKQILSEPPKLRNVAAALFWYKGAMGSGTGPYPDWMAHQTIVGHFDMSVMVQWAFDNNYLDEADIAYQYAKLAPEDSNAVAWLAHNNQARFVRECASMVRFYKKGQMKEMSMSEWIHTRGHWSTIAKFLRYQQVNFIMFLAALKDMLHSPKRNCILIYGPPNTGKSAFTMSLIRVLRGRVLSFVNSKSQFWLQPMSECKIALIDDVTDPCWLYMDTYLRNGLDGHYVSDCKHKAPIQTKFPALLLTSNINVHNEYLHSRIKGFEFPNPFPMKADNTPEFELTDQSWKSFFTRLWNQLELEDEFQCLSDRFNALQDQLMNIYE [...]
+Seq10 MADPKGSGLEDWCIVDISELLDQGN-SRELFHQQESKESEEQLQKLKRKY-LSPSPRLESIKSKRRLFDSGLELLRASNNKAILMAKFKEFFGVGFNDLTRQFKSYKTCCNAWVYAVHDDLLESSKLLQQHCDYIWIR--MSLFLLCFKVGKNRGTVHKLMLNVHEKQIISEPPKLRNVAAALFWYKGAMGSGAGPYPDWIAQQTIVGHFDMSAMVQWAFDNNYLDEADIAYQYAKLAPEDSNAVAWLAHNNQARYVREVASMVRFYKKGQMKEMSMSEWIHTRGHWSTIAKFLRYQQVNFIMFLAALKDMLHSPKRNCILIYGPPNTGKSAFTMSLIHVLRGRVLSFVNSKSQFWLQPMSECKIALIDDVTDPCWIYMDTYLRNGLDGHVVSDCKHKAPMQTKFPALLLTSNINVHNEYLHSRIKGFEFPNPFPMKADNTPEFELTDQSWKSFFTRLWNQLELEDEFQCLSDRFNVLQDQLMNIY [...]
+Seq11 MADPKGSGLDDWCIVDISELLDQGN-SRELFHQQECKDSEEQLQKLKRKY-ISPSPRLESIKSKRRLFDSGLELLRASNHKAILLAKFKEAFGIGFNDLTRQFKSYKTCCNDWVYAVHEDLLESSKLLQQHCDYIWIR--MSLFLLCFKAGKNRGTVHKLMLNVHEKQILSEPPKLRNVAAALFWYKGAMGSGAGPYPNWMAQQTIVGHFDLSEMIQWAFDHNYLDEADIAFQYAKLAPENSNAVAWLAHNNQARFVRECASMVRFYKKGQMKEMSMSEWIYARGHWSSIAKFLRYQQVNVIMFLAALKDMLHSPKHNCILIHGPPNTGKSAFTMSLIHVLKGRVLSFVNSKSQFWLQPMSETKIALIDDVTDPCWVYMDTYLRNGLDGHYVSDCKHKAPIQTKFPALLLTSNINVHNEYLHSRIKGFEFPNPFPMKPDNTPEFELTDQSWKSFFTRLWKQLELEDEFQCLSKRFNALQDQLMNIY [...]
+Seq12 MAESKGSGFGDWCILDISDLLDQGN-SRELFHQQECQESEEHLQKLKRKY-LSPSPRFESIKSKRRLFDSGLELLRANNNRAILMAKFKEAFGVGFYDLTRQFKSYKTCCNAWVYAVHDDLLESSKLLQQHCDYVWIR--MSLFLLCFKVGKNRGTVHKLMLNVHEKQILSEPPKLRNTAAALFWYKGCMGSGGGPYPDWIAQQTILGHFDLSEMIQWAFDNNHMDESDIAYQYAKLAPENSNAVAWLAHNNQARFVRECAAMVRFYKKGQMKEMSMSEWIYARGHWSTIAKFLRYQQVNFIMFLAALKDLLHAPKRNCILIYGPPNTGKSAFTMSLIRVLKGRVISFVNSKSQFWLQPLSECKIALLDDVTDPCWIYMDTYLRNGLDGHVVSDCKHKAPIQTKFPALLLTSNINVHNEYLHSRIQGFEFPNPFPMKADNTPQFELTDQSWKSFFTRLWQQLELEEEFQCLSERFNVLQDQLMNIY [...]
+Seq13 MADPKGSGFNDWCILDISDLLDQGN-SRELFHLQECQESEEQLQKLKRKY-LSPSPRFESIKSKRRLFDSGLELLRASNNKAILMAKFKEAFGVGFNDLTRQFKSYKTCCNAWVYAVHDDLIESSKLLQQHCDYVWIR--MSLFLVCFKAGKNRGTVHKLMLNVHEKQILSEPPKLRNVAAALFWYKGSMGSGVGSYPDWIAHQTILGHFDLSDMVQWAFDNNYLDEADIAYQYAKLAPDNSNAVAWLAHNNQAKFVRECASMVRFYKKGQMKEMSMSEWIYTKGQWSTIVQFLRYQQVNFIMFLAALKDLLHSPKRNCILFYGPPNTGKSAFTMSLIKVLKGRVLSFCNSKSQFWLQPLSECKIALLDDVTDPCWVYMDTYLRNGLDGHYVSDCKHKAPMQTKFPALLLTSNINVHNEYLHSRIKGFEFPNPFPMKADNTPQFDLTDQSWKSFFTRLWHQLDLEDEFQCLSERFNVLQDQLMNIY [...]
+Seq14 MADNKGSGLHEWCLLDVSDLISQGN-SRELFQQQELEESNALLQSLKRKY-ISPSPQLESIKTKRKLFDSGVELMRCSNLKATLLSKFKNAFGVSFVELTRQFRSNKTCCNDWVYGVNYDLFESSKLLQQHCDYIWVT--MFLYLLCFKAGKNRQTVIRLLLYVAEEQILSEPPKLRSTVSALFWYKGSSNAATGSYPKWIIEQTLIGHFDMSTMVQWAFDNDLTEEADIAFQYAKLAPDDVNATAWLAHNNQARFVRECANMVRYYKKGQMREMSMSAWIHFKGQWSTIVKFIRYQGINFISFLSALKDFLHGPKKNCLLIYGPPNTGKSAFTMSLIKVLHGRVISFVNSKSHFWLQPMSEAKIALLDDATDPCWIYMDTYLRNGLDGHLVSDCKHKAPIQIRFPPLLITSNINAMAEYLHSRLVAFEFPNPFPMKDDDTPEFELTDQSWKSFFKRLWRQLDLEDEFRCLKKRFDVLQDLLMNIY [...]
+Seq15 MADNKGSGLSDWCLLDVSDLLNQGN-SRELFQQQELEDSETLLQSLKRKY-ISPSPQLESIKSKRKLFDSGVELMRCSNLKATLLAKFKSAFGVSFAELTRQYKSNKTCCNDWVYGVNNDLFEGSKLLQQHCDYIWLT--MYLYLLCFKAGKNRHTVIRLLLHVAEEQILSEPPKLRSTVAALFWYKGSSNSGTGSYPKWIVEQTLIGHFDMSTMVQWAFDNNLTEEADIAFQYAKLAPDDVNATAWLAHNNQARFVREVAAMVRFYKKGQMREMSMSAWIHFRGHWSSIVKFIRYQGINFISFLSALKDFLHAPKKNCLLIYGPPNTGKSAFTMSLIKVLNGRVISFVNSKSHFWLQPMSECKIALLDDATDPCWVYMDTYLRNGLDGHLVSDCKHRAPMQIKFPPLLITSNINAMAEYLHSRLVAFEFPNPFPMKDDDTPEFELTDQSWKSFFTRLWTQLELEDEFRCLRERFDVLQDQLMNIY [...]
+Seq16 MADDKGSENDNWCLLDISDLVDQGN-SRELLHQQQCNDSELQVQKLKRKY-LSPSPRLESIKSKRKLFDSGLELMRCSNVKATLLCKFKLAFGVSFSELTRQYKSNKTCCNDWVYGIRDELYEGSKLLQQHCDYIWVY--MSLFLLCFKAGKNRTTVHRLLLDVQEQQILSEPPKLRSTVAALFWYKGSFGSKAGAYPQWIVQQTMVGHFELSTMVQWAFDNNLTDEADIAYKYANMAFEDVNAAAWLAHNNQARFVRECASMVRFYKRGQMREMSISEWIHHKGHWSSIVKFIRYQEINFICFLAALKDFLHSPKRNCLLIYGPPNTGKSAFTMSLIKVLGGRVISFVNSRSQFWLQPLSECKIALLDDATDPCWTYMDTYLRNGLDGHMVSDCKHKAPMQTKFPPMLVTSNINVLEEYLHSRIVGFKFPNPFPLKPDNTPEFELTDQSWKSFFERLWSQLDLEEEFQCLSERFNALQDELMNIY [...]
+Seq17 MEDNKGTGCSDWFLVDISDLIDQGN-SRELLCQQETEESEQQVQLLKRKY-FSPSPRLQSIKSKKRLFDSGLELLKSSNVKATLMGKFKDAFGVGFNELTRQYKSNKTCCKDWVYCVQDDLLEASKLLQKHCNYIWMH--MTLYLLCFNAGKSRETVCRLLLQIDDMQALLEPPRLRSVLSALFWYKGSMNPNVGTYPDWIVAQTMISHFSLSRMVQWAFDNEHLEEADIAYNYAKLAETDSNAKAFLDSNSQANFVRQCALMVRHYKRGQMRDMSMSCWIHTRGHWSEIVKFIRYQNLNFIMFLDKFRTFLKNPKRNCMCFYGPPDSGKSMFTMSLINVLKGRVLSFANSRSQFWLQPLSETKLALLDDATQECWNYIDTFLRNGVDGNYVSDIKHRAPLQIKFPPLMITTNMNILKEYLHTRIEFFEFPNKFPFDNNNKPQFHLTDQSWKSFFERLWTQLELEDEFHCLSERFTALQDKLMDIY [...]
+Seq18 MSDNKGTCCSAWLSLDISDLIDQGN-SRELFCQQESEESEQQTQLLKRKY-ISPSPQLESIKPKRRLFDSGLELLKHSNVKAVLMAKFKEAFGVGFAELTRQYKSNKTCCRDWVYAVNDDLIESSKLLLQHCAYIWLH--MCLYLLCFNVGKSRETVCRLLLQVSEVQLLSEPPKLRSVCAALFWYKGSMNPNVGAYPEWILTQTLINHFDLSTMIQFAYDHEYFDEATIAYQYAKLAETDANARAFLQSNSQARLVKECATMVRHYMRGEMKEMSMSTWIHRKGQWSDIVRFIRYQDINFIEFLTVFKAFLQNPKQNCLLFHGPPDTGKSMFTMSLISVLKGKVLSFANCKSTFWLQPIADTKLALIDDVTHVCWEYIDQYLRNGLDGNYVCDMKHRAPCQMKFPPLMLTSNIDITKDYLHSRVKSFAFNNKFPLDANHKPQFELTDQSWKSFFKRLWTQLDLEDEFQCLSARFNALQETLMDLY [...]
+Seq19 MSDDKGTGCSDWFVLDISDLIDQGN-SRELLCQQESEESEQQIHWLKRKY-ISSSPRLQCIKSKRRLFDSGLELLKCNNVKAMLLAKFKEAFGVGFMELTRQYKSSKTCCRDWVYAVQDELLESSKLLIQHCAYIWLH--MCLYLLCFNVGKSRETVLRLLLQVSEIQIIAEPPKLRSTLSALFWYKGSMNPNVGEYPEWIMTQTMINHFDLSTMVQYAYDNELSEEAEIAWHYAKLADTDANARAFLQHNSQARLVKDCAIMVRHYRRGEMKEMSMSSWIHKKGHWSDIVKFVRYQDINFIQFLDSFKSFLHNPKKSCMLIYGPPDTGKSMFTMSLIKVLKGKVLSFANYKSTFWLQPVADTKIALIDDVTYVCWDYIDQYLRNALDGGVVCDMKHRAPCQIRFPPLMLTSNIDIMKEYLRSRVQAFAFPHKFPFDSDNNPQFKLTDQSWKSFFERLWRQLELEDEFQCLSERFNALQENLMDIY [...]
+Seq20 MSDEKGTGCSEWFDLDISDLIDQGN-SRELLCQQESEESEQQIHWLKRKY-ISSSPRLQSIKSKRRLFDSGLELLKCSNVKAMLLAKFKEAFGVGYMELTRQYRSSKTCCRDWVYAVQDELLESSKLLIQHCAYIWLH--MCLYLLCFNVGKSRETVLRLLLQVSEVQIIAEPPKLRSTLSALFWYKGSMNPNVGEYPEWIMTQTMISHFDLSTMVQYAYDNELTDEAEIAYHYAKLADTDANARAFLQHNSQARLVKDCAIMVRHYRRGEMKEMSMSAWIHKKGHWSDIVKFIRYQEINFIQFLNAFKLFLHNPKKSCLLFYGPPDTGKSMFTMSLIKLLKGKVLSFANYKSTFWLQPVADTKVALIDDVTYVCWDYIEQYLRNALDGNTVCDMKHRAPCQIRFPPLMLTSNIDIMKEYLYSRIQAFAFPHKFPFDSDNKPQFKLTDQSWKSFFERLWRQLDLEDEFQCLSERFNALQENLMDIY [...]
+Seq21 MTDDNKGGCSQWCILEISDLIDQGN-SRELLCQQESEESEQQIQLLKRKY-LSSSPRLQSIKSKRRLFDSGLELLKCSNVKAMLLAKFKEAFGVGYMDLTRQYKSSKTCCRDWVYAVQDELIESSKLLLQHCAYIWLQ--MCLYLLCFNVGKSRETVSRLLLQVAEVQMLAEPPKLRSMLSALFWYKGSMNPNVGEYPEWILTQTMINHFDLSTMIQFAYDNEYLQEDEIAYHYAKLADTDANARAFLQHNSQARFVKECAIMVRHYKRGEMKEMSISTWVHRKGHWSDIVKFIRYQDINFIRFLDIFKSFLHNPKKNCILIHGPPDTGKSMFTMSLIKVLKGKVLSFANCRSNFWLQPLADTKLALIDDVTFVCWDYIDQYLRNGLDGNVVCDLKHRAPCQIKFPPLLLTSNIDVMKEYLHSRIQSFAFPNKFPFDNNNMPQFRLTDQSWKSFFERLWHQLDLEEEFQCLSERFNVLQENLMDIY [...]
+Seq22 MTDDTKGGCSDWFVLDISDLIDQGN-SRELLCQQQSEESEQQIHLLKRKY-FSSSPRLQSIKSKRRLFDSGLELLKCSNVKAMLLAKFKEAFGVGFMELTRQYKSCKTCCRDWVYAVQDELIESSKLLLQHCAYIWLQ--MCLYLLCFNVGKSRETVFRLLLQVAEVQILAEPPKLRSTLSALFWYKGSMNPNVGEYPEWIMTQTMINHFDLSTMIQYAYDNDLINEDEIAYNYAKLADTDANARAFLQHNSQARFVRECALMVRYYKRGEMKDMSISAWIHNKGHWSDIVKFVRFQDINFIRFLDVFKSFLHNPKKNCLLFYGPPDTGKSMFTMSLIKVLKGKVLSFANYKSNFWLQPLADTKIALIDDVTHVCWDYIDQYLRNGLDGNFVCDLKHRAPCQIKFPPLLLTSNMDIMKEYLHSRVHAFAFPNKFPFDSNNKPQFRLTDQSWKSFFERLWKQLDLEDEFQCLSDRFNALQENLMDIY [...]
+Seq23 MDDDKGTGCSGWFMLDVSDLINQGN-SRELLCQQQSEECEQQIQYLKRKY-FSPSPRLQSMKSKRRLFDSGLELLRCNNVKAVLLGKFKDAFGVSYNELTRQFRSNKTCCKHWVYAAKDELIDASKLLQQHCTYLWLQ--MSLYLCCFNVGKSRETVMRLLLQVNENHILSEPPKIRSMIAALFWYKGSMNPNVGEYPEWIMTQTMIHHFDLSEMIQWAYDQDYVDECTIAYQYARLADSNSNARAFLAHNSQAKYVRECAQMVRYYKRGEMRDMSISAWIHHCGHWQDIVKFLRYQGLNFIVFLDKFRTFLKNPKKNCLLICGPPDTGKSMFSMSLMKALRGQVVSFANSKSHFWLQPLADAKLALLDDATEVCWQYIDAFLRNGLDGNMVSDMKHRAPCQMKFPPLIITSNISLKKEYLHSRIYEFEFPNKFPFDANDTPLFKLTDQSWASFFKRLWTQLELEEEFQCLSERFSALQEKLMDLY [...]
+Seq24 MDDDKGTGCSTWCLLDVSDLINQGN-SRELLCQQESEECEQQIQYLKRKYNISPSPRLQSLKSKRRLFDSGLELLRCKNAKAVLLHKFKEGFGISYNELTRQFKSNKTCCKHWVYGAKEELIDASKLLQQHCSYIWLQ--MSLYLCCFNVAKSRETVVKLLLQIHENHILSEPPKNRSVPVALFWYKGSMNPNVGEYPEWIVTQTMIQHFDLSRMIQWAYDNDHLDECSIAYNYAKLADTDSNARAFLAQNSQAKHVRDCAQMVKHYKRGEMREMTISAWVHHCGQWQDIVKFLRYQGLNFIVFLDKFRTFLQNPKKNCLLIYGPPDTGKSMFTMSLMKALRGQVISFANSKSQFWLQPLADAKIALLDDATEVCWQYIDMFLRNGLDGNVVSDMKHRAPCQMKFPPLIITSNISLKKEYLHSRIYEFEFPNRFPFDSDDKPLFKLTDQSWASFFKRLWIQLGLEDEFQCLSERFSALQDKLMDLY [...]
+Seq25 MADDKGTGCSDFIY-DISDLINQGN-SRELLCQQEREESELQVQYLKRKC-FSPSPRLQSMKSKRRLFDSGLELLRSSNSRATLLSKFKDSFGVSFTELTRQYKSNKTCCHHWVYAAKDDLIDASKLLQQHCFYIWLQ--MSLYLCCFNVGKSRDTVVRLILQVHENHILSEPPKNRSIPAALFWYKGSLNSNVGEAPDWILSQTMIQHFDLSRMIQWAYDNDHIDESIIAYQYAKLADIDSNAKAFLAHNSQVKYVKECALMVRYYKRGEMKEMSISAWIHHCGNWQHIVRFIRYQNLNFIMFLDKFRTFLKNPKKNCLLIYGPPDTGKSMFAMSLIKLLKGSVVSFANSKSQFWLQPLADGKIGLLDDATDVCWQYIDSFLRNGLDGNLVSDIKHKAPCQMKFPPLIITSNINLLKEFLHSRVTQIDFPNKFPFDSDNKPLFELTDQSWASFFKRLWTQLELEDEFHCLSARFTVLQEKLMDIY [...]
+Seq26 MADDKGTGCSEWFIDNISNLLNQGN-SRDLLRQQEFEESAEQVQKLKRKY-FSPSPRLQSMKSKRRLFDSGLELMRCNNSRAKLLSKVKEYFGVGFYELARQYKSDKTCCKDWVYGVREELVESAKLLLNHCSYVWIN--MTLYLLCFNHAKSRETVGRLLLDVQLLQLICEPPKLRSVVSALYWYKGSMDSSVGAYPDWIVNQTMISHFDLSEMIQWAYDSDLTDEADIAYLYAKMANSDSNARAWLAHNNQARYLRECAQMVRHYRRGEMRDMSMSEWIHHRGHWSEIVKFIRFQEINFIIFLDAFKQFIHGPKKSCLLIHGPPDCGKSMFAMSLLKVLKGKVISFVNAKSQFWLSPLSECKIGLLDDATDPCWQYIDTYLRNGLDGNVVSDCKHKTPMQIRFPPLLITSNYNIKANFLYSRIAIFEFKHKFPFKEDGTPVFQLTDQSWKSFFERLWTQLELEDEFQCLNARFNVLQEMLMDIY [...]
+Seq27 MADDKGTGCSEWFLDNISELIDQGN-SRDLFRLQEFEESAEQVQMLKRKY-FSPSPRLQSLKSKRRLFDSGLELMRCSNSRARLLSKVKEYFGVGFYELARQYKSNKTCCRDWVYGVREELLEGAKLLLNHCSYVWIN--MSLFLLCFNNAKSRETVGRLLLDVQLLQLICEPPKLRSVVSALYWYKGSMDSSVGTYPDWIVNQTMLTHFDLSQMIQWAYDTDLTDEADIAYGYAKMAESDSNARAWLAHNSQAKFVRECAQMVRHYRRGEMRDMSISEWIHYRGHWSEIVKFIRFQEINFILFLDAFKQFLHGPKKSCLLIYGPPDCGKSMFAMSLIRVLKGRVISFVNAKSQFWLSPLAECKIGLLDDATDPCWQYIDAYLRNGIDGNIVSDCKHKTPLQIRFPPLLITSNYNIKDNYLYSRIVIFEFKHKFPFKEDGSPEFLLTDQSWKSFFKRLWSQLELEDEFQCLNDRFNALQDKLMDIY [...]
+Seq28 MADDKGTGCSEWFLDNISELIDQGN-SRDLLRQQEFEESAEQVQKLKRKY-FSPSPRLQSLKSKRRLFDSGLELMRCSNSRARLLSKVKEYFGVGFYELARQYKSDKTCCRDWVYGVREELLEGAKLLINHCSYVWIN--MSLFLLCFNNAKSRETVGRLLLDVQLLQLICEPPKLRSVVSALYWYKGSMDSSVGTYPDWIVNQTMLTHFDLSEMIQWAYDTDLTDEADIAYGYAKMAESDSNARAWLAHNSQAKFVRECAQMVRHYRRGEMRDMSISEWIHYRGHWSEIVKFIRYQGINFILFLDAFKQFLHGPKKSCLLIYGPPDCGKSMFAMSLIRVLRGRVISFVNAKSQFWLSPLAECKIGLLDDATDPCWQYIDTYLRNGIDGNIVSDCKHKTPLQIRFPPLLITSNYNIKDNYLYSRIVIFEFKHKFPFKEDGSPEFLLTDQSWKSFFKRLWNQLDLEDEFQCLNDRFNALQDKLMDIY [...]
+Seq29 MAD-KGIGCSTWCLIDISDLLDELGNPQELLCLQEREESDLQLQQLKRKY-FSPSPQLESIKSKRRLFDSGLELLQCSNARATLLSKFKAAFGVSFTELTRRYKSDNTCCRDWAYGL-QDIIEGSKLFQQHCEYIWLH--ISLYLLCFKTGKSRNTVKNLLLNVGDAQLIADPPQIRSVVAALFWYKESMNKNVGEYPEWIANQTLLSHFDLSRMIQWAYDNEYTEDSDIAYHYAKLADEDSNARAFLAHNSQAKFVRECGQMVRHYKRGEMKNMSMSAWIYTRGHWSDIVKFIRFQQINFIMFLDVFKQFLASPKRNCLLIYGAPDCGKSMFCMSLIKALKGKVISFVNARSQFWLSPLVESKIALLDDATECCWNYIDNYLRNGIDGNMVSDCKHKNPVQIRFPPLLITSNNNIMSDYLHSRIKAFEFVNKFPFKDDGSPLFELTDQSWKSFFQRLWRQLDLEDEFQCLSDRFNALQDKLMTIY [...]
+Seq30 MADNRGIGCSNWF-SDLSDLIDEQGISRDLFRQQGSEEFEQQIQDLKRKY-FSPSPRLEAIKCKKRLFDSGLELLKCSNLRATMLSKFKNSFGVGFMELCRKFNSNKTCCRDWVYGVKEELLEGCKLLQEHCGYIWLH--MSLFLLCFKTGKSRDTVVRLLLSIHKEQLLTEPPKLRSVMAVLYWYKGSMNPNIGEYPDWIVQQTMISHFELSPMVQWAYDNDYIEDSDIAYNYAKLADEDINARAFLAHNNQAKIVRDCAWMVRHYKRGEMRYMSISKWIWYKGHWSNIVKFVRFQGINFIMFLDAFKHFLLSTKKNCILFYGPSDCGKTMFCMSLIKALGGRVISFANAKSQFWLQPLTESKIAMLDDATEACWNYLDTYLRNGLDGNWVSDCKHKAPIQIRFPPLLITSNYDILKNFLVSRIKIFEFKNKFPFNEDGTPMFELTDQSWKSFFQRLWKQLDLEDEFQCLNQRFNALEDQLMDIY [...]
+Seq31 M-EGK-KSFTSPFIIDIS-FIDEGN-TAQLFAQHQALDAAQEISAVKRKL-PLTGS-------KKGKLDSGYAWVNASSEKGAKLAIFKQTYGVTFASLTRVFKSDKTCCHNWVFSASEEVIEGSKQLRQYCDFYYASGHCVLYLLDFKASKNRETVIRLFLAVPDHCILSDPPKLKHVPAALFWLKTSNQPHVGQLPNWICQQTMLNYFELRKMVQWALDHNLTDDSMIAYNYAQLAEEDENANAWLNSNSQARYLKECALMTRHFLRAQRLEMTMAKWLTRCGDWKAIIKFLKYQNVNIVNFLSMFRDFMNSPKKNCLVICGAPNTGKSIFAMSLMQFLQGKTISFANHKSHFWLQPLADCKFAVLDDATLPCWSYLDIYCRNALDGNYVCDSKHKNPVQIKIPPLLITTNYNILQEYLHSRLLFLEFNNAFPLDEEGNPQFDLNDQNWKSFFIKLRRQLDLEEEV*RLEARFEEVQEKLLELL [...]
+Seq32 M-DPNLK-GQ-SFLDDLSDLIDQGN-SAELFAQQEAFAFQEHIRTTKRKLKLSFTSQ--SNAPKRRVLDSGYNLLAASSHRAVQLAIFKEKFGISLNSLTRIFKNDKTCCSNWVFGAREELLAASQILQRVCDSIMLLGFMGLYLLEFKNAKSRDTVRHLFLQVENNDMLLEPPKIKSLPAATFWWKLRHSSAAGNLPDWIARQTSITHFDLSAMVQWAYDHNFVDEAQIAYYYARLASEDSNAAAFLRCNNQVKHVKECAQMTRYYKTAEMREMSMSKWIKKCGDWKQIINFIKYQNINFLSFLACFRDLLHSPKRNCLVIVGPPNTGKSMFVMSLMRTLKGRVLSFVNSKSHFWLQPLNAAKIAILDDATRPTWSYIDTYLRNGLDGTPVSDMKHRAPMQICFPPLIITTNVDVAKDYLHSRLMSFEFANAFPLDENGKPALILNELSWKSFFERLWNQLDLEDEFRCLQARFDAVQEQLLEIY [...]
+Seq33 M-DPNEK--VLSFIDDLSDLIDQGN-SAELFAQQEALAVQEHIRASKRKLKLSFTSH--SNAPKRRKLDSGYNFLRAGSRRATQLAIFKDKFNISFNSLTRPFKNDKTCCNNWVFGARDELLEASKLLQRHCDYLMLLGFMALYLLEFKHAKSRETIRHLFLQIEKEEMFLEPPKLKSLPAATFWWKISHSASAGELPDWIARQTSLSHFDLSQMVQWAYDHNYTDEPTIAYNYARMASEDSNAAAFLRCNSQVKFVKECAQMTKYYKTAEMREMSMGKWIKRAGDWKDIINFLKYQGINFLSFLASFKDLLHGPKRNCLVIVGPPNTGKSMFVMSLMKALKGRVLSFVNSKSHFWLQPLNAAKIAVLDDATKATWSYIDTYLRNGLDGTPVSDMKHRAPIQICFPPLLITTNVQVMKDYLHSRLMCFEFPNPFPLNEAGQPALILNELSWKSFFARLWRQLDL-EDFRCLQARFDAVQDQLIDIY [...]
+Seq34 M-DP--K-TVLDFIEDISDLIDQGN-SAELFAQQQAFDFHKDICTTKRNLKRSLTSQ--SNAPKRRLLDSGYNLLRAGSRRAAYLGVFKEKFTISFTALTRIFKNDKTCCRNWVYRAREELLEASKILQKCCDFILLLGFLALFLLEFKTAKSRETVQRLFLQVEKEDMLLEPPKLKSLPAATFWWKIQHSNNSGTLPDWIARQTMISHFSLSVMVQWAYDHNYTEESTIAYHYAKLASEDSNAAAFLKCNNQVKHVKECAQMTRYYKTAEMTEMSMGQWIKKCGDWKQICKFLKFQNVNFLSFMSALKDLLHRPKRNCMVICGPPNTGKSMFVMSFMKALQGKVLSFVNSKSHFWLQPLRGAKVAVLDDATRATWTYFDTYLRNGLDGTPVSDMKHRAPLQICFPPLVITTNVNVMQDYLHSRIVCFEFPNTFPLDEAGNPLLLIDELSWKSFFERLWTQLDLEEDFRCLEARFDAVQDQLLQVY [...]
+Seq35 MAEDKGTVSGSWYLDDDPEFISEGNSSELLHNNHMLAKDGEQIQLLKRKY-MSPSPRLALVSSKRRLFETKDKILKSKNQKATALAQFKEAFGVSFTDLTRSFISNKTCTQHWVFGPNSDILDGTGLLEPHCTFLLKCGPIILLLIEFKASKCRDTVQNLLMRVEHHQMLLEPPKIRSQLTAFFFYKKTMAGGCGKLPDWLTRLTVLSHFELSRMVQWAYDNDMLEDSEIAYYYAQHADVDSNAAAWLKTNNQAKYVRDCGNMVRLYKQQEMKNLTMSEYIYKRGDWKHIFKLLRYQDVNMIQFLTSFRDLLSCPKRQCLVIYGPPDTGKSYFLYSLISFLKGKVISFTNSKSHFWLQPLLNAKVALLDDATKACWNYMDCYMRTALDGNAVSDSKFKAPVQVRLPPLLISTNVELPLLYLHSRTMCYCFAKPCLYDDEGNPLFNLTDRHWKGFFLHLEQQLGL-SEFRCLAEHLDACQEQMLELI [...]
+Seq36 MEDKDNKYNAIDFIDDISDLIDEGN-SLALLNKQQLEDDTQQLKILKRKYFSPSSPRLQQLK-RRLVFDSGLGLLHSSNREATAYAKFKATFDVGYKELTRPFISNKSCCCSWIFGVVAEILEAAKLLQPHCEYLQIIGVTVMMLFQFYAAKCRDTVINLLLHVREWQIITNPPKHRSVAVALYFYKTSMSNVSGAMPEWIKKQTLVNHFEFATMVQWAYDNNIRDEAEVAYGYASLADDDTNAAAWLKCNNQFKYVKDCVQMVAMYKRYEMRNMTIGQWLVKCGNWKNIINFLKYQEISIVAFLTTLRYFLQGPKKNCLAIWGPPDTGKSMFCYSLIKYTQGKVVSFVNSRSQFWLQPLVDGKIGLIDDATFACYQYMDVYMRNGLDGNAVSDVKHKAPIQLKLPPLLLTSNIEVHAEYLHSRIQEYKFPNKLRLDANGNPIITITDADWKSFFSKLWKQLDL-DDFRCLVDRFDAVQDRLLGIY [...]
+Seq37 MEDKDTKANACEFIEDISELIERGN-PQALLNRQQLEEDSQLLTVLKRKYVSPSSPRLEAL-SKRRLFDSGLGILHSSNRQATALTKFKNVFGVGYKEITRPFQSNKSCCHSWVFGVVAEMLEAAKLFKVHCDYLQIIGVTVLCLFEFSSSKCRDTVQKLILNVQEHQIITDPPRHRSVPVALFFYKQSMSNTSGTMPDWLKRQTMLNHFELSHMVQWAYDNNIWDEAELAYQYACLADVEPNAAAWLKSNQQYKYVSDCAKMVRMYKKYEMQQMSMAQWIKKCGDWKKIINLLKYQEISVIAFLTSFRMFLKGPKKNCIALWGPPDTGKSMFCYSLIRYVKGKVVSYANSKSQFWLQPLTDAKLGLIDDATFPCFQFMDVYMRSALDGNEVSDCKHRVPVQIKLPPLLVTSNIDMHSEYLQSRITSFKFPHKLPLDTNGNPIFIITDTDWKCFFSKLEQQLDL-EDFRCLVERFDAVQEKLLGLY [...]
+Seq38 MAEGGERLDAGWFVVNVSDLIDN-EGHAGVLNQQLLEESEQQTAYLKRKYCTPSSPRLQAVHSKRRLFDSGI-LQGSNQEATIL-AKFKGCFGVSLKELTRPYKSSKTCCNEWVFGIREELLTASKLLQPHCDFFLADGYVCLYLITFKAAKNRETITKLFLNCYDYQLRADPPKNRSVAVALFFYKLGLSGGCGDFPPWLAKQLLVSHFELSKMVQWAYDNDHTDESEIAFHYACLADEDSNAAAWLKSNAQAKYVADCSKMVRHYKKQEMRNMSMSQWIYRCGDWTVVAKYLKYQGVSFLGFLTALRHLFEGPKKQCLLIYGPPDTGKSWFCFSLLNFLRGKVVSYQNSRSHFWLSPLADCKVGMLDDATHACFQFIDVNMRSAFDGNYVSDCKHKAPIQIKLPPMLVTSNVNLPGEYLHSRVTGFEFPRKFPIDQDGSPVFSLTHSVWKAFFKRLHHQLGLEDEFRCLARRFDVLQEVMLHHY [...]
+Seq39 M-DAVNK-G-WCFIE----FLDDRGNHLALFTQQLFSEDDQHIAALKRKYAATPSPRLHSCQSRRRLFDSGIGLLRMSNRVAASLARFKDAFSVSFSDLTRSFKSDKTCSVNWVFGAREPLLEALLVLKPQCDYFQTVRRVDIILFEFKVGKSRNTLRKQMLGLDEKLIMADPPNHRSTLAALFFYKKVLFGAAGQTPAWIAQQTILEHFDFSKMVQWAYDNQLIEESEIAYRYACEAETDANAQAWLKCTNQVKHVRDCCAMVRLYKRQEMRDMTMAQWVRKCGDWKTIAGFLRYQEVNMVLFLTALRHMFKGPKKHCLVISGPPDTGKSYFCNSLNTFLHGRVISFMNSKSQFWLQPLVDAKMGFLDDATNACWTFMDIYMRNALDGNPMQDMKHRAPLQLKLPPLLITTNVDVMHNYLHSRLQCFAFEKPMPLNNDGHPQFPLHAANWKSFFTRLAKQLGIEEEFRCLEARFDAVQEQILSLY [...]
+Seq40 M-D-NDK-YRWAFLD----FLDQGN-SLALLTSQLFEQDEQHITALKRKYVTTPSPRLNAVTSRRRLFDSGVGLLRANNVYNACLARFKEAFGVGFTDLTRSFKSNKTCSQHWVFGAPETLIEAAKQLGEQSLFLQHQKRVDLFLFQFKAEKCRLTLTKQVLGVAERLVLAEPPNCRSNLAAFYFYKKTLGKEPGSSPEWIVKQVLIEHFDFSKMVQWAYDNNYVEESEIAYNYALEAETDSNAESWLKTTSQVKYVKDCAQMVRMYKRQQMREMSNTQWIRKCGDWKVIAAFLRYQEVNLVMFLSALRNMLKGPKRHCLVITGPPDTGKSYFCTTLVSFLKGRVISFMNSKSQFWLQPLADAKIGFLDDATHTCWTYMDTYLRNALDGNPVQDMKHRASIQLKLPPLLITSNIDVMNMYLHSRLQSFEFTKRMPLDSKGQPEFVLSAANWASFFTRLAKQLGLEEEFRCLQDRFDALQEQILNLY [...]
+Seq41 MAD-KGTGCSGWYIVETNSFIDQGN-SLSLFHEQLFLSSEEQIACLKRKYAATPSPRLESVSSRRKLFDSGIGLLKSSNIYATCLSKFKTAYGCSFAELTRQFKSDKTCSPHWVFGAPEQLVEASKLLPQHCEYAQLSSKVLLFLFEFKASKNRETVRKLLLGVQECLIIAEPPKERSVLSALFFYKKVMFQGSGQLPEWVAKQTLVEHFDLSRMIQWAYDNDYAEESAIAYNYALYAEADANAEAFLKSNCQAKYVKDCATMVRLYKRQEMRDMSMSQWVKKCGDWKVIAAFLRYQEVNVVLFLAALRHLFLGPKKHCLVIYGPPDTGKSYFCTTLVGFLKGKMISFMNSKSQFWLQPLVDSKIGFLDDATTACWQYMDVFMRNALDGNPISDMKHRAPTQIKLPPLLITTNVNVQANFLHSRLQFFAFNKPMLFDDSGNPQYPLSKANWRSFFTRLGKQLGIDDEFRCLTERFDAVQDQILNLI [...]
+Seq42 MADKGTD-GNNWYIVNISNLIDQGN-SLALYNAQINEDCDNALAHLKRKYNKSPSPQLQAVHSKRRLFDSGIFLLQSSNRRATMLAKFKEWYGVSYNEITRIYKSDKSCSDNWVFRAAVEVLESSKVLKQHCTYIQVK--SALYLVQFKSAKSRETVQKLMLNIQEYQMLCDPPKLRSVPTALYFYKHAMLTESGQTPDWIAKQTLVSHFELSRMVQWAYDNNYVDECDIAYHYAMYAEEDANAAAYLKSNNQVKHVRDCSTMVRMYKRYEMRDMSMSEWIYKCGDWKPISQFLKYQGVNILSFLIVLKSFLKGPKKNCIVIHGPPDTGKSLFCYSFIKFLKGKVVSYVNRSSHFWLQPLMDCKVGFMDDATYVCWTYIDQNLRNALDGNPMCDAKHRAPQQLKLPPMLITSNIDIKQEYLHSRIQCFNFPNKMPILDDGSPMYTFTDGTWKSFFQKLGRQLELEEEFRCLVARFDALQEAILTHI [...]
+Seq43 MADKGTE-GSSWYIVNVSNLIDQGN-SLALYNAKITDDCDNAIAHLKRKYNKSPSPQLQAVNSKRRLFDSGIFLLQTSNRRATMLAKFKDWYGVSYNEITRVYKSDKSCSDNWVFRAAVEVLESSKVLQQHCTYIQVK--SALYLLQFKSAKSRETVQKLMLNIQEFQILTDPPKLRSVPTALYFYKQAMLTESGQTPDWIAKQTLVSHFELSKMVQWAYDNNLLEECDIAYHYAMYADEDANAAAYLKSNNQVKHVRDCSTMVRMYKRYEMRDMSMSEWIYKCGDWKPISQFLKYQGVNILSFLIVLKSFLKGPKKNCIVIHGPPDTGKSLFCYSLVKFLKGKVVSYVNRSSHFWLQPLMDCKVGFMDDATYVCWTYIDQNLRNALDGNPMCDAKHRAPQQLKLPPMLITSNIDVKQEYLHSRVQCFSFPNKMPFLDDGSPMYTFTDATWKSFFQKLGRQLELEEEFRCLVARFDALQEAILTHI [...]
+Seq44 MADKGTE-GSSWYFVNISNLIDQGN-SLALYNTQITDACENAIAALKRKYTKSPSPQLQAVKSKRRLFDSGICLLQDNNRRATMLAKFKDWYGVSYTEITRLYKSDKSCSDNWVFKAPVEVLESSKVLQQHCQYIQVK--SALYLLQFKSSKSRETVYKLLLNIQEFQILADPPKLRSVPAALYFYKHALLTECGQTPDWIAKQTIVSHFELSRMVQWAYDNNHLEECDIAYHYALYADEDANAAAYLKSNNQVKHVKDCSTMVRMYKRYEMREMSMSEWIHKCGDWKPISHFLKYQGVNILSFLIVLKSFLKGPKKNCILIHGPPDTGKSLFCYSLIKFLRGKVVSYVNRTSQFWLQPLMDGKIGFLDDATYVCWTYIDQNLRNALDGNPMSDAKHRAPQQLKLPPMLITSNINVKQEYLHSRVQSFEFPNKMPFLDDGSPLYTFTDATWKSFFEKLGRQLDLEEEFRCLVARFDALQEEILTHI [...]
+Seq45 MADPNKGH-SEWYVVIVSNLIDEGN-SLALYNEQLTEDCNRAILALKRKLTKTPSPRLEAVQSKRRLFDSGLGIFRSTNRKATLLAKFKEYFGVAYGDLTRPFKSDRSCCENWVCAAAEEVIEASKVMQQHCDFLQVI--YALYLVKFKTAKSRDTIMKLFLNVQEQQLMCDPPKSRSTPTALYFYRRSFGNASGPFPDWLAKLTMLDHFELAQMIQFAYDNNLTTESEIAYKYALLADSDANAAAFLKSNQQVKYVRDCYAMLRYYKRQEMKDMSISEWIWKCGNWKLIAQFLRYQEVNFISFLCALKTLFKGPKRNCLVFWGPPDTGKSYICSSLTRFMQGKVVSFMNRHSQFWLQPLQDCKLGFLDDATFQCWQYMDVNMRNALDGNHISDLKHKAPLQIKLPPLLITTNVDVENEYLKSRLVFFKFPNKLPLKENDEVLYEITDASWKCFFIKFASHLELGDEFRCERS-DALQEQI-LNLY [...]
+Seq46 MAELKGTI-NELFDNTISNLIDQGN-SHALLNAQLSEEYDKDLVTVKRKFYATPSPRLSAVQSKRRLFDSGI-LLHSNNRRAALLCKFKEKYGIPFNEITRTFKSNKSCTQNWIFACAEDLIEASKTMQNHVSYLQMI--SALYIICFKAAKSRETVVKLILNTKEEQVLCDPPKIKSMAAALYCYKKVIADTCGDFPDWIATHTVINHFKFSDMVQWAYDNDMLDEAAIAYNYACYASENENAAAFLQTNSQLKYVKECCAMVRLYKKQEMRNMTMPEWIKSCDDWKVIVRYLKYQNINFLEFLLALKLLLKGPKKMCLVIYGPPDTGKSYFCYQFIQFMRGKVVSFMNKNSHFWLMPLLDSKIGFLDDATQCCWMYLDTHMRNAFDGNAVSDVKHKNLQQIVLPPMLITTNCDVCRDYLRSRLTCFNFPNKLPLYENGEPKFKFTDNCWTSFFSKFWKHLDLDPDFSCLSARFLAQQDIQLNLI [...]
+Seq47 MADHKGTLDGSWCLIVVSNLLD------SIIQGN-SEESDRCIQELKRKLNVTPSPRLSAVASKRRLFDSGVVILRSNNIRATVLCKFKDKFGVSFNELTRSFKSDKTCTPNWVIGIREDLRDACKLLQQHVEFLEMI--SVLLLVEFKVTKNRETVLKLMLNAKEEQILCEPPKLKSTAAALYFYKKIITDTCGTLPSWVSRLTIVEHFSLSEMVQWAYDNDFTEEASVAYNYACYATENTNAAAFLASNMQVKYVKDCVAMVRMYKRQEMKSMTMSEWISKCEEWKEIVQFLKYQGVNFLEFLIALKQFFKCPKKMCIVIYGPPDTGKSMFCFKLVQFLKGQVVSYINKSSQFWLMPLQDAKIGLLDDATHNCWIYLDTYLRNAFDGNTFCDIKHKNLQQTKLPPMIITTNVNVTTDYLRSRLTCFNFPNKLPMSDKDEPLFTISDKSWTCFFRKFWNQLELDA-FCCLSTRFAAQQEIQLTLI [...]
+Seq48 MTD-RGT-NDDWYIVDISDLLDQGN-SLELFHLQEHLQNEQDLNTLKRKYLNSPSPRLESIKARKQLFDSGIEILKCSNTRSALLAKFKDTVGVSFTDLTRAYKNNKTCCSYWVWGVTSTSVDVVKVFQVQCNYMHVENKFLIVLAGFKAQKSRETVLNLVLNVQSNYIMAEPPKNRSMAAALYWYRRSMSPAVGEMPDWMAQQTLLNHFELSQMVQWAYDNGYTDESDIAYYYAILAEEDENAKAFLASNAQAKYVKDCARMVSHYKRAEMSSMSMSAWIYKRGDWKHIVKFLRFQEVEFISFMIAFKELLSGPKKNCLVIYGPPNTGKSMFCMSLLRVLKGKVISYVNSKSQFWLQPLASTKIALLDDATKPAWDYIDLFLRNALDGNPICDLKHKAPQQIKCPPLMITSNINVKADYLHSRITCFEFKQPFPFDENGQPAFSLTDINWKSFFERFWSQLDLEDELRLLNNRLDWLQEQLLTLY [...]
+Seq49 MADNKGT--NDWFLVDLSDLLDQGN-SLELFHKQESLESEQELNALKRKLLYSPSPRLETIRYRRQLFDSGLEILKASNIRAALLSRFKDTAGVSFTDLTRSYKSNKTCCGDWVWGVRENLIDSVKLLQTHCVYIQLENRFLFLLVRFKAQKSRETVIKLILPVDASYILSEPPKSRSVAAALFWYKRSMSSTVGTTLEWIAQQTLINHFELCKMVQWAYDNGHTEECKIAYYYAVLADEDENARAFLSSNSQAKYVKDCAQMVRHYLRAEMAQMSMSEWIFRKGNWKEIVRFLRFQEVEFISFMIAFKDLLCGPKKNCLLIFGPPNTGKSMFCTSLLKLLGGKVISYCNSKSQFWLQPLADAKIGLLDDATKPCWDYMDIYMRNALDGNTICDLKHRAPQQIKCPPLLITSNIDVKSDYLHSRISAFKFAHEFPFKDNGDPGFSLTDENWKSFFERFWQQLELEDELRLLSSRLDLLQEQLMNLY [...]
+Seq50 MAE--GTGSSGWFLVTIGDLIDQGN-SLELFHQQETAEVLAEIAQLKRKYCDSPSPRMQSVSVKKRLFNEAVSVFTQSSSRIAQLAIFKEAHTVSFAELTRPFKSDKTVCGDWVSGVHCALGDSLKSLRSHCMFFLYDSTSILLLLRFKSQKSRDTVTSLLLGVDHIQVMLDPPKTRSVPAALFWYKRAMVTAVGPFPEWITQQTQVNHFELSTMIQWAYDNHITEESKIAYQYALLADSDENAKAFLASNAQAKYVKDCAAMVRLYFRAEMQEMSISAWIHYRGDWKEIVRFLRFQGIEYIPFMISMKKFLKGPKKNCIVIYGPPNSGKSYFCMSLLRLMGGKVISFANSKSHFWLQPLADAKIGLLDDATKPCWDFIDTYLRNALDGNPISDCKHRAPTELKCPPLLVTTNVDVMGDYLHSRIVFLRFMNKMPLKSDGTPGYNLDDKNWKSFFTRFWETLELEEELRLLSQRLDSVQEQLLNLY [...]
+Seq51 MAE--GT-DCGGFLDTVSSLLDQGN-SLEPFQHHEATETLKSIEHLKRKYVDSPSPRLQAFAVKKRLFDEAASANTARVKH-LLL--FRQAHSVSFSELTRTFQSDKTMSWDWVADIHVSVLESLQSLRSHCVYVQYDASSLLLLLRFKAQKCRDGVKALLLGVQDLKVLLEPPKTRSVAVALFWYKRAMVSGVGPMPEWITQQTNVNHFQLSVMVQWAYDNHLQDESSIAYKYAMLAETDENARAFLASNSQAKYVRDCCNMVRLYLRAEMRQMTMSAWINYRGDWKVVVHFLRHQRVEFIPFMVKLKAFLRGPKKNCMVFYGPPNSGKSYFCMSLIRLLAGRVLSFANSRSHFWLQPLADAKLALVDDATSACWDFIDTYLRNALDGNPISDLKHKAPIEIKCPPLLITTNVDVKSDYLFSRICVFNFLQELPIR-NGTPVYELNDANWKSFFKRFWSTLELEDELRLLSQRLDSIQEELLSLY [...]
+Seq52 MAARKGTTEDGGWVLNVSDLVDQG-LSLQLFQQQELTECEEQLQQLKRKFVQSPSPQLASIKVKKQLFDSGIQLFKVRDKRAFLYSKFKSSFGISFTDLTRVYNSDKTCSSDWVYHVSDDRREAGKLLQDHCEYFFLH--CTLLLLCLFVPKCRNTLFKLCFHISNVQMLADPPKTRSPAVALYWYKKGFASGTGELPSWIAQQTLITHFDLSEMVQWAYDNDLKDESEIAYKYAALAETDENALAFLKSNNQPKHVKDCATMCRYYKKAEMKRLSMSQWIDERGDWKEVVKFLRHQGIEFILFLADFKRFLRGPKKNCLVFWGPPNTGKSMFCMSLLSFLHGVVISYVNSKSHFWLQPLTEGKMGLLDDATRPCWLYIDTYLRNALDGNTFSDCKHKAPLQLKCPPLLITTNVNVCGDYLRSRCSFFHFPQEFPLDDNGNPGFQLNDQSWASFFKRFWKHLDLED-LRLLSEALDLLQEELLSLY [...]
+Seq53 MADKKGT-DLSDWVLDISDLVDQG-LSLQLFRLQEQKESDEQLQQLKRKYIASPSPQLEAVKAKKQLFDSGIDLFRAKNSRVFALGKFKETYGLSFMDLVRVFQSDKTCSLDWVLYMNPERSEAAKLLQDHCAYIFFT--AALMLLCFKYQKSRETVMKLLFDCSAQQILAEPPKTRSTAAALYWYKKSLIAGAGAFPEWIAKQTLINHFDLSAMVQWAYDNDLYEECEIAYQYASLADTEENAAAFLKSNSQAKHVRDCATMCRYYKRAEMQRMTMSEWISRQGDWKDIVKFLRYQDLEFTSFLSAFTKFLKGPKKNCLVFWGPPNTGKSMFCMSLMQFLKGKVLSFVNSKSQFWLQPLADAKVALLDDATAPCWTYFDTFLRNALDGNPICDAKHKAPYQVKCPPLMVTTNVDVIGDYLRSRLSAFCFATEFPFKEDGSPGFCLSDQSWASFFTRFWSRLELEDELRLISKALDSIQEQLLTLY [...]
+Seq54 MADKKGT-DLSDWVLDISDLVDQG-LSLQLFRLQEQTESDEQLQQLKRKYFHSPSPRLQAVKAKKQLLDSGIDLFRAKNRRLFSLGKFKETYGLSFMDLVRVFQSDKTCSLDWVLYLHEERSEAAKLLQDHCSYVFCN--TTLMLLSFKSQKSRETVLKLLFDCKGEQFLAEPPKTRSTAAALYWYKKSVVSGTGILPEWIARQTLINHFDLSAMVQWAYDNDVYEECEIAYRYACLGETEENAAAFLKSNNQAKHVRDCATMCRYYKRAEMQRMSISEWIHRQGDWKDVVRFLRYQGLEFMEFLGAFTKFLKGPKRNCLVFWGPPNTGKSMFCMSLLRFLRGKVISYVNSKSQFWLQPLADAKVALLDDATVPCWNYFDVYLRNALDGNPVCDAKHKAPYQIKCPPLMVTTNVDVLADYLRSRLSAFCFATEFPFKEDGSPGFLLNDQSWASFFTRFWLRLELEDELRVISKALESVQEQLLTLY [...]
+Seq55 MADKKGT-DLSDWVLDISDLVDQG-LSLQLFRLQEQTESDEQLQQLKRKYLASPSPRLQSVKAKKKLWDSGIELLRSKNRRLFSLGKFKETYGLSFLDLVRVFQSDKTCSMDWVLYLNEERAEAAKLLQDHCSYVFLT--VSLMLLSFKSQKSRETVSKLLFDCRGEQFLAEPPKTRSTAAALYWYKKSTVSGAGMLPDWIAKQTLINHFDLSAMVQWAYDNDLYEECEIAYQYACCAETDENAAAFLKSNSQAKHVRDCATMCRYYKRAEMQRMSISEWIHRQGDWKEVVKFLRHQGLEFIEFLSAFTKFLKGPKKNCIVFWGPPNTGKSMFCMSLLNFLKGKVISYANSKSHFWLQPLADAKLALLDDATAPCWNYIDTFMRNALDGNPVCDAKHKAPFQIKCPPLLVTTNVDVLGDYLRSRLSAFCFAAEFPFNEDGSPGFHLNDQSWASFFERFWPRLELEDELRVISLALESVQEQLLTLY [...]
+Seq56 MADKKGT-DLSDWVLDISDLVDQG-LSLQLFRLQEQTESDDQLQQLKRKYIASPSPRLQAVKAKKQLVDSGVDLFKSKNRRLFALGKFKENFGISYMDLVRVFQSHKTCSMDWVLYLHDERSEAAKLLQDHCAYIFFT--VTLMLLAFTSQKSRETVFKLLFDCKEEQFLAEPPKTRSTAAALYWYKKSLLAGAGIFPEWIAKQTLINHFDLSVMIQWAYDNNITEESEIAYQYAMFADTDENAAAFLNSNNQAKHVRDCATMCRYYRRAEMQRMTMSEWIHKQGDWKEVVKFLRYQGLEFVEFLSAFTKFLKGPKRNCLVFWGPPNTGKSLFCMTLLKFLRGKVISYVNSRSQFWLQPLADAKVALLDDATVPCWNYFDTFLRNALDGNPVCDCKHKAPFQIKCPPLLVTTNLNVKGDYLHSRLSAFCFANEFPFKEDGSPGFNLNDQSWASFFKRFWLRLELEDELRVVSKALECVQEQLLHLY [...]
+Seq57 MVD-KGT-EESDWVLSLADFVDEG-LSLELFRQQEADREEEHLLQLKRKYIRSPSPRLESIKAKKQLKDSGLGLFKSKNQRAVLFAKFKECFGLSFTDLTRNFKSDKTCTADWVIYISEARAEAGKLLQDHCEYVFVS--CALCLLSFKAQKNRETVLKLLFGVRDCQLMAEPPRTRSAAAALYWYKRGMSNCAGQLPEWIAKQTLLGHFDLSQMVQWAYDNDLVEESEIAYQYALLGEEDENAAAFLNSNNQTKHVKDCAVMCRYYKKAERESMSMSEWIHRSELWREIVKFLRHQTVSFVSFIAAFKRFLRGPKSNCIVIWGPPNTGKSLFCMTLLKFLKGRVISFVNSKSQFWLQPLADAKIGLLDDATRPCWDYFDAYMRNALDGNPICDCKHKAPSQIKCPPLLITTNLNVMGDYLRSRLSSFCFPTEFPFHDDGSPGFILNDESWASFFARFWTHLELEDELRLLRQALDSVQEELLNLY [...]
+Seq58 ?ETKIKTLG-CSYIV-TEDFVDAGE-HLSLLQTQMRASDAQQIASLKRKYVKSPSPKLEQCKARKQLFDSGISDERSRVLYMYRR--FNDMYGVKYTDLIRAFKSDKTMSANWVYVPLLEDGKAAATLQQQCSWYFME--IQLFNVEFNAQKCRATVIKLFFNFSSKRLMADPPKLRSAPACLFWYQKVLKKVGGELPDYIHTQCALGSFELTKMVQWALDNNLTEESSIALKYAMLAEEDENAQAFLKSNNQPKLVKDCCTMVRMFQTALMRDMSISQYVDHR--WRSIVHFLRYQGVQFLSFMIDLKNLLHHPKKCTIVVCGPPNTGKSYFVLSFVKFMNGCVISFVNYGSHFWLTPLRTARIGMIDDATNSFWKYCDTYMRTLLDGNDVSDCKHRNPIQLRCPPLVITTNEDIKNDYLQTRLRFLYFNKPFPLHDRGDPVYKIESLQWASFFRKFWRHLDLLEELRLLQHRLDYTQEKILTLY [...]
+Seq59 MAMKTKREARCSYIL-VEDLVDQGN-SLSLFHAQTVEEYEGEIQSLKRKFILSPSPRLAGVKARKSLFDSGIDLFQSRQRCTHMYSKFKAVYGVSFTDITRPFKSDKTTSQHWVYYLAFDSEISAMLLRQQCQFLYID--IILFFLEYNVQKSRTTVYNWFFHYNENRMLANPPRTRNMPAALFFYHRFMGTGGGAMPEIIVNQCVVSNFELSRMVQWALDNDLQDEHMLALEYALLAESDGNARAFLKQNNQPMIVKNCSIMVRHYKTALVAKMSISQYVNKRNSWRGIVHFLRYQGQEFLPFMCKMHNFLHHPKKSTLVLCGPSDTGKSYFANGLNKFLDGHVLSFVSNGSHFWLSPLRGARCCLIDDATLTFWRYADQNMRALLDGYEISDAKHRNPMQTRAPPLIITTNEDIMRLYLQTRTMYVYFNKPFPLKGNGQPLYYIDGYTWNSFFRKFWRHLNL-EDIRLLLERLDYIQEQILTLY [...]
+Seq60 MEDLEEGGCSGWF-DSIADMFDQGN-SLELFHTQEKEETRTQIQALKRKYI-PSSPRLRAIPSRRLFEDSG-NLLQSHNRVARLLAVFKEAYGVSYKELTREYKSDKTCNPDWVYSLSEPILNAARTLQGICEYVFMQATVALLTVRFKCSKSRETVRKQMFHSDPLLCLCDPPKVQSVPAALYWYKSSMYSGTGEAPEWIKRQTMITCFDLSEMVQWAYDNNYEDESQIAFEYARTATESPNANAWLASNAQAKHVRDCATMVRHYKRAEMKAMSMSQWVWKCGTWTPISLYLASEGVEVIRFLSAMKSWLRGPKKNCLVFYGPPNTGKSLFTMSLIKFLRGRVISFANSKSHFWMQPLAEAKVVLLDDATRATWDYVDTYMRNAMDGNPLSDCKYRTPVQVKCPPMLVTTNEDVHLNYLHSRIQVFHLKEPMPIDTAGNPEYSFSNRHWKAFFEKLQKPLDLEGDFSCIHSRLAAVQEELMCMY [...]
+Seq61 MASQRGTLGGIDFIDSVSNLFDPGN-PLQLLQQQEAAEDERLVALIKRKHLTTPSPKLDAM-AKKKLFDSGVGILRAANRRVCMLARFKEVYGVSVTDLTRQFKSDKTCCKNWVFGLCEPYYITLTVLPDHCCYSHMQGGIALMLCDFKAMKNRDTVIKLIVPVSDDLIMVQPPNVRSPAAALYWVQRAQSNASGEYPSWITKQTMLSHFDFSNMVQWAYDQGYTEESKIAYHYAQLAEEDKNAMAWLSSAAQAQHVKHCAQMVRYYMQAQTAEMTMAQYIHERGDWKHIIAFLRHQDIEIIPWLRTTRDWLKGPKRNCVCYHGPPDTGKSMFGMSLMRFMRGAIISYVNSRSHFWLQPLVSAKVAMLDDATDACWQYIDTNLRNLLDGNPLSDLKHRAPIQATCPPLLITSNIDITQDYLRSRVKCFAFHCPLPVGEDGMPTLVLTEASWKCFFRKFASTLEV-EEFRCLRDRLDVLQDQILGHY [...]
+Seq62 MTDKSG---E-YFLLNGVDFIDG-NTLAEYNRKEADRHKRDLEQLKRRHVR-RPVGGSPSSSSKRRCL-GL-ELLRSANRQATFLGKFKDTYGISFTELTRPFKSDKTCCEDWVYGISGPLYEGAKLLEGHVIYMQLTGLLLLMLLRLKHAKSRATLRRLLFNISEMQLLAEPPKTRSVPVALFWYKGTLSSLSGTCPEWIHRQTLINHFDLSSMIQWAYDYDYDDECTIAYQYARLAETDANANAWLNSPAQARYVKDTATMVKYYKRAQMREMTMGEWIKHRGDWKKIVQFIRFQGIEFPLFFGALKKFLHGPKHNCIVIWGPPDTGKSMFCMNLIKLLGGKIISFANSKSQFWLQPLADAKVGLLDDATGVCWDYIDQYLRNALDGNPISDLKHRAPTQMKCPPLLITTNLDITANYLVSRVACFKFSEPFPFTDRDTPTYPLTECNWKALLERLWKQLDFQEEFKCLLDRLNAVQSKILDLY [...]
+Seq63 MEGDIDSSRGGGFILDLINDSLQGN-SQALLHQQIMREDNRQVQDLKRKYVSPKSPRLRAISAKRRLFDSGLELMRASNQRATQLALFKKGYGISLTELTRVFKSNKTCNPDWVFGVHHNTYSDLVRLEKHCEYVQCSGYIVLMLLRFTAHKNRNTLIKLMLSVSDIQILADPPKIRSVPAALYWYRNSMSTAVGPLPDWVARQTLVQHFVLSTMVQWAYDNDHTEESDIAYHYALLADEDTNAAAWLGTNSQAKHVRDCAVMVKHYRRAIMSAMSMSEWI-NRGDWKNIGNFLRYQGIEVITFIGALRDMLKGPKRTCMCIVGPPDTGKSAFCLSLLDFFGGRVLSFTNYKSQFWLQPLADTRIALIDDATKSTWDYIDEYMRNALDGNAICDLKHKNPLQIRCPPLLITSNINIKHNYLYSRIHIVEFKHAFPFNEEGEPVYQLTKGNWKSFFKRLWLRLDLEDEFRCLKKRLDAIQDELLTIY [...]
+Seq64 MDAEEAGEGSSWFLQE--DLIDQGN-SLLLFQQQEAQADEQHLSVFKRKYC-SPSPRLGAIQVKRRLFDSGLDILRSANRKATMLGLFKDAFGVPYGELTRQFRSDKTGCFDWVYAVREPFFESGKQLRQHCRYTHVTGTVLLMLVSFNNQKCRDTVNKLIFNVHELLLMLEPPKIRSVAAAMYWYKQSLTNATGELPEWIKKLILINHFDFSQFVQWAYDNEYQEEHEIAYNYASIADEDSNAAAWLGLTGQAKVVKDVATMVRYYRRAEMNRMSMSNWIHNRGQWQPIVNFLKYQGVAMVTFINALKSFLKGPKKNCLVIWGPPNTGKSWFCMSLMHFLGGRVLSHVNSNSHFWLQPLGDAKVALLDDATTVVWDYFDRYMRNACDGNPISDMKHKAPVQIKCPPLLITSNIDVKADYLHSRLVTFHFPNLFPFEDDGSPVYQFNDENWNSLFTRLWRALDLEDEFRCLRQRLDAVQEKLMNLL [...]
+Seq65 MDEKPGS-GGTSFILSDEDLVDRGN-HLELFQTQEKEAGEKQISILKRKFCLSPPGLAGIRVVRRRLFTGGPDVVQDFNMAATIQKLFKTLYVATFGEITRIFQSNKTNNQQWVYGVPELLYTASFLLNNHCNYLLANGSLSLYLAVFNVGKSRDTVCKLVLNTTNQNLLLQPPKVRGLCSALFWYKLSLSPATGSTPDWIQQQTNVANFDFGTMVQWAYDHHLTEECKIAYQYAKCAGSDVNAKAFLASTNQARLVKDCCTMVRHYLRAEEQALSISAYIKKRGSWLSIMNLLKFHGIEPIHFVNALNPWLKGPKHNCIAIVGPPNSGKSLLCNSLITFLGGKVLTFANHSSHFWLAPLSDCRVALIDDATHACWRYFDTYLRNVLDGYPVCDRKHKSAVQMKAPPLLLTSNIDVHADYLQSRVKSFYFTEPCCASDNGEPLFVITDADWRNFFERLWERLDLEDELTCASEHLLAAQETQMTLI [...]
+Seq66 MAETAGS-GGGAYICSDEDLVDPGN-HLELFQTQEKEAGERQISLLKRKFCLSPPGLAGIRVVRRRLFDAGGRPASDGNMAAVMHKLFKTLYIAGFGEITRVFQSDKTNNNQWVHGASEVLYAASFILSKHCSYLQASGSMSLFLAVFNVGKSRETVRKLILNTPCSRLLLQPPKIRGLCPALFWFKLGLSPATGTTPDWIKQQTNVAYFDFGTMVQWAYDHRLTEECKIAYQYAKCAGTDLNAKAFLASTNQARLVKDCCTMVKHYLRAEEQSLTISAFIKRRGSWLSIMNLLKFQGIEPINFVNALKPWLKGPKHNCIAIVGPPNSGKSLLCNTLMSFLGGKVLTFANHSSHFWLAPLTDCRVALIDDATHACWRYFDTYLRNVLDGYPVCDRKHKSAVQLKAPPLLLTSNIDVHADYLQSRVKTFYFKEPCPASDTGEPLFFITDADWKNFFERLWERLDLEDEFTCASDHLLAAQETQMQCI [...]
+Seq67 MDKENAG-GGDSFILD-EDLLDPGN-HLELFQTQEKEAGERQISILKRKLCLSPSWACCHKVVRRRLFDPGGASSAEPNMAACIQKLFKTLYIASHGEITRVFQSNKTVNHQWVYGVSEVLYSASFLFGKQCNCLQTSGSISVYRCMFNVAKSRDTVQKLMLNVTAGNLLLQPPKIRGLGPALFWFKLTLSPATGTTPEWIQQATNVASFDLGTMVQWAYDHGFTEESKIAYEYALCAGSDCNAKAFLASTSQARLVKDCCTMVRHYLRAEVQALTMSGYIKRRGSWLSIMNLLKYHGIEHIQFVNALKPWLKGPKYNCITIVGPPNSGKSLLCNSLIAFLGGKVLTFANHHSHFWLAPLADCRVALIDDATTACWRYFDTHLRNVLDGYPFGDRKHNTAVQMKAPPLLVTSNIDVHAEYSHSRVKPFYFKEPCPASDNGEPMFSITDADWKHFFERLWGRLDLEDEVTCAKEQLLAAQETQMTLI [...]
+Seq68 MSDEPGSGKGSEFILDLEDFVDQGN-HRELFQTQEKEAGEKAIQKLKRKLALSPPGLAAISLVKRRLFIGPKAILKSKNSAACKLKLFKTIFACSFCDLTRVFQSNKTTNLQWVYGPSETMYEASFLLKKACSYVLAVGTIALILACFNNAKSRDTVQKLFLNVHHEQLLMQPPKIRGVCAALFWFRLTFSPATGTLPQWIRTQTIAAEFDFGTMVQWAYDNSYCEESKIAYEYAMLANCDTNAKAFLASNNQAKMVKDCATMVRHYKRAEVQAMTMSEYIKRRGSWLPIMNLFKFQGIEPIRFVNSMRQWLRGPKKNCICIVGPPNSGKSLLCNSLISFLGGRVLTFAMHKSHFWLAPLSEARVALIDDATYACWKYFDTYLRNALDGYPICDRKHKTAVQMKAPPLLVTSNIDVHADYLHSRIVSFYFKETCT-TANGEPMFSITNADWKIFFERLWGRLELEEEFACASERLRAAQEQQMLLI [...]
+Seq69 MSDEPGSGKGSEFILDLEDFVDQGN-HRELFQIQEKEAGDKAIQKLKRKLALSPPGLAAITLVKRRLFIGPKAILKSKNSAACKLKLFKTIFACSYSDLTRVFQSNKTTNLQWVYGPSETMFEASFLLKKACSYLLSVGTVALFLACFNNAKSRDTVRKLFLNVHPEQLLMQPPKIRGVCAALFWFRLTFSPATGTLPQWIRTQTIAAEFDFGTMVQWAYDNSYCEESKIAYEYAMLANCDSNAKAFLASNNQAKMVKDCATMVRHYKRAEVQAMSISEYIKRRGSWLPIMNLFKFQGIEPIRFVNSMRQWLRGPKKNCICIVGPPNSGKSLLCNSLISFLGGRVLTFAMHKSHFWLAPLSEARVALIDDATYACWKYFDTYLRNALDGYPICDRKHKTAVQMKAPPLLVTSNIDVHADYLHSRIVSFYFKETCT-TANGEPMFSITNADWKIFFERLWGRLELEEEFTCASDRLRAAQEQQMLLI [...]
+Seq70 MANDKGSALGCSYLLQDEDFLDQGN-HLEVFQALEKKAGEEQLLNLKRKV-LGSSEASETPGAKRRLFENEANLVKSKNATVFKLGLFKSLFLCSFHDLTRLFKNDKTTNQQWVFGIAEVFFEASLLLKKQCSFVQMQGTCAVYLLCFNTAKSRETVRNLMLNVREECLLMQPPKIRGLSAALFWFKSSLSPATGALPEWIRAQTTL-HFDFGTMVQWAYDHKYAEESKIAYEYALAAGSDSNARAFLATNSQAKHVKDCATMVRHYLRAETQALSMPAYIKTRGSWKSILTFFNYQNIELITFINALKLWLNGPKKNCLAFIGPPKTGKSMLCNSLIHFLGGSVLSFANHKSHFWLASLADARAALVDDATHACWRYFDTYLRNALDGYPVSDRKHKAAVQIKAPPLLVTSNIDVQAEYLHSRVQTFRFEQPCT-DESGEQPFTITDADWKSFFVRLWGRLDLEEDFTCACERLHVAQETQMQLI [...]
+Seq71 MANDKGSGLGCSYLLQDEDFVDQGN-HLEVFQALEKKAGEEQILNLKRKV-LGSSEASETPGAKRRLFE?EANLVKSKNATVFKLGLFKSLFLCSFHDITRLFKNDKTTNQQWVFGLAEVFFEASFLLKKQCSFLQMQGTCAVYLICFNTAKSRETVRNLMLNVREECLMLQPAKIRGLSAALFWFKSSLSPATGALPEWIRAQTTL-NFDFGTMVQWAYDHKYAEESKIAYEYALAAGSDSNARAFLATNSQAKHVKDCATMVRHYLRAETQALSMPAYIKARGSWKSILTFFNYQNIELITFINALKLWLKGPKKNCLAFIGPPNTGKSMLCNSLIHFLGGSVLSFANHKSHFWLASLADTRAALVDDATHACWRYFDTYLRNALDGYPVSDRKHKAAVQIKAPPLLVTSNIDVQAEYLHSRVQTFRFEQPCT-DESGEQPFNITDADWKSFFVRLWGRLDLEEDFTCACERLHVAQETQMQLI [...]
+Seq72 MADKSGRGGCSFVLDFDAEFIDQGN-TLALFQSQVAQAGKQKVNYLKRKLHLESRAVLQPVAAKRRLFCSSSEILKSKNSAACKLAVFKFVYAASFCDLTRPFKNDKTTNYQWVFGVSEELFEASKLLGRSCTYLHATGSVALLLLSFHVAKSRETVTNLLLNLRAEHMMLQPPKLRGVTSAMFWYKMTLSPNTGQLPRWIEQQILITEFDFSHMVQWALDNEMMDESSIAFHYAQMADHDSNARAWLGLSNQAKIVKDVCTMVHHYQRAIMRSMTMSAYVHKMGSWLVIMQFLKFHGIEPIRFVNALRPWLQGPKKNCLAFIGPPDTGKSLFTNSLMSFLKGKVLNFANSASHFWLAPLTEAKVALIDDATHACLKYCDTYLRNFFDGYSVCDRKHKNAVQIKAPPMLLTSNIDIQAEYLKSRVTCFYFNDKCPLNEDGKPLFQITDPDWKSFFERLWQRLELEEEFICAAERLSAAQETQMTLL [...]
+Seq73 MDNTPGTGSSDWVLLDLVDFIDDSDFYRRLQVEQQREDDQRAAHVLKRKFLDSPSPRLEAIRARRKLYDSGHGLMQAGKPRNVLLALCKDAYGCSFSDLTRSYKSDKTVCGDWVAGVPCSLEEAITLLKPHSDYTHVNGLLLLLLVRWKTAKCRETVQKLLMSVEKHQMVLEPPKIRHPATAMFWYKRTLANASGETPEWILKQVSLQEFSLSAMVQWAYDNGLEGESEIAYGYAQLAEEDTNAEAFLRSNAQAKHVKDCAIMVRHYRRAEMCKMNIAQWIKLRGDWRPIMKFLKFQKVEILAFLTFMRHFLRGPKRNCMVLLGPPNTGKSLFGMSLMHFLGGKIISHVNSGSHFWLQPLLECKVAMLDDATTSTWDYMDIYLRNMLDGNTVCDAKHKAPMQLKCPPLIVTTNVDVTANYLHSRLKVFTFPNLCPLNCRGDPEFQLTPENWKAFLEKCWTSLGLDLLLRCLCSRLDVLQEQQMELI [...]
+Seq74 MASQDSTGSGG-FILEYMDFIDDDGGRHLNALLAEDDARAVQAVMSKIGH-SREYSSGGKESSHRKRTRSPDSLIRSGKARAAMLGIFKDSFGVRFTDITRHFKSDKTVCRDWVVGVACSVSDAVPLVRPHTVYSHTTGNMALGLVRWKTAKCRDTCCKLLLTVENKQLLLEPPQTQNAGAALFWYKKSISRGSGETLEWIARQVSLSSFCLSRMVQWAYDCGYTNESTIAYEYAKLADDDSNAEAFLKCNNQARYVRDCCKMVTLYARAEMAKMSMNEWIGRRKEWRVIVQFLKVQKVEFIPFLMQFRKFLKGPKNNCLCFYGPSNTGKSMFCMSLLEFLKGRTISYVNSKSQFWLQPLGDSKIGLLDDATLPVWDYMDVYLRNLLDGNVFCDAKHKAPSQIKAPPLLVTSNYNIKEYYLVNRVHVITFPTVCQTDYKGDVSVKLESHHWKSFFRRWWPLLDSNDGFRCLTNHLDVLQETQMEIF [...]
+Seq75 MAHAEGTGAGGWFVVDLVDFI-Q-EVPLELFVQQTANDDAAAVQALKRKFVGSPSPRLDAIKARRRLFDSGYGLFKGSNVRAAILSKFKDLFGLSFYDLVRQFKSDKSICGDWVFGVYYAVAEAVKLIQPQCIYAHIQGMVVLLLVRFKCGKSRETVAKYMLNVPEKHMLIEPPKIRSGPCALYWYRTAMGNACGETPEWIVRQTVVGHFSLSVLVQWAYDNDIQDESDLAYEYAKLGNEDANAAAFLASNCQAKYIKDAMTMCRLYRRAEQARMSMAQWIVHRGDWKHIVKFLRYQRVEFITFISAFKLFLKGPKKSCLVFYGPSDTGKSLFCMSLLQYLGGAVISFVNSSSHFWLSPLADAKIGLLDDATGQCWTYIDVYLRSILDGNPISDRKHRTLTQLKCPPLMITTNVDPLADYLRSRITVFKFMNKCPVTASGEPVYTLNNETWKSFFQRSWARLELEEEFRCLADRLDACQEMLIDLY [...]
+Seq76 MADTEGTGAGGWFMVDLVDFI-Q-EVPLELFVQQTAEDDAAIVQAVKRKFVCSPSPRLDAIKARRRLFDSGYGLFKCSNVRAAILSKFKDLFGLSFYDLVRQFKSDKSICGDWVFGVYYAVAEAVKLLQPQCLYAHIQGMVVLMLLRFKCGKSRETVAKYMLNVPEKHMLIEPPKIRSGPCALYWYRTAMGNASGETPEWIVRQTVVGHFSLSMLVQWAYDNDIQEESDLAYGYAQLGNTDPNAAAFLASNCQAKYIKDAMTMCRLYRRAEQSRMSMAQWIAHRGDWKHIVKFLKYQNVEFISFISAFKLFLKGPKKSCLVFYGPSDTGKSLFCMSLLQYLGGAVISYVNSSSHFWLSPLADAKIGLLDDATAQCWTYIDVYLRSILDGNPTSDRKHRTLTQLKCPPLMITTNVDPLADYQRSRITVFKFLNKCPVTNSGELVYTLNNETWKSFFQRSWARLELEEEFRCLADRLDACQETLIDLY [...]
+Seq77 MADHEGTRAGGWFLVDLVDFIVQ-EVPLALYVHQNAQDDAAAVQALKRKFTYSPSPRLDAIKARRRLFDSGYGLLKASNIRATILSKFKELFGLSYYDLVRQFKSDKSTCGDWVFGVYHAVAEAVKLLQPHCVYAHIQGMVVLALVRFKCGKNRESVAHCMLNIPDRHMLIEPPKIRSGPCALYWYRTAMGNASGETPEWIVRQTVIGEFSLSTLVQWAYDNDITDESQLAYEYALLGNEDPNAAAFLASNCQAKYIKDAITMCKHYKRAEQARMTMAQWIKYRGDWRHIVKFLRYQNVEFITFMSAFKHFLKGPKKSCMVFYGPSDTGKSLFCMSLLHYLGGAVISFVNSSSHFWLSPLVDAKVGLLDDATMQCWTYIDVYLRSILDGNAISDRKHRNLTQLKCPPLMITTNVDPLADYLKSRIVVFRFLNKCPMNANGEPVYTLNNETWKSFFQRSWARLDLEEEFRCLADRLDACQERLIDLY [...]
+Seq78 MADAEGTGAGGWFMVDLVDFIDQ-EVPLELFVQQNARDDAAAVQALKRKYTYSPSPRLNAIRARRRLFDSGYGLLKTSNLRATLLSKFKELYGLAFGELVRQFKSDKSVCGDWVFGVYHAVAEAIKLIQPVCLYAHIQGMVILMLIRFKCSKSRETVAKCILNVPDKQMLIEPPKIRSAPCALYWFRTAMGNASGETPEWITRQTVVGHFSLSVLVQWAYDNDIVDESDLAYQYALLGNDDPNAAAFLASNCQAKYIKDAITMCKYYKRAEQKRMSMAQWIAHRGDWRPIVRFLRYQKIEFVTFMSALKMFLRNPKKSCIVIYGPSDTGKSLFCMSLLKFLGGAVISYVNSTSHFWLSPLTDAKVGLLDDATYPCWVYIDTHLRSVLDGNQISDRKHKNLTQIQCPPLFITTNINPLEDYLHSRIAVFHFMYKCPLDDKGDPVYQFNNENWKSFFQRSWAQIEGEEEFRCLADRLDACQEKLIDFY [...]
+Seq79 MADVEGTRAGGWFMVDLVDFI-Q-EVPLELFVQQNAQDDAAAVHALKRKYIHSPSPRLDAIRARRRLFDSGYGLLKARNLRATLLSKFKELYGLAFGELVRQFKSDKSTCTDWVFGVYYAVAEALKLIQPLCHYAHIQGMVQLMLIRFKCGKSRDTVAHCILNVSEKQMLIEPPKIKSTPCALYWYKTAMGNASGETPEWIVRQTVVGHFSLSVMVQWAYDHDITDESQLAYEYALLGHEDPNAAAFLASNCQARYIKDAITMCRHYKRAEQARMSMAQWIAHRGDWKPIVRFLRFQKIEFMTFMGAFKMWLKGPKRSCIVIHGPSDTGKSLFCMSLVQFLGGAVISYVNASSHFWLSPLADAKVGLLDDATHPCWVYIDTHLRSVVDGNLISDRKHRNLAQLKCPPLLITTNINPLEDYLHSRMAVFSFMYKCPLDDNGDPVYKFNNENWKSFFQRSWARLEVEEEFRCLADRLDACQETLIDLY [...]
+Seq80 MADSEGTRAGGWFLVDLVDFI-Q-EVPLALFVQQNAQDDAATVQALKRKYTCSPSPRLDAIRARRRLFDSGYGVFKVSNLKAKLLYKFKDLFGLAFGELVRNFKSDKSICGDWVFGVYHAVAEAVKLIQPICVYAHIQGMVILMLVRYKCGKSRETVAHSMLNIPERQMLIEPPKIRSAPCALYWYRTAMGNASGETPEWIVRQTVVGHFSLSMLVQWAYDNDITDESVLAYEYALLGNEDPNAAAFLASNCQAKYIKDAITMCKHYRRAEQAKMTMAQWITHRGDWKAIVKYLRYQQVEFVPFISALKLFLKGPKKSCMVFYGPSDTGKSLFCMSLLNFLGGAVISYVNSSSHFWLSPLADTKVGLLDDATYQCWQYIDTYLRTVLDGNAISDRKHRNLTQLKCPPLMITTNINPLEDYLHSRIVVFQFLHKCPLNSNGDPVYTLNNENWKSFFRRSWARIEGEEEFRCLADRLDACQEKLLDLY [...]
+Seq81 MANCEGTRAGGWFLVDLVDFI-Q-EVPLDLFVQQNARDDAATVQALKRKYTCSPSPRLDAIRARRRLFDSGYGIFKVSNLRVTLLHKFKELFGLAYGDLVRQFKSDKSICGDWVFGVYHAVAEAVKLIQPICLYAHIQGMVILMLVRYKCGKSRETVAHSMLNIPEKQMLIEPPKIRSGPCALYWYRTAMGNGSGETPEWIVRQTVVGHFSLSTLVQWAYDNDITDESELAYDYAMLGNEDPNAAAFLASNCQAKYIKDAITMCKHYKRAEQARMSMTQWIAHRGD*E?IVKYLRYQRVEFVTFMGALKLFLKGPKKSCMVFYGPSDTGKSLFCMSLLKYLGGAVISYVNSGSHFWLSPLVDAKVGLLDDATYQCWQYIDTYLRTVLDGNAISDRKHRNLTQLKCPPLMITTNINPLEDYLHSRIVLFKFMHKCPLKSNGDPVYTLNNENWKSFFQRSWARIEGQEEFRCLADRLDACQEKLLDLY [...]
+Seq82 MAESPEGGAGGWFVVDMVDFVDQ-EVPLGLYVQQTMQDDAATVQALKRKFMGSPSPRLDAIKARRRLFDSGYGLLQVSNLRVQLFTKFKELFGLSFKDLVRQFKSDRSTCAEWVFGVYYAVAEAAKLLQPVCEYAHIQGMVMLLLLRFKCNKSRETVAKCLLNIPEKRMLIEPPKQRSAPCALYWYKTAMGNASGDTPDWIVRQTVIGHFSLSVLVQWAYDNEITDDSELAYEYAKLGNEDPNAAAFLASNCQARYIKDAITMVRHYRRAEQARMSMSQWIAHRGDWRHIVKLLRFQGIEFISFMEALKQFLKGPKKSCLVFYGPSDTGKSLFCMSLLRYLGGAVISFVNSTSHFWLSPLVDAKIGLLDDATQQCWVYIDTYLRTVLDGNTMSDRKHKNLQQLKCPPLMITTNVNIAADYLRSRMVVFPFLQKCPLDSNGEPVYKLNNENWKSFFQRSWARLDLEEEFRCLADRLDACQDKLLDLY [...]
+Seq83 MAESPEGGQG?WFVVDMVDFIDQ-EVPLDLYVQQTMQDDAATVHALKRKYIGSPSPRLDAIKARRRLFDSGYGLLQVSNLRVKLLAKFKELFGLSFMDLVRQFKSNKSTCGDWVFGVYYAVAEAAKLLQPVCDYAHIQGMVMLLLLRFKCNKSRETVAHCILNIPEKRMLIEPPKQRSGPCALYWYKTAMSNASGETPDWIVRQTVIGHFSLSVLVQWAYDNEITEESELAYEYAQLGNEDANAAAFLASNCQARYIKDAITMVRHYRRAEQARMTMSQWIAYRGDWRHIVKLLRYQGIEFISFMEALKHFLKGPKKSCLVFYGPSDTGKSMFCMSLLKYLGGAVISFVNSTSHFWLSPLVDTKIGLLDDATQQCWVYMDTYLRTVLDGNTMSDRKHKTLQQLKCPPLMITTNVNIEADYLRSRMVVFPFLHKCPLDSNGDPVYKLNNENWKSFFQRSWARLDLEEEFRCLAARLDACQDKLIDLY [...]
+Seq84 MAESPEGGAGGWFVVDLVDFIDQ-EVPLDLYVQQTIQDDAATVQALKRKFMGSPSPRLNAIKARRRLFDSGYGLLQVSNLRVKLLGKFKELFGLSFMDLVRQFKSNKSTCGDWVFGVYHAVAEAAKLLQPVCEYAHIQGMVMLLLLRFKCNKSRETVAHCILNVPEKRMLIEPPKQRSGPCALYWYRTAMGNACGETPDWIVRQTVIGHFSLSKLVQWAYDNDITDESELAYEYAQLGTEEPNAAAFLASNCQARYIKDAMIMCRHYRRAEQTRMSMSQWITYRGDWRHIVKLLRYQGIEFISFMTALKQFLKGPKKGCLVFYGPSDTGKSLFCMSLINYLGGTVISFVNSTSHFWLSPLADAKIGLLDDATYQCWIYMDTYLRSVLDGNVISDRKHKNLVQLKCPPLLITTNINPETDYLRSRMVIFPFLNKCPLDANGDPVYQLNNENWKSFFRRSWARLDLEEEFRCLADRLDACQEKLIDLY [...]
+Seq85 MADGEGTSACGWFWVDMVDFIDQ-EVALELYRQQEAQDDEAFVQALKRKYLASPSPRLDAIKAKRRLFDSGYGLLKTSNLRATLLGKFKDIYGLSFMELARQFKSNRTTCLDWVFGVYCTVAEGVKLIQQHCQYAHIQGMVVLMLVRYNCAKNRDTVAKCMLNIPEQHMLIEPPKIRHPAAALYWYKAGMGNASGETPEWIVRQTVVGHFQLSVMVQWAYDHDITDESILAYEYAKLADVDGNAAAFLASNCQAKYVKDACTMCRHYKRAEQAQMTMSEWIRFRGDWRPIVRFLRHHDIEFITFVISLKNFLKGPKKCCIVIYGPADTGKSYFCMSLLRFLGGVVISYANSTSHFWLQPLCDAKIGLIDDVTPQCWSYIDTYLRNALDGNQVCDRKHRPLLQLKCPPLLMTTNTNPLEEFLRSRLQLFTFPNAFPVNQKGDPLYTLNDANWKCFFQRLWARLDLDDQFKCLASRLNACREKLLELY [...]
+Seq86 MADCEGTGACGWFWVDMVDFIDQ-EGAPELYRQQEVQDDEAIVQALKRKYIASPSPRLDAIRAKRRLFDSGYGLLKTSNLRATLLGKFKDLFGLSYMELVRQFKSNKTTCIDWVFGVYCTVAEGVKLIQQHCQYAHIQGMVVLMLVRYNCAKNRDTVAKCMLNIPEHHMLIEPPKIRYPPAALYWYKAGMGNASGETPEWIVRQTVVGHFQLSVMVQWAYDHDITDESILAYEYARLADVDGNAAAFLASNCQAKYVKDACTMCRHYKRAEQAQMTMSQWITFRGDWRPIVRYLRHQDIEFISFVIALKNLLKGPKKCCIVIYGPADTGKSYFCMSLLRFLGGVVISYANSSSHFWLQPLCDAKIGLIDDVTPQCWSYIDTYLRNALDGNQVCDRKHRPLLQLKCPPLLMTTNTNPLEEFLRSRLQMFTFKNAFPVNQKGDPLYILNDANWKCFFQRLWARLDLQDDFRCLASRLDVCQEKLLELY [...]
+Seq87 MADCEGTGAGGWFFVDMVDFINQ-EVAAEVYRQQEALDDEAIVQPLKRKFLASPSPRLDAIKAKRRLFDSGYGLLNSSNRRATLLGKFKDLYGLSYMELVRQFKSNKTTCLDWVFGVYCTVAEGVKLIQQHCEYAHIQGVVILMLLRYKCAKNRDTVAKGLLNIPETNMLIEPPKIRSTPAALYWFRASMGNASGETPEWIVRQTVVGHFQLSVMVQWAYDHDITDESILAYEYARLADVDSNAAAFLASNCQAKYVKDACTMCRHYKRAEQAQMSMSQWISFRGDWRTIVKYLRHQDIEFITFIIALKNFLKGPKKSCLVFYGPADTGKSYFCMSLLRFLGGVVISYANSSSHFWLQPLADAKLGLIDDVTPNCWSYIDVYLRNALDGNQICDRKHRPLLQLKCPPLLITTNTNPLEEFLRSRLQLFTFKNAFPLNSKGDPMYPLNDANWKCFFQRLWARLDLDEQFRCLASRLDACQETLLELY [...]
+Seq88 MEDSEGTRAGGWFHVEDLDFIDQ-EVPLQLYAQQIAQDDEATVQALKRKFVASPSPRLDAIKAKRRLFDSGYGLLKSSNLKATLLSKFKELYGVGYYELVRQFKSSRTACADWVFRVYYAVAEGIKLIQPHTQYAHIQGMVVFMLLRYNCAKNRDTVSKNMLNIPEKHMLIEPPKLRSTPAALYWYKTSMGNGSGETPEWIVRQTLIGHFKLSVMVQYAYDHDITDESALAFEYAQLADVDANAAAFLNSNCQAKYLKDAVTMCRHYKRAEREQMSMSQWITFRGDWKPIVKFLRHQGVEFVSFLAAFKSFLKGPKKNCIVFYGPADTGKSYFCMSLLQFLGGAVISYANSSSHFWLQPLADSKIGLLDDATAQCWTYIDTYLRNLLDGNPFSDRKHKTLLQIKCPPLMITTNINPLEEYLRSRVTLFKFTNPFPFASPGEPLYPINNANWKCFFQRSWSRLDLEDQFRCLASRLDACQETLLELY [...]
+Seq89 MEDSEGTRAGGWFHVEDVDFIDQ-EIPLQLYTQQIAQDDEATVQALKRKFVASPSPRLDAIKAKRRLFDSGYGLLRSSNLKATLLSKFKELFGVGYYELVRQFKSSKTACADWVFGVYYAVAEGIKLIQPHTQYAHIQGMVVFMLLRYNCAKNRDTVSKNMLNIPEKHMLIEPPKLRSTPAALYWYKTAMGNGSGETPEWIVRQTLVGHFRLSVMVQFAYDHDIVEESVLAFEYAQLADVDANAAAFLNSNCQAKYVKDAVTMCRHYKRAERAQMSMSQWITFRGDWKPIVKFLRHQGVEFVSFLAAFKLFLKGPKKNCIVFYGPADTGKSYFCMSLLQFLGGAVISYANSSSHFWLQPLSDSKIGLLDDATPQCWSYIDTYLRNLLDGNPVSDRKHKTLLQLKCPPLMITTNINPLEEYLRSRLTLFTFNNPFPFASPGEPLYPINNANWKCFFQRSWSRLDLEEQFRCLANRLDACQETLLELY [...]
+Seq90 MEDSEGTRAGGWFHVEDLDFIDQ-EIPLQLYAQQTAQDDEATVQALKRKFVASPSPRLDAIKAKRRLFDSGYGLLRSSNLKATLLSKFKDLFGVGFYELVRQFKSSKTACADWVYGVYYAVAEGLKLIQPHTQYAHIQGMVVFMLLRYNCAKNRDSVSKNMLNIPEKHMLIEPPKLRSTPAALYWYKTAMGNGSGETPEWIVRQTLVGHFRLSVMVQYAYDHDIVEESVLAFEYAQLADVDANAAAFLNSNCQAKYVKDAVTMCRHYKRAEREQMSMSQWITFRGDWKPIVRFLRHQGVEFVSFLAAFKLFLKGPKKNCIVFYGPADTGKSYFCMSLLQFLGGAVISYANSSSHFWLQPLSDSKIGLLDDATPQCWSYIDIYLRNLLDGHPVSDRKHKTLLQLKCPPLMITTNTNPLEEYLRSRLTVFTFKNPFPFASPGEPLYPINNANWKCFFQRSWSRLDLEEQFRCLANRLDACQETLLELY [...]
+Seq91 MADNSGTRAGGWFMVDMVDFIDQ-EVAQELLLQQAAADDDEAVHTVKRKFAPSPSPRLDAIKAKRRLFDSGYGILKASNQKATLLGKFKEQFGLGYNELVRHFKSSRTACVDWVFGVYCTVAEGIKLIQPLCEYAHIQGMTVLMLVRYKRAKNRETVAKGLLNVPESHMLIEPPKLRSSPAALYWYKTSMSNISGETPEWIVRQTMVGHFSLSEMVQWAYDHDITDEGTLAYEYALIADVDSNAAAFLASNCQAKYVKDACTMCRHYKRGEQARMSMSEWIRFRGDWKPIVHFLRYQNVEFIPFLCAFKLFLQGPKKSCLVFYGPADTGKSYFCMSLLKFMGGVVISYANSHSHFWLQPLSEAKMGLLDDATSQCWSYVDTYLRNALDGNVMCDRKHRSLLQLKCPPLLITTNVNPLEDYLRSRLQVFTFSNPCPLTSKGEPVYTLNDQNWKSFFQRLWARLSLDDEFRCLANRLDACQDKILELY [...]
+Seq92 MADNSGTRAGGWFIVDMVDFIDQ-EVAQELLLQQAAADDDVAVQAVKRKFTHSPSPRLDAIKAKRRLFDSGYGILKASNQRATLLGKFKEQFGLGYNELVRHFKSNRTACADWVFGVYCTVAEGIKLIQPLCDYAHIQGMTVLMLLRYKRAKNRETVAKGLLNVPESHMLIEPPKLRSGPAALYWYKTGMSNISGDTPDWIVRQTIVGHFRLSDMVQWAYDHDITDEGTLAYEYALIAEFDANAAAFLASNCQAKYVRDACTMCRHYKRGEQARMTMSEWIKFRGNWKPIVQYLRYQDVEFVPFLCALKSFLQGPKKSCLVFYGPADTGKSYFCMSLLRFMGGAVISYANSTSHFWLQPLSEAKMGLLDDATSQCWNYIDTYLRNALDGNVICDRKHRSLLQLKCPPLLITTNVNPLEDYLRSRLQVFTFKNKFPVTSSGDPLYTLNDQNWKSFFQRLWARLRLDDEFRCLASRLDACQDKMLELY [...]
+Seq93 MDDTSGTRAGGWFMVDLVDFIDQ-EVAQELLLQQAAADDDVEVQTVKRKFAPSPSPRLDAIKAKRRLFDSGYGILKASNHKATLLGKFKEQFGLGFNELIRHFKSNKTVCSDWVFGVYCTLAESFKLIQPQCEYAHIQGMTVLTLVRFKRAKNRETVAKGFLNVPENHMLIEPPKLRSAPAALYWFKTSLSNCSGETPEWIVRQTVVGHFSLSEMVQYAYDHDITDESTLAYEYALQADTDANAAAFLASNCQAKYVKDACTMCRHYKRGEQARMNMSEWIKFRGDWKPIVQYLRYQDVEFIPFLCALKSFLQGPKKSCIVFYGPADTGKSYFCMSLLKFLGGVVISYANSSSHFWLQPLAEAKIGLLDDATSQCWCYIDTYLRNALDGNQVCDRKHRALLQLKCPPLLITTNINPLGDYLRSRLQVFTFNNKFPLTTQGEPLYTLNDQNWKSFFQRLWARLNLEDEFRCLANRLDVCQDKILELY [...]
+Seq94 MDDTSGTRAGGWFMVDFVDFIDQ-EVAQELLLQQAAADDDVAVQAVKRKFAPSPSPRLDAITAKRRLFDSGYGILRASNQKATLLGKFKEQFGLGFNELIRHFKSSKTVCLDWVFGVYCTLAEGIKLIQPQCDYAHIQGMTVLMLVRYKRAKNRETVAKGLLNVPESHMLIEPPKLRSGPAALYWYKTAMSNCSGETPEWIVRQTMVGHFSLSEMVQYAYDHDITDESMLAFEYALLADTDANAAAFLSSNCQAKYVKDACTMCRHYKRGEQARMNMSEWIWFRGDWKPIVQFLRYHDVEFIPFLCAFKTFLQGPKKSCLVFYGPADTGKSYFCMSLLRFLGGVVISYANSNSHFWLQPLADAKIGLLDDATSQCWCYIDTYLRNALDGNQVCDRKHRALLQLKCPPLLITTNINPLEDYLRSRVQLFTFKNKFPLTTQGEPLYTLNDQNWKCFFRRLWARLSLDDEFRCLANRLDVCQDKMLELY [...]
+Seq95 MDDNTGTRAGGWFIVDFVDFIDQ-EVAQELFQQQTAADDDVAVQTVKRKFAPSPSPRLDAIKAKRRLFDSGYGILRASNKKATLLGKFKEQFGLGYNELIRHFKSDRTSCADWVFGVFCTVAEGIKLIQPLCDYAHIQGMTVLMLVRYKRAKNRETVAKGLLNVPESQMLIEPPKLRSGPAALYWYKTSMSSCSGETPEWIVRQTMVGHFSLSEMVQWAYDHDITDESTLAYEYALIADTDSNAAAFLSSNCQAKYLKDACTMCRHYKRGEQARMSMSEWIWFRGDWKPIVQFLRYQDVEFIPFLCAFKTFLQGPKKSCLVFYGPADTGKSYFCMSLLRFLGGAVISYANSSSHFWLQPLSEAKIGLLDDATSQCWNYIDTYLRNALDGNQICDRKHRALLQLKCPPLLITTNINPLTDFLRSRLQLFTFKNPFPVTTQGEPMYTLNDQNWKCFFRRLWARLSLEDEFRCLANRLDACQDKMLELY [...]
+Seq96 MDDNTGTRAGGWFIVDCVDFIDQ-EVARELFLQQAAADDDIAVQTVKRKFAPSPSPRLDAIKAKRRLFDSGYGILKASNHKATLLGKFKEQFGLGYNELIRHFKSDRTACVDWVFGVYCTVAEGIKLIQPLCDYAHIQGMTVFMLVRYKRAKNRETVAKGLLNVPESQMLIEPPKLRSGPAALYWYKTSMSSCSGETPEWIVRQTMVGHFTLSEMIQWAYDHDITDESTLAYEYALIADTDANAAAFLASNCQAKYLKDACTMCRHYKRGEQARMSMSEWIRFRGDWKPIVQFLRYQDVEFIPFLCAFKTFLQGPKKSCIVFYGPADTGKSYFCMSLLRFLGGAVISYANSSSHFWLQPLSEAKIGLLDDATTQCWNYVDTYLRNALDGNQVCDRKHRALLQLKCPPLLITTNVNPLADYLRSRLQLFTFKNPFPVTAQGEPLYTLNDQNWKCFFRRLWARLSLEDEFRCLANRLDACQDKMLELY [...]
+Seq97 MADPEGTGCNGWFYVDMVDFIDELETAQALFHAQEVHNDAQVLHVLKRKFAGGSSPRLQEIKAKRRLFDSGYGLLKVNNKQGAMLAVFKDTYGLSFTDLVRNFKSDKTTCTDWVFGVNPTIAEGFKLIQPFILYAHIQGVLILALLRYKCGKSRLTVAKGLLHVPETCMLIQPPKLRSSVAALYWYRTGISNISGDTPEWIQRLTIIQHFDLSEMVQWAFDNELTDESDMAFEYALLADSNSNAAAFLKSNCQAKYLKDCATMCKHYRRAQKRQMNMSQWIRFRGDWRPIVQFLRYQQIEFITFLGALKSFLKGPKKNCLVFCGPANTGKSYFGMSFIHFIQGAVISFVNSTSHFWLEPLTDTKVAMLDDATTTCWTYFDTYMRNALDGNPISDRKHKPLIQLKCPPILLTTNIHPAKDYLESRITVFEFPNAFPFDKNGNPVYEINDKNWKCFFERTWSRLDLEEDFKLLSERLSCVQDKIIDHY [...]
+Seq98 MADPEGTGCNGWFFVDMVDFIDEQETAQALFHAQEVQNDAQVLHLLKRKFAGGSSPRLQEIKAKRRLFDSGYGLLQASNKKAAMLAVFKDIYGLSFTDLVRNFKSDKTTCTDWVFGVNPTVAEGFKLIKPATLYAHIQGVLILALLRYKCGKNRLTVAKGLLHVPETCMLIEPPKLRSSVAALYWYRTGISNISGDTPEWIQRLTIIQHFDLSDMVQWAFDNDLTDESDMAFQYAQLADCNSNAAAFLKSNCQAKYLKDCAVMCRHYKRAQKRQMNMSQWIKYRGDWRPIVQFLRYQGVEFISFLRALKEFLKGPKKNCILLYGPANTGKSYFGMSFIHFLQGAIISFVNSNSHFWLEPLADTKVAMLDDATHTCWTYFDNYMRNALDGNPISDRKHKPLLQLKCPPILLTSNIDPAKDYLESRVTVFTFPHAFPFDKNGNPVYEINDKNWKCFFERTWSRLDLDEDFKCLSERLSALQDKILDHY [...]
+Seq99 MEDSQGTGCNGWFYVDMVDFIDELETAQALFHAQEVDNDAQVLHVLKRKYGTESSPRLQEIKAKRRLFDSGYGLLKANNKKAAMLAVFKETYGLSFADLVRTFKSDKTTCTDWVFGVNPTIAEGFKLIQPCTLYAHIQGVLILALLRYKCGKNRLTVAKGLLHVPETCMLIEPPKLRSSVAALYWYRTGISNISGDTPEWIQRLTIIQHFDLSEMIQWAFDNDFTDESDIAYEYAQLADCNSNAAAFLKSNCQAKYLRDCAVMCRHYKRAQKRQMNMSQWIKYRGDWRPIVQFLRFQGIEFITFLGALKAFLKGPKKNCIVIHGPANTGKSYFGMSFIHFIQGAIISFVNSNSHFWLEPLADAKVAMLDDATNTCWTYFDNYMRNALDGNPISDRKHKPLLQLKCPPILLTSNINPAIDYLESRVTVFTFPNAFPFDKNGNPVYEINDKNWKCFFERTWSRLDLDDDFKCLSERLSVLQDKILDHY [...]
+Seq100 MADSEGTGCNGWFFVDLVDFIDERETAQALFNVQEAQRDAREMHVLKRKFGCS-SP-LQEIKVKRRLIDSGYGLLHSKNKKAAMYAKFKELYGLSFQDLVRTFKSDRTTCSDWVFGVNPTVAEGFKLIQPYVLYAHIQGVVILALLRYKCGKNRITVAKGLLHVPDTCMLIEPPKLRSGVAALYWYRTGMSNISGETPEWIQRLTIIQHFDLSEMIQWAFDNDLTDESDIAYEYALIADSNSNAAAFLKSNCQAKYLKDCAVMCRHYKRAQKRQMSMSQWIKWRGDWKPIVQFLRYQGVEFITFLCALKDFLKGPKRNCIVLCGPANTGKSYFGMSLLHFLQGTVISHVNSNSHFWLEPLTDRKLAMLDDATDSCWTYFDTYMRNALDGNPISDRKHRHLVQIKCPPMLITSNTNPVTDYLNSRLMVFKFPNKLPFDKNRNPVYTINDRNWKCFFERTWCRLDLEEDFKCLSQRLSVLQDQILEH [...]
+Seq101 MANCEGTGCNGWFFVDMVDFIDERETAQVLLNMQEAQRDAQRVRALKRKYTDS-SP-LQELQARQPAYDSGYGLLQCNNKKAAMLTEFKKVYGLSFNDLVRTFKSDKTTCTDWVFGVNPTIAEGFKLIKQYALYTHIQGILILMLIRYKCGKNRITVGKGLLHVPDSCMLLQPPKLRSPVAALYWYRTGISNISGDTPEWIKRLTIIQHFDLSDMVQWAFDNELTDDSDIAFQYAMLADCNSNAAAFLKSNCQAKYVKDCATMCRHYKRAQKRQMTMPQWIKFRGDWRPIVQFLRYQGLEFITFLCALKDFLKGPKRNCIVIHGPPNTGKSYFCMSLIHFLQGTIISYVNSASHFWLEPLADAKIAMLDDATGTCWSYFDNYMRNALDGNPISDRKHRHLIQIKCPPMLITSNTNPVEDYLHSRLTVFKFPNAFPFDQNRNPVYTINDKNWKCFFEKTWCRLDLEDEFKCLSQRLNVLQEKILEH [...]
+Seq102 MANCEGTGCNGWFLVDLADFIDERETAQVLYNMQEAQRDAQSVRALKRKYGGSNRVTLQELQARTNVYDSGYGVLQANNQKAILLSQFKHTYGLAFNDLVRTFKSDKTICTDWVCGVNPTIAEGFKLIQPYALYTHIQGVYILLLIRYKCGKNRITVGKGLLHVPESCMLIEPPKLRSPVAALYWYRTGMSNISGTTPEWIQRLTVIQHFDLSDMVQWAFDNDVTEDSDIAYGYALLADSNSNAAAFLKSNCQAKYVRDCATMCRHYKRAQKKQMTMAQWIRFRGDWRPIVQFLRYQGVEFITFLCAFKEFLKGPKKNCIVIQGPPNTGKSYFCMSLMHFLQGTVISYVNSTSHFWLEPLADAKVAMLDDATGTCWSYFDTYMRNALDGNPISDRKHRHLIQIKCPPILITSNTNPVEEYLTSRLTVFTFPNAFPFDQNRNPVYTINNKNWKSFFQKTWCKLDLEDEFKCLSQRLNALQEKILEH [...]
+Seq103 MANREGTGCNGWFLVDLADFIDERETAQVLLHMQEAQRDAQAVRALKRKYTDSSRGTLQEIQATQTVYDSGYGLLQSNNKKAAMLTQFKETYGLSFTDLVRTFKSDKTTCTDWVFGVHPTIAEGFKLINKYALYTHIQGVLILMLIRYTCGKNRVTVGKGLLHVPESCMLLEPPKLRSPVAALYWYRTGISNISGDTPEWIQRLTVIQHFDLSDMVQWAFDNEYTDESDIAFNYAMLADCNSNAAAFLKSNCQAKYVKDCATMCKHYKRAQKRQMSMSQWIKFRGDWRPIVQFLRYQGIEFISFLCALKEFLKGPKKNCIVIYGPANTGKSHFCMSLMHFLQGTVISYVNSTSHFWLEPLADAKLAMLDDATGTCWSYFDNYMRNALDGYAISDRKYKSLLQMKCPPLLITSNTNPVEDYLRSRLTVFKFPNAFPFDQNRNPVYTINDKNWKCFFEKTWCRLDLEDEFKCLSQRLNVLQDKILEY [...]
+Seq104 MADPEGTGCNGWFFVDLVDFIDDREAAQALLHAQEVETDTKLLHALKRKYGAHSSSPLQEITAKRRLCDSGYGLLKVNNKKAAILAKFKETYGLSFTDLVRTFKSDKTTCTDWVCGVNPNIAEGFKLIQPYVLYAHIQGVFILALLRYKCGKNRLTVAKGLLHVPDTHMLIEPPKLRSSCAALYWYRTGISNISGDTPEWIQRQTIIQHFDLSEMIQWAFDNDYIDESDIAYEYAQLADCNSNAAAFLKSNCQAKYLRDCAVMCRHYKRAQRKQMNMSQWISYRGDWKPIVQFLRFQGIEFITFLRAFKDFLKGPKKNCIVIYGPANTGKSYFCMSLIQFLHGTVLSFVNSNSHFWLEPLTDTKIAMVDDATPTCWSYFDNYMRNALDGNPISDRKHKHLIQMKCPPMLITSNTNPATDYLRSRVTVFTFPHTFPFDSNGNPVYDINDKNWKCFFKRTWSRLDLEEDFKCLSQRLNVLQEKILEH [...]
+Seq105 M-DCEGTGCTGWFSVDLIGFIDQ-EVTQALFQAQQKQANTKAVRNLKRKLLGS-SDSQQNT-AKRRAVDSGYGLLKCSNVKAALLSKFKTVYGVSFAELVRVFKSDKTCCSDWVFGVAGSVAESIKLIQQYCLYYHIQGVIVLMLVRFTCAKNRTTIKNCLLNVPETQLLIEPPKLRSTAVALYFYKTGLSNISGDTPEWIVRQTQLEHFDLSKMVQWAFDHDITDDSEIAFKYAQLADIDSNAAAFLKSNCQAKYVKDCATMTRHYKRAQKRSMCMSQWLQYRGSWKEIAKFLRFQHVNFIYFLQVLKQFLKGPKHNCIVIYGPPNTGKSQFAMSFIKFMQGSVISYVNSNSHFWLQPLEDAKVAVLDDATYSCWLYIDKYLRNFLDGNPCCDRKHRSLLQVTCPPLIITSNINPQEDYLHSRVTVIPFPNTFPFDSNGNPVYALTDVNWKSFFSTTWSRLDLDADFKCLCQRLNACQEKILDY [...]
+Seq106 M-DCEGTGCTGWFSVDLIGFIDQ-QVAQALFQAQETQANKKAVRALKRKLLGS-SNSQQST-AKRRAVDSGYGLLKCSNVKAALLSKFKTVYGVSYTELVRVFKSDKTCCSDWVFGVAGSVAESLKLIQPYCLYYHIQGVLPLMLIRFTCAKNRATIKKCLLNVPDTQLLIEPPKLRSTAVALYFYKTGLSNISGDTPEWIVRQTQLEHFDLSKMVQWAFDHDITDDSEIAFKYAQLADIESNAAAFLKSNCQAKYVKDCATMTRHYKRAQKRSMGMSQWLQHRGTWKDIARFLRYQNVNFIYFLQVLKQFLKGPKHNCIVIYGPPNTGKSQFAMSFIKFVQGSVISYVNSNSHFWLQPLEDAKVALLDDATYGCWLYIDKYLRNFLDGNPCCDRKHRSLIQVRCPPLIITSNINPQDDYLHSRVTVIPFPNTFPFDSNGNPVYELTDVNWKSFFSTTWSRLDLDADFKCLCQRLNACQEKILDY [...]
+Seq107 M-DCEGTGCNGWFFVDLINFIDEQETARALFQAQELQANKEAVHQLKRKFLVSPNNTH-SH-VKRRLLDSGYGVLKSSNAKATLMAKFKELYGISYNELVRVFKSDKTCCIDWVFGVSPMVAENLKLIKPFCMYYHIQGTIVLMLIRFSCAKNRTTIAKCLVNIPQSQMFIEPPKLRSTPVALYFYRTGISNISGETPEWITRQTQLQHFELSQMVQWAFDHEVLDDSEIAFHYAQLADIDSNAAAFLKSNCQAKYVKDCGTMARHYKRAQRKSLSMSAWIRYRGNWREIAKFLRYQGVNFMSFIQMFKQFLKGPKHNCIVIYGPPNTGKSLFAMSLMKFMQGSIISYVNSGSHFWLQPLEDAKIALLDDATYGCWTYIDQYLRNFLDGNPCSDRKHRSLIQLVCPPLLITSNINPQEDYLHTRVTVLKFLNTFPFDNNGNAVYTLNDENWKNFFSTTWSRLDLEEDFKCLCHRLNVCQEKILDC [...]
+Seq108 M-DSEGTGCTGWFYVDIIDFIDERETAQALLQVQETQAHKEAVQHLKRKFLGSPSNSQQQP-GKRRLLDSGYGVLKCSNAKAMFMAKFKELYGVSYNELVRVFKSDKTCCTDWVFGVSPMVAENLKLIQPFCMYYHIQGTIVLLLARFTCAKNRLTIAKCLVNIPQSQMFIEPPKLRSTAVALYFYRTGISNISGETPEWITRQTQLQHFELSQMVQWAFDHDVVDDSEIAFYYAQLADTDSNAAAFLKSNCQAKYVKDCGTMTRHYKRAQRKSLTMSAWIRYRGNWREIAKFLRYQGINFMYFIQTFKLFLKGPKHNCIVIQGPPNTGKSQFAMSLIRFLQGCVISYVNSGSHFWLQPLEDAKVALLDDATYGCWTYIDQYLRNFLNGNPCSDRKHRSLLQIVCPPLLITSNINPKEDYLHSRVTVFQFLNAFPFDPHGNPVYALNDVNWKNFFSTTWSRLDLEEDFKCLCHRLNVCQEKILDC [...]
+Seq109 MASPEGTGCRGWFHVDLDGFIDERETAQQLLH?QNTHADTQTLQKLKRKYLGSP------SAVKRRLIDSGYGILKCSNVQAKLYCKFKDIFGIPFSELVRTFKSDSTCCHDWIFGVN?TLAEALKIIKTQCIYYHMQGVVILLLIRYTCGKNRKTIVKSLLNVPTEQMLVQPPKIRSPAVALYFYKTSISNISGSTPEWIERQTQLQHFELSKMVQWAFDNEVTDDSQIAFHYAQLADVDSNAQAFLKSNMQAKYVKDCGIMCRHYKRAQQQQMNMKQWIKHVGDWKPIVQFLRYQGVEFISFLSYFKLFLQGPKHNCLVIYGPPNTGKSCFAMSLINFFHGSVISYVNSHSHFWLQPLDNTKLGMLDDATEACWKYIDEYLRNLLDGNPVSDRKHKQLVQIKCPPVLITTNINPMQDYLHSRIHVLQFLNPFPIDVNGNPVYQLNNANWKCFFERTWSRLDLDEDFRCLCQRLDACQEKILDC [...]
+Seq110 MASPEGTGCTGWFHVDLDGFLDDRETAQQLLHAQNTYADTQTLHNLKRKYLGSP------SGVKRRIIDSGYGLLKSSNVQAKLCYKFKELFGIPFSELVRTFKSDSTCCHDWIFGVNETLAEALKIIKSQCMYYHIQGVVILMLIRYTCGKNRKTIIKSLVNVPSEQMLVQPPKIRSPAVALYFYKTAMSNISGETPEWIQRQTQIQHFELSKMVQWAFDNDVTDDSDIAFYYAQLADVDSNAQAFLKSNMQAKYVKDCGIMCRHYKRAQQQQMNMKQWITHIGDWRPIVQFLRYQGVDFISFLSYFKLFLRGPKHNCLVLYGPPNTGKSCFAMSLIQFFQGSVISYVNSHSHFWLQPLDNAKLGMLDDATDACWRYIDEYMRNLLDGNPVSDRKHKQLVQIKCPPVIITTNINPLHDYLHSRIHVVPFLNPFPIDTNGNPVYQLNNVNWKCFFERTWSRLDLDEDFRCLCQRLDACQEKILDC [...]
+Seq111 MASPEGTGCCGWFQVDVDGFIDDRETAQQLLQVQTAHADAQTLQKLKRKYIGSP------SEVKRRLIDSGYGLFKSSNVQGRLHFKFKEVYGVPYTELVRTFKSDSTCCNDWIFGVNETLAEALKILKPQCVYYHMQGVIVMMLIRYICGKNRKTITKSLLNVPQEQMLIQPPKLRSPAVALYFYKTAMSNISGETPEWIQRQTQLQHFELSKMVQWAFDNEVTDDSQIAFLYAQLADIDSNAQAFLKSNMQAKYVKDCGIMCRHYKRAQQQQMNMCQWIKHIGDWKPIVQFLRYQGVDFISFLSYFKLFLQGPKHNCLVLCGPPNTGKSCFAMSLINFFQGSVISFVNSQSHFWLQPLDNAKLGLLDDATDTCWRYIDDYLRNLLDGNPISDRKHKQLVQIKCPPVIITTNVNPMQDYLHSRISVFKFENPFPLDNNGNPVYELSNVNWKCFFERTWSRLNLDEDFRCLSQRLDACQNKILDC [...]
+Seq112 MASPEGTGCCGWFEVDLDGFIDDAET?QQLLQVQTAHADKQTLQKLKRKYIASP------SGVKRRLIDSGYGLFKSSNLQGKLYYKFKEVYGIPFSELVRTFKSDSTCCNDWIFGVNETLAEALKIIKPHCMYYHMQGVIVMMLIRYTCGKNRKTIAKALLNVPQEQMLIQPPKIRSPAVALYFYKTAMSNISGDTPEWIQRQTQLQHFELSKMVQWAFDNEVTDDSQIAFQYAQLADVDSNAQAFLKSNMQAKYVKDCGIMCRHYKRAQQQQMNMCQWIKHIGDWKPIVQFLRYQGVDFISFLSYFKLFLQGPKHNCLVLCGPPNTGKSCFAMSLIKFFQGSVISFVNSQSHFWLQPLDNAKLGLLDDATEICWKYIDDYLRNLVDGNPISDRKHKQLVQIKCPPLLITTNINPMLDYLHSRMLVFQFQNPFPLDNNGNPVYELSNVNWKCFFTRTWSRLNLDEDFKCLSQRLNACQNKILDC [...]
+Seq113 MDPEGTPGCTGWFNVDLVDFIDD-EAPGALLHAQETQAHAEAVQVLKRKFVGSPSPRLNEIQAKRRLFDSGYGLLRCSNLKATLLSKFKSVYGVSFSELVRSFKSDRTTCADWVAGVHHSVAEGLKLIQPFCSYAHIQGVYLLLLARFKCGKNRLTVSKCMLNVQETHMLIEPPKLRSAAAALYWYRTGISNVSGETPEWITRQTMFQHFDLSEMVQWAYDHDFTDDSVIAYEYAQLAGIDSNAAAFLKSNAQAKYVKDCATMCRHYKRAERQQMTMSQWIKQRGDWRPIVQFLRYQGVEFIAFLAALKLFLKGPKKNCIVLFGPPNTGKSYFGMSLIHFLQGSIISYVNSNSHFWLQPLADAKVAMLDDATPQCWSYIDNYLRNALDGNPISDRKHKNLVQMKCPPLLITSNTNAGQDYLHSRMVVFTFEQPFPFDQNGNPVYELNDKNWKSFFSRTWSRLDLEEEFKCLAERLSALQDRILEL [...]
+Seq114 MADPEGTGCTGWFEVDLLEFIDDTEAARALFNIQEGEDDLNAVCALKRKFAACSANPCRTSYRKRKIDDSGYGVLHSSNTKANILYKFKEAYGISFMELVRPFKSDKTSCTDWCYGISPSVAESLKLIKQHSLYTHLQGIIILLLIRFRCSKNRLTVAKLMLSIPETCMVIEPPKLRSQTCALYWFRTAMSNISGTTPEWIDRLTVLQHFDLSEMVQWAYDNELTDDSDIAYYYAQLADSNSNAAAFLKSNSQAKIVKDCGIMCRHYKKAEKRKMSIGQWIQSRGNWRPIVQLLRYQNIEFTAFLGAFKKFLKGPKKSCMLICGPANTGKSYFGMSLIQFLKGCVISCVNSKSHFWLQPLSDAKIGMIDDVTPISWTYIDDYMRNALDGNEISDVKHRALVQLKCPPLLLTSNTNAGTDYLHSRLTVFEFKNPFPFDENGNPVYAINDENWKSFFSRTWCKLDLEEDFKCISARLNAVQEKILDL [...]
+Seq115 MDDPEGTGCTGWFEVDLIEFIDEAEAARALFNVQEGVDDINAVCALKRKFAACSANVCVSWHRKRKIIDSGYGILHNSNTKATLLYKFKEAYGVSFMELVRPFKSDKTSCTDWCYGISPSVAESLKLIKQHSIYTHLQGIILLLLIRFKCSKNRLTVAKLMLSIPETCMIIEPPKLRSQACALYWFRTAMSNISGTTPEWIDRLTVLQHFDLSEMIQWAYDNDITDDSDIAYKYAQLADVNSNAAAFLRSNAQAKIVKDCGVMCRHYKRAEKRGMTMGQWIQSRGNWRPIVQFLRYQNIEFTAFLVAFKQFLQGPKKSCMLLCGPANTGKSYFGMSLIHFLKGCIISYVNSKSHFWLQPLSDAKLGMIDDVTAISWTYIDDYMRNALDGNDISDVKHRALVQLKCPPLIITSNTNAGKDYLHSRLTVFEFNNPFPFDANGNPVYKINDENWKSFFSRTWCKLGLEEDFKCISARLSAVQDKILDI [...]
+Seq116 MEDPEGTGCTGWFEVDLIDFIDEHEAARALFNAQEGEDDLHAVSAVKRKFTSSPSPRA-KHLPKRKPCDSGYGIMCENSIKTTVLFKFKETYGVSFMELVRPFKSNRSSCTDWCMGVTPSVAEGLKLIQPYSIYAHLQGVLILLLIRFKCGKNRLTVSKLMLNIPETHMVIEPPKLRSATCALYWYRTGLSNISGTTPEWIEQQTVLQHFDFGEMVQWAYDHDITDDSDIAYKYAQLADVNSNAAAFLKSNSQAKIVKDCATMCRHYKRAERKHMNIGQWIQYRGDWRPIVRFLRYQDIEFTAFLDAFKKFLKGPKKNCLVLYGPANTGKSYFGMSLIRFLSGCVISYVNSKSHFWLQPLTDAKVGMIDDVTPICWTYIDDYMRNALDGNDISDVKHRALVQIKCPPLILTTNTNAGTDYLHSRLVVFHFKNPFPFDENGNPIYEINNENWKSFFSRTWCKLDLEEDFKCIPARLNAVQEKILDL [...]
+Seq117 MEDPEGTGCTGWFSVDLIGFIDDTQAARALFNLQEEEDDLNAVSALKRKF--TGGGNSN--AAKRRAYDSGYGIMHVNNIKATLMHKFKEAYGVTFTQLIRPFKSDRTSCTDWCFGITPSVAESLKLIKPQTLYTHLQGIIILLLVRFKCAKNRLTVSKLMLSIPETHMIIEPPKIRSTTCALYWFRTGMSNISGQTPEWIERLTVLQHFDLGEMVQWAYDNDITDDSEIAYQYAMLADVNSNAAAFLKSNSQAKIVKDCGTMCRHYKRAEKRKMTIGQWIQARGDWRTIVKLLRYQNVEFTQFLATFKKFLKGPKKSCMVICGPPNTGKTYFAMSLIHFLQGCVISYVNAKSHFWLQPLSDAKIGMIDDVTAICWTYIDDYLRNALDGNDISDVKHKALVQLKCPPLLLTSNIDVATDFLHSRVVVFRFNNPFPFDENGNPVYNLNDENWKSFFSRTWCQLDLEEDFKCIATRLNAVQEKILDV [...]
+Seq118 MADPAGTGCNGWFYVDMVDFIDEAETAQALFHAQEAEEHAEAVQVLKRKYVGSPSPRLKAITAKRRLFDSGYGVLKTSNGKAAMLGKFKELYGVSFMELIRPFQSNKSTCTDWCFGVTGTVAEGFKLLQPYCLYCHLQGMVMLMLVRFKCAKNRITIEKLLLCISTNCMLIQPPKLRSTAAALYWYRTGMSNISGETPEWIERQTVLQHFDLSQMVQWAYDNDVMDDSEIAYKYAQLADSDSNACAFLKSNSQAKIVKDCGTMCRHYKRAEKRQMSMGQWIKSRGDWRDIVKFLRYQQIEFVSFLSALKLFLKGPKKNCILIHGAPNTGKSYFGMSLISFLQGCIISYANSKSHFWLQPLADAKIGMLDDATTPCWHYIDNYLRNALDGNPVSDVKHKALMQLKCPPLLITSNINAGKDYLHSRLVVFTFPNPFPFDKNGNPVYELSDKNWKSFFSRTWCRLNLEEDFKCLSQRLNVCQDKILEH [...]
+Seq119 MADPAGTGCNGWFYVDLVDFIVETETAHALFTAQEAKQHRDAVQVLKRKYLGSPSPRLKAIAAKRRLFDSGYGVLKTSNAKAAMLAKFKELYGVSFSELVRPFKSNKSTCCDWCFGLTPSIADSIKLLQQYCLYLHIQGMVVLLLVRYKCGKNRETIEKLLLCVSPMCMMIEPPKLRSTAAALYWYKTGMSNISGDTPEWIQRQTVLQHFELSRMVQWAYDNDIVDDSEIAYKYAQLADTNSNASAFLKSNSQAKIVKDCATMCRHYKRAEKKQMSMSQWIKYRGDWKQIVMFLRYQGVEFMPFLTALKRFLQGPKKNCILLYGAANTGKSLFGMSLIKFLQGSVICFVNSKSHFWLQPLADAKIGMLDDATVPCWNYIDDNLRNALDGNLVSDVKHRPLVQLKCPPLLITSNINAGTDYLHNRLVVFTFPNEFPFDENGNPVYELNDKNWKSFFSRTWSRLSLDEDFKCLCQRLNVCQDKILTH [...]
+Seq120 MADPAGTGCNGWFYVDLVDFIVETETAHALFTAQEAKQHRDAVQVLKRKY?GSPSPRLKAIAAKRRLFDSGYGVLKTSNAKAAMLAKFKELYGVSFSELVRPFKSNKSTCCDWCFGLTPSIADSIKLLQQYCLYLHIQGMVVLLLVRYKCGKNRETIEKLLLCVSPMCMMIEPPKLRSTAAALYWYKTGISNISGDTPEWIQRQTVLQHFELSQMVQWAYDNDIVDDSEIAYKYAQLADTNSNASAFLKSNSQAKIVKDCATMCRHYKRAEKKQMSMSQWIKYRGDWKQIVMFLRYQGVEFMSFLTALKRFLQGPKKNCILLYGAANTGKSLFGMSLMKFLQGSVICFVNSKSHFWLQPLADAKIGMLDDATVPCWNYIDDNLRNALDGNLVSDVKHRPLVQLKCPPLLITSNINAGTDYLHNRLVVFTFPNEFPFDENGNPVYELNDKNWKSFFSRTWSRLSLDEDFKCLCQRLNVCQDKILTH [...]
+Seq121 MADPAGTGCNGWFYVDLVDFIVETETAHALFTAQEAKEHRDAVQVLKRKYLGSPSPRLKAIAAKRRLFDSGYGVLKTSNAKAAMLAKFKELYGVSFSELVRPFKSNKSTCCDWCFGLTPSIADSIKLLQQYCLYLHIQGMVVLLLVRYKCGKNRETIEKLMLCVSPMCMMIEPPKLRSTAAALYWYKTGMSNISGDTPEWIQRQTVLQHFELSQMVQWAYDNDIVDDSEIAYKYAQLADTNSNASAFLKSNSQAKIVKDCATMCRHYKRAEKKQMSMSQWIKYRGDWKQIVMFLRYQGVDFMSFLSALKKFLQGPKKNCILLYGAANTGKSLFGMSLMKFLQGSVICFVNSKSHFWLQPLADAKIGMLDDATVPCWNYIDDNLRNALDGNLVSDVKHRPLVQLKCPPLLITSNINAGTDYLHNRLVVFTFPNEFPFDENGNPVYELNDKNWKSFFSRTWSRLSLDEDFKCLCQRLNVCQDKILTH [...]
+Seq122 MADPAGTGCNGWFYVDLVDFIVETETAHALFTAQEAKEHRDAVQVLKRKYLGSPSPRLKAIAAKRRLFDSGYGVLKTSNAKAAMLAKFKELYGVSFTELVRPFKSNKSTCCDWCFGLTPSIADSIKLLQQYCLYLHIQGMVVLLLVRYKCGKNRETIEKLMLCVSPMCMMIEPPKLRSTAAALYWYKTGMSNISGDTPEWIQRQTVLQHFELSQMVQWAYDNDIVDDSEIAYKYAQLADTNSNASAFLKSNSQAKIVKDCATMCRHYKRAEKKQMSMSQWIKYRGDWKQIVMFLRYQGVDFMSFLSALKKFLQGPKKNCILLYGAANTGKSLFGMSLMKFLQGSVICFVNSKSHFWLQPLADAKIGMLDDATVPCWNYIDDNLRNALDGNLVSDVKHRPLVQLKCPPLLITSNINAGTDYLHNRLVVFTFPNEFPFDKNGNPVYELNDKNWKSFFSRTWSRLSLDEDFKCLCQRLNVCQDKILTH [...]
+Seq123 MADPAGTGCNGWFYVDLVDFIVETETAHALFTAQEAKEHRDAVQVLKRKYLGSPSPRLKAIAAKRRLFDSGYGVLKTSNAKAAMLAKFKELYGVSFSELVRPFKSNKSTCCDWCFGLTPSIADSIKLLQQYCLYLHIQGMVVLLLVRYKCGKNRETIEKLLLCVSPMCMMIEPPKLRSTAAALYWYKTGMSNISGDTPEWIQRQTVLQHFELSQMVQWAYDNDIVDDSEIAYKYAQLADTNSNASAFLKSNSQAKIVKDCATMCRHYKRAEKKQMSMSQWIKYRGDWKQIVMFLRYQGVDFMSFLTALKRFLQGPKKNCILLYGAANTGKSLFGMSLMKFLQGSVICFVNSKSHFWLQPLADAKIGMLDDATVPCWNYIDDNLRNALDGNLVSDVKHRPLVQLKCPPLLITSNINAGTDYLHNRLVVFTFPNEFPFDENGNPVYELNDKNWKSFFSRTWSRLSLDEDFKCLCQRLNVCQDKILTH [...]
+Seq124 MADPAGTGCNGWFFVDMVDFINETETAQALFHAQEEQTHKEAVQVLKRKYASSPSPRLKAIAAKRRLFDSGYGILKCSNANAAMLAKFKELFGISFTELIRPFKSDKSTCTDWCFGIAPSVANFKH-----?ICIHIQAMVILALLRFKV?KTRTTIENY*LCISAASMLIQPPKLRSTPAALYWFKTAMSNISGETPEWIQRQTVLQHFDLSEMVQWAYDNDFIDDSDIAYKYAQLAETNSNACAFLKSNSQAKIVKDCATMCRHYKRAEKREMTMSQWIKRRGDWRDIVRFLRYQQVDFVAFLSALKNFLHGPKKNCILIYGAPNTGKSLFGMSLMHFLQGAIISYVNSKSHFWLQPLYDAKIAMLDDATSPC??YIDQYLRNALDGNPIFDVKH*A?VH??CPPLLIT?NINAGKDYLHSRVVVFTFHNEFPFDKNGNPEYGLNDKNWKSFFSRTWCRLNLEEVFKCLSQRLSVCQDKILEH [...]
+Seq125 MADSGNWRCSGWFNVEMGDFIDQQEIAQALYHSQQVNADNEAIRVLKRKFAGSASPH--ILTSTHLLCDSGYGILKSSNVKATLLAKFKEVYGLSYMELVRPYKSDKTQCQDWVFGVAPSLAESLKLLTQYCLYIHLQGIIVLLLARFKCNKNRLTVQKLLLNVTQEYMLIEPPRLRSTPCALYWYRTSLSNISGEVPEWIKRQTVVQHFDLSQMVQWAFDNDITNDCEIAYKYALLASEDSNAAAFLKSNAQAKYVKDCGTMCRHYKAAERKQMTMSQWITHRGNWKHIVQFLRYQQVEFVPFLIALKQFLKGPKQNCIVIYGPPDTGKSHFGMSLMQFMQGVVISYVNSNSHFWLSPLADAKMALLDDATPACWTYIDRYLRNALDGNPMCDRKHKHLLQIKCPPLLITSNTNPKADYLHSRMKVFTFSNPFPFDSNGNPLYQLTNENWKAFFTKTWSKLDLDDDFKCLCKRLSACQDAILEL [...]
+Seq126 MADSGNWRCTGWFNVEMGDFIDQQEIAQALYQSQQANADNEAIRVLKRKFTGSPSPQINVLTSKRRLFDSGYGLLQRNNAKAALLAKFKEVYGLSYMELVRPYKSDKTHCQDWVFGVIPSLAESLKLLTQYCMYIHLQGIIVLVLVRFKCNKNRLTVQKLLLNVTQERMLIEPPRLRSTPCALYWYRTSLSNISGDTPEWIKRQTLVQHFDLSQMIQWAFDNDITDDCEIAYKYALLGNVDSNAAAFLKSNAQAKYVKDCGTMCRHYKAAERKQMSMAQWIQHRGNWKDIVLFLRYQNVEFMPFLITLKQFLKGPKQNCIVLYGPPDTGKSHFGMSLIKFIQGVVISYVNSTSHFWLSPLADAKMALLDDATPGCWTYIDKYLRNALDGNPICDRKHKNLLQVKCPPLLITSNTNPKADYLHSRIKVFTFLNPFPFDSNGNPLYQLTNENWKAFFTKTWSKLDLDDDFKCLCKRLSACQDAILEL [...]
+Seq127 MADDTGTGCSGWFLVDMVDFID-LE-AQALLNEQEADAHYAAVQDLKRKYLGSPSPRLDAIKVKRRLFDSGYGLLKCKDVRATLHGKFKECYGLSFKDLTREFKSDKTTCGDWVFGVHHSVSEAFQLIQPLSTYSHIQGMVLLVLLRFKVNKNRCTVARTLLNIPEDHMLIEPPKIQSSVAALYWFRTSISNASGDTPEWIARQTIVEHFKLTEMVQWAYDNDYCDESDIAFEYAQRADFDSNAKAFLNSNCQAKYVKDCATMCKHYKNAEMKKMSIKQWIKYRGNWKPIVQFLRHQGIEFISFLSKLKLWLHGPKKNCIAIVGPPDTGKSAFCMSLIKFLGGTVISYVNSSSHFWLQPLCNAKVALLDDATQSCWGYMDTYMRNLLDGNPMSDRKHKSLALIKCPPLLVTSNIDITTEYLYSRVTLFKFPNPFPFDSNGNAVYELCDANWKCFFARLSASLDIE-DFRCLAKHLDACQEQLLEL [...]
+Seq128 MADNTGTGCSGWFLVDMVDFID-LE-AQALLNEQEADAHYAAVQDLKRKYLGSPSPRLNAIKVKRRLFDSGYGLLKCKDIRATLHGKFKQCYGLSFTDLIRQFKSNKTTCEDWVFGVHHSVSEAFELIQPLTIYRHIQGMLLLVLLRFKVNKNRCTVARTLLNIPEDHMLIEPPKIQSSVAALYWFRTSLSNASGETPEWIARQTIVEHFKLTEMVQWAYDNDYCDECDIAFEYAKRADFDSNAKAFLNSNCQAKYVKDCATMCKHYKNAEMKKMTMNQWIKHRGNWKPIVQFLRHQNIEFISFLSKLKLWLQGPKKNCIAIVGPPDTGKSMFCMSLIKFLGGTVISYVNSSSHFWLQPLCNTKVALLDDATHSCWGYMDTYMRNLLDGNPMSDRKHKSLALIKCPPLLVTSNIDITTEYLYSRVTVFKFPNPFPFDRNGNAVYELCDANWKCFFARLSASLDIE-DFRCLAKHLDACQEQLLEL [...]
+Seq129 MAEDTGTGCSGWFLVDMVDFID-VE-AQALLNEQEADAHYAAVQDLKRKYLGSPSPRLDAIKVKRRLFDSGYGLLKCKDVRATLYGKFKDCYGLSFTDLIRPFKSDKTTCGDWVFGIHHSVSEAFELMQPLTTYMHIQGMVLLVLIRFKVNKSRCTVARTLLNIPEDHMLIEPPKIQSSVAALYWFRTGISNASGETPEWIKRQTIVEHFKLTEMVQWAYDNDFCDESEIAFEYAQRGDFDSNARAFLNSNCQAKYVKDCATMCKHYKNAEMKKMSMKQWITYRGNWKPIVQFLRHQNIEFIPFLSKLKLWLHGPKKNCIAIVGPPDTGKSCFCMSLIKFLGGTVISYVNSSSHFWLQPLCNAKVALLDDATQSCWVYMDTYMRNLLDGNPMSDRKHKSLALIKCPPLLVTSNVDITKDYLYSRVTTLTFPNPFPFDRNGNAVYELSDANWKCFFTRLSASLDIE-DFRCIAKHLDACQEQLLEL [...]
+Seq130 MADNTGTGCSGWFLVDMVDFID-ME-AQALLNEQEADAHYAAVQDLKRKYLGSPSPRLDAIKVKRRLFDSGYGLLKCKNIRATLLGKFKDCYGLSYTDLIRQFKSDKTTCGDWVFGVHHSVSEAFQLIQPVTTYSHIQGMVLLALVRFKVNKNRCTVARMMLNIPEDHMLIEPPKIQSGVAALYWFRSGISNASGETPEWITRQTIVEHFKLADMVQWAYDNDFCEESEIAFEYAQRADIDANARAFLNSNCQAKYVKDCATMCKHYKTAEMKKMNMKQWIKFRGNWKPIVQFLRHQNIEFIPFLTKLKMWLHGPKKNCIAIVGPPDTGKSCFCMSLIKFLGGTVISYVNSSSHFWLQPLCNAKVALLDDVTQSCWVYMDTYMRNLLDGNPMTDRKHKSLALIKCPPLIVTSNIDITKEYLCSRVTLFTFPNPFPFDRNGNALYDLCETNWKCFFARLSSSLDIE-DFRCIAKHLDVCQEQLLEL [...]
+Seq131 MAENTGTGCSGWFLVDMVDFID-LE-AQALLNEQEADAHYAAVQDLKRKYLGSPSPRLDAIKVKRRLFDSGYGLLKCKDVRATLLGKFKDCYGLSYTDLIRQFKSNKSTCGHWVFGVHHSVADAFQLIQPVTTYSHIQGMVLLALLTFKVNKNRCTVARMLLNIPEDHMLIEPPKIQSTVAALYWFRSSLSNASGDTPDWITRQTIVEHFKLADMVQWAYDNDLCDESEIAFDYAQRADIDANARAFLNSNCQAKYVKDCATMCKHYKNAEMKKMNMKQWIHYRGNWKPIVQFLKHQNIEFIPFLSKLKLWLHGPKKNCIAIVGPPDTGKSCFCMSLIKFLGGTVISYVNSSSHFWLQPLCNAKVALLDDATQSCWVYIDTYMRNLLDGNPMSDRKHKSLALIKCPPLLITSNIDITKDYLFSRVSVFTFPNPFPFDRNGNAVYDLCESNWKCFFTRLSASLDIE-DFRCIAKHLDVCQEQLLEL [...]
+Seq132 MADDSGTGCTGWFMVDMVDFID-LE-AQALFNRQEADTHYATVQDLKRKYLGSPSPRLDAIKVKRRLFDSGYGLLKCKDLRAALLGKFKECFGLSFIDLIRPFKSDKTTCLDWVFGIHHSISEAFQLIEPLSLYAHIQGMVLLVLLRFKVNKSRSTVARTLLNIPENQMLIEPPKIQSGVAALYWFRTGISNASGEAPEWITRQTVIEHFKLTEMVQWAYDNDICEESEIAFEYAQRGDFDSNARAFLNSNMQAKYVKDCATMCRHYKHAEMRKMSIKQWIKHRGNWKPIVQFLRHQNIEFIPFLTKFKLWLHGPKKNCIAIVGPPDTGKSYFCMSLISFLGGTVISHVNSSSHFWLQPLVDAKVALLDDATQPCWIYMDTYMRNLLDGNPMSDRKHKALTLIKCPPLLVTSNIDITKEYLHTRVTTFTFPNPFPFDRNGNAVYELSNTNWKCFFERLSSSLDIE-DFRCIAKRLDACQEQLLEL [...]
+Seq133 MADDSGTGCTGWFMVDMVDFID-VE-AQALFNRQEADAHYATVQDLKRKYLGSPSPRLDAIKVKRRLFDSGYGLLKCKDIRSTLHGKFKDCFGLSFVDLIRPFKSDRTTCADWVFGIHHSIADAFQLIEPLSLYAHIQGMVLLVLIRFKVNKSRCTVARTLLNIPENHMLIEPPKIQSGVRALYWFRTGISNASGEAPEWITRQTVIEHFKLTEMVQWAYDNDICEESEIAFEYAQRGDFDSNARAFLNSNMQAKYVKDCAIMCRHYKHAEMKKMSIKQWIKYRGNWKPIVQFLRHQNIEFIPFLSKLKLWLHGPKKNCIAIVGPPDTGKSCFCMSLIKFLGGTVISYVNSCSHFWLQPLTDAKVALLDDATQPCWTYMDTYMRNLLDGNPMSDRKHRALTLIKCPPLLVTSNIDISKEYLHSRVTTFTFPNPFPFDRNGNAVYELSDANWKCFFERLSSSLDIE-DFRCIAKRLDACQDQLLEL [...]
+Seq134 MADDSGTGCSGWFLVDMVDFINL-SNAQALLHAQQTCADAVELCELKRKYISP-SPRLHAIKAKRRLFDSGYGLFKCKDLNAKLCGKFKELFGVGFHDLVRQFKSDKSTCTDWVFGVNPTIAEGFHLLKGQALYLHTQGMVLLALCRYKVAKNRETVVRQLLNVPDNQLMVQPPKLQSSAAALFWFRSGMGNGSGTTPEWIAKQTMLEHFSLTQMVQWAYDNGHTDECEIAYYYAQIADIDANAAAFLKSNNQAKYVRDCAAMCKHYRLAEMRRMSMADWIKHRGDWKPIVKLLRYQHIDIIVFLAALKKWLHGPKKNCICIVGPPDTGKSCFGMSLMHFLQGTIISFVNSCSHFWLQSLVDAKVAMLDDVTSACWAYMDTHMRNLLDGNPTSDRKHKSLAVIKCPPLLLTSNINIKHDYLQSRVTVFEFPNPFPFDSNGNAVYELSDANWNSFFKRLASSLELE-DF--LARRLDLCQEQLLEL [...]
+Seq135 MADSPGTGCSGWFVVDMIDFIDELSNAQALLHVQQTCADAADLCELKRKYISP-SPRLHAIKAKRRLFDSGYGLFKCKDLNAKLYGKFKELYGVGFGDLVRQFKSDKSTCTDWVFGVNPTIAEGFHLLKRQALYLHTQGMVLLALCRYKVGKNRETVVRQLLNVPDNQILVQPPKLQSPPAALFWFRAGMGNGSGTTPEWIAKQTMLEHFSLTDMVQWAYDNGHTDECEIAYYYAQRADVDANAAAFLKSNNQAKYVRDCASMCKHYRLAEMRRMSMAEWIKHRGDWKPIVKLLRYQHIDIIVFLAALKKWLQGPKKNCICIVGPPDTGKSCFGMSLMHFMQGTIISYVNSCSHFWLQSLADAKVAMLDDVTAACWGYMDTHMRNLLDGNPTSDRKHKPLAVIKCPPLLLTSNINITQDYLQSRVQVFEFPNPFPFDSNGNAVYELNDANWNSFFKRLASSLELG-DF--LARRLDLCQEQLLEL [...]
+Seq136 MADKQGTGCSGWFIVDMVDFINDHSSAQALLNAQQADADAAIVQELKRKYMSP-SPRLHAIKAKRRLFDSTNGLFKDKDVTVKLLGKFKELFGVGFNDLVRQFRSDKSTCTDWVFGVNPSISEGFHLLKEHTLYLHTQGMVLLALCRYKVAKNRSTIVRQLLNVPVQQILIQPPKLQSAPAALFWFRSSMGNGSGTTPEWISRQTMLEHFSLTDMVQWAYDNGYTEEYDIAYYYAQRGDIDANAAAFLKSNMQARYVRDCACMCKHYKLAEMKKMSMAEWIKHRGDWKPIVKFLKYQHIDIIAFLGALKKWLHGPKKNCICIIGPPDTGKSCFGMSLMKFLGGTILSYVNASSHFWLQPLVDAKVAMLDDVTAGCWTYMDMHMRNLLDGNPTSDRKHRALTVIKCPPLLLTSNLDISTEYLRSRITTFTFPNTFPFDTNGNAIYELNDENWNSFFKRLASSLELE-DY--LARRLDMCQEQLLEL [...]
+Seq137 MADKQGTGCSGWFIVDMVDFINEQSCAQALLNKQQADADAAIVQELKRKYISP-SPRLHAIKAKRRLFHSNYGLFKDKDVTVKLLGKFKDLFGVGFNDLVRQFKSDKSTCTDWVFGVNPSIAEGFHLLKEQTLYLHTQGMVLLALCRYKVAKNRSTVGRQLLNVPVQQILIQPPKLQSAPAALFWFRAGMGNGSGTTPEWISRQTVLEHFSLTNMVQWAYDNGYTEECDIAYYYAQLGDTDANAAAFLKSNMQARYVRDCACMCKHYKLAEMKKMSMAEWIKHRGDWKPIIRFLRYQHIDIITFLAALKKWLHGPKKNCICIIGPPDTGKSSFGMSLMKFLGGTMLSYVNSSSHFWLQSLVDAKAAMLDDVTAACWNYMDMHMRNLLDGNLTSDRKHKALAVIKCPPLLLTSNMDINTDYLKSRITTFTFPNAFPFDTNGNAIYEFNDENWNPFFKRLASSLELE-DY--LARRLDMCQEQLLEL [...]
+Seq138 MADDTGTGCSGWFSVDLVDFVDNQLKAQALLNRQQAHADKEAVQALKRKLLGSPSPRLGGLGAKRRLFDSGYGLLKCKNLQATLLGKFKELFGLSFGDLVRQFKSDKSSCTDWVFGVHHSIAEGFNLIKAEALYTHIQGMVLLMLIRFKCGKNRTTVSKGMLNIPANQLLIEPPRLQSVAAAIYWFRAGISNASGETPEWIQRQTIVEHFNLTEMVQWAYDNDLTEDSDIAYEYAQRADTDSNAAAFLKSNCQAKYVKDCGIMCRHYKKAQMKRMSMPQWIKHRGDWRPIVKFIRYQGIDFLTFMSAFKKFLHNPKKSCLVLIGPPNTGKSQFGMSLVKFLAGTVISFVNSHSHFWLQPLDSAKIAMLDDATPPCWTYLDTYLRNLLDGNPCSDRKHKALTVVKCPPLIITSNTDIRTEYLYSRISLFEFPNPFPLDKNGNPVYVLNDENWKSFFQRLWSSLEFEDEFRCLAKRLDACQEQLLEL [...]
+Seq139 MADDTGTGCSGWFCVDLVDFVDQ-VHAQALLNKQQAHADQEAVQALKRKLLGSPSPRLGGLGAKRRLFDSGYGLLKCKNLHATLLGKFKELFGVSFGDLVRQFKSDKSSCTDWVFGVNHSIAEGFNLIKADSLYTHIQGMVLLMLIRFKCGKNRTTVSKGLLNIPTNQLLIEPPRLQSVAAAIYWFRSGISNASGDTPEWIQRQTILEHFNLTEMVQWAYDNDITEDSDIAYEYAQRADRDSNAAAFLKSNCQAKYVKDCGVMCRHYKKAQMRRMSMGAWIKHRGDWKPIVKFIRYQQIDFLAFMSAFKKFLHNPKKSCLVLIGPPNTGKSQFGMSLINFLAGTVISFVNSHSHFWLQPLDSAKIAMLDDATPPCWTYLDIYLRNLLDGNPCSDRKHKALTVVKCPPLLITSNTDIRTNYLYSRVSLFEFPNPFPLDTNGNPVYELNDKNWKSFFQRLWSSLEFEDEFRCLAKRLDACQEQLLEL [...]
+Seq140 MADNQGTGCNGWFFVDMVDFIDQ-ENPQALLHAQQLQADVEAVQQLKRKYIGSPSPRLGAIKAKRRLFPPPNGLIHNTNIRVALFGMFKDLYGLSFMDLARPFKSDKTVCTDWVFGIYHGITDGFKLLEPHCLYGHIQGMVLLLLTRFKCGKNRLTVSKCLLNIPETQMLIDPPKLRTPAAALYWYRQGLSNASGTPPEWLARQTVIEYFDLSKMVQWAYDHNYIDDSIIALEYAKLADIDENAAAFLGSNCQAKYVKDCGTMCRHYIRAQKMQMTMSQWIKHRGEWKEIVRFLRYQHVDFISFMIALKQFLQGPKHNCILLYGPPDTGKSNFAMSLISFLGGVVLSYVNSSSHFWLEPLADAKIAMLDDATTQCWNYMDIYMRNALDGNPMCDRKHRAMVQTKCPPLIVTSNINASTDYLHSRVKCFCFPNRFPFDSNGNPVYDLSNKNWKSFFKRSWSRLALDNEFRCLATRLDVCQERLLDL [...]
+Seq141 Seq142
\ No newline at end of file
diff --git a/testData/140.model b/testData/140.model
new file mode 100644
index 0000000..e2676f0
--- /dev/null
+++ b/testData/140.model
@@ -0,0 +1,3 @@
+WAG, p0 = 1-399
+AUTO, p1 = 400-699
+AUTO, p2 = 700-1104
diff --git a/testData/140.tree b/testData/140.tree
new file mode 100644
index 0000000..76fe704
--- /dev/null
+++ b/testData/140.tree
@@ -0,0 +1 @@
+((((((Seq67,(Seq66,Seq65)),(Seq69,Seq68)),(Seq70,Seq71)),Seq72),(Seq63,((((((Seq38,(Seq37,Seq36)),(((Seq45,(Seq47,Seq46)),(Seq44,(Seq43,Seq42))),(Seq35,(Seq41,(Seq39,Seq40))))),((Seq31,(Seq34,(Seq32,Seq33))),(((Seq14,Seq15),(Seq16,((((Seq8,Seq7),Seq6),(Seq5,(Seq4,Seq3))),((Seq11,(Seq9,Seq10)),(Seq13,Seq12))))),((Seq17,(((Seq24,Seq23),Seq25),(Seq18,((Seq22,Seq21),(Seq19,Seq20))))),((Seq26,(Seq28,Seq27)),(Seq30,Seq29)))))),(((Seq49,Seq48),(Seq51,Seq50)),((Seq52,(((Seq55,(Seq53,Seq54)),Seq5 [...]
diff --git a/testData/354.tree b/testData/354.tree
new file mode 100644
index 0000000..29ac922
--- /dev/null
+++ b/testData/354.tree
@@ -0,0 +1 @@
+((((bn_001BGTue,((ac002MorArb,bn_002BGTue),((bf_005BGTue,(bf_002BGTue,st_001BGTue)),st_002BGTue))),((er101AA26384W,(er002MorArb,(er003MorArb,(er108AA26384W,er005MorArb)))),(ol037MorArb,(((si_006MorArb,ol111PRChina),si_003MorArb),(am111TS17259W,(((ja157TS17319W,ja117TS17319W),ja144TS17319W),((((ps209TS16075W,(ps202TS16075W,(ps117TS16020W,(ps120TS16020W,(ps211TS16075W,ps115TS16020W))))),ps121TS16020W),(pa005BGTue,pa006BGTue)),(((fl017MorArb,fl010MorArb),((wa125For29653E,(wa224Kin20879E,(wa [...]
diff --git a/testData/49 b/testData/49
new file mode 100644
index 0000000..d9d2e72
--- /dev/null
+++ b/testData/49
@@ -0,0 +1,50 @@
+49 1200
+Seq1 ATGACCAACATTCGAAAATCACACCCCCTTATCAAAATCGTTAATCACTCATTCATCGATTTACCCACCCCACCTAACATTTCAGCATGATGAAACTTCGGCTCCCTACTAGGAGTCTGCCTAGTCCTACAGATCCTAACCGGCCTTTTCCTAGCCATACACTACACATCAGACACAATAACCGCCTTTTCATCAGTTACTCACATCTGCCGCGACGTCAACTACGGCTGAATTATTCGATACATGCACGCCAACGGAGCCTCTATATTCTTTATCTGCCTATACATGCATGTAGGACGAGGAATATACTACGGCTCCTACACCTTCTCAGAAACATGAAATATTGGAATCATACTACTACTCACAGTCATAGCCACAGCCTTCATAGGATATGTCTTACCATGAGGTCAAATATCTTTCTGAGGAGCAACTGTAATTACCAACCTCCTATCAGCAATTCCTTACATCGGCACTAATCTAGTAGAGT [...]
+Seq2 ATGAAAATTATACGAAAAACACACCCACTCCTAAAAATCATTAACCATGCATTCGTCGACCTCCCTGCACCCTCCAACATCTCATCATGATGAAACTTCGGCTCTCTATTAGGAGTATGCCTAATAATCCAAATCCTCACAGGACTGTTTCTAGCAATACACTACACCTCAGACACTATAACAGCATTCTCATCCGTAACCCACATCTGCCGAGACGTAAATTACGGTTGACTGATTCGATACCTCCATGCAAACGGAGCCTCCATGTTCTTCATGTGCTTATTCATACACGTAGGACGAGGCATCTACTATGGGTCCTACACCTTTATAGAGACATGAAACCTTGGTATTATTCTACTGTTTGCCGTAATAGCAACTGCATTTATAGGATATGTCCTCCCATGGGGGCAAATATCCTTCTGAGGGGCCACAGTCATCACAAACCTACTTTCAGCCATCCCCTACATCGGTACTAACCTAGTAGAAT [...]
+Seq3 ATGAAAATTATACGAAAAACACACCCACTCCTAAAAATCATTAATCACGCATTCGTCGACCTCCCTGCACCCTCTAACATCTCATCATGATGAAACTTCGGCTCCCTATTAGGAGTATGCCTAATAATCCAAATCCTCACAGGACTATTTCTAGCAATACACTACACCTCCGACACTACAACAGCATTCTCATCCGTAACCCACATCTGCCGAGACGTAAACTACGGCTGATTAATTCGATACCTCCATGCAAATGGGGCTTCCATATTCTTCATGTGCTTATTCATACACGTAGGACGAGGCATTTATTATGGGTCTTACACCTTCACAGAGACATGAAACCTTGGTATCATTCTACTGTTTGCCGTAATAGCAACTGCATTTATAGGATATGTCCTTCCATGGGGACAAATATCCTTCTGAGGGGCCACAGTCATTACAAACCTACTCTCAGCCATCCCCTACATCGGCACTGACCTGGTAGAGT [...]
+Seq4 ATGAAAATTATACGAAAAACACACCCACTCATAAAAATTATCAACCACGCATTCATCGATCTCCCTGCACCCTCCAACATCTCATCATGATGAAACTTTGGTTCTCTATTAGGAGTATGCCTAATAGTCCAAATCCTCACAGGCCTATTCTTAGCAATACACTACACCTCCGACACTATAACAGCATTCACATCCGTAACCCACATCTGCCGAGACGTAAACTACGGCTGATTAATTCGATATCTCCATGCAAACGGAGCCTCCATATTCTTCGTATGCTTGTTTATACACGTAGGACGAGGAATCTACTATGGATCTTACACCTTTACAGAAACATGAAATCTTGGTGTTATTCTACTATTTGCCGTAATAGCAACTGCATTTATAGGATATGTACTTCCATGAGGACAAATATCCTTCTGAGGAGCCACAGTCATTACAAACCTTCTCTCAGCTATTCCCTACATCGGTACTAACCTAGTAGAAT [...]
+Seq5 ATGAAAAACATACGAAAAACGCAACCACTCCTAAAAATTATTAACCACGAATTCATTGA-TTTCCTGAAACATCCAA-ATCTCATCATGATGAAACTTTGGCTCTCTACTAGGCATCTGCCTAGTAATCCAGATCCTAACAGGCTTATTCCTAGCAATACACTATACCTCCGACACCACCACAGCATTTTCATCTGTAACCCACATTTGCCGAGACGTAAACTACGGCTGACTGATTCGTTACCTCCATGCAAATGGAGCCTCCATATTCTTCATGTGCCTGTTCATACATGTAGGACGGGGAATCTACTACGGATCTTATACCTTCATAGAAACCTGAAATCTCGGCATTATTCTACTGTTCGCCGTAATAGCAACTGCATTTATAGGATATGTACTCCCATGAGGACAGATATCCTTCTGAGGGGCCACAGTCATTACAAATCTACTCTCAGCTATCCCCTACATCGGAACTAATCTAGTAGAGT [...]
+Seq6 ATGAAAATTATACGAAAAACACACCCACTCCTAAAAATCATTAATCACGCATTCGTCGACCTCCCTGCACCCTCTAACATCTCATCATGATGAAACTTCGGCTCCCTATTAGGAGTATGCCTAATAATCCAAATCCTCACAGGACTATTTCTAGCAATACACTACACCTCCGACACTACAACAGCATTCTCATCCGTAACCCACATCTGCCGAGACGTAAACTACGGCTGATTAATTCGATACCTCCATGCAAATGGGGCTTCCATATTCTTCATGTGCTTATTCATACACGTAGGACGAGGCATTTATTATGGGTCTTACACCTTCACAGAGACATGAAACCTTGGTATCATTCTACTGTTTGCCGTAATAGCAACTGCATTTATAGGATATGTCCTTCCATGGGGACAAATATCCTTCTGAGGGGCCACAGTCATTACAAACCTACTCTCAGCCATCCCCTACATCGGCACTGACCTGGTAGAGT [...]
+Seq7 ATGAAAATTATACGAAAAACACACCCACTCCTAAAAATCATTAATCACGCATTCGTCGACCTCCCTGCACCCTCTAATATCTCATCATGATGAAACTTCGGCTCCCTATTAGGAGTATGCCTAGTAATCCAAATCCTCACAGGATTATTTCTAGCAATACACTACACCTCCGACACTATAACAGCATTCTCATCCGTAACCCACATCTGCCGAGACGTAAACTACGGCTGATTAATTCGATACCTCCATGCAAATGGGGCTTCCATATTCTTCGTGTGCTTATTCATACACGTAGGACGAGGCATTTATTATGGATCTTACACCTTCACAGAGACATGAAACCTTGGTATCATTCTACTGTTTGCCGTAATAGCAACTGCATTTATAGGGTATGTCCTTCCATGGGGACAAATATCCTTCTGAGGGGCCACAGTCATTACAAACCTACTCTCAGCCATCCCCTACATCGGCACTGACCTGGTAGAGT [...]
+Seq8 ATGAAAAACATACGAAAATCACACCCACTACTAAAAATCATTAATCTCGCATTTATTGACCTACCCGCACCATCCAATATCTCATCATGATGAAACTTTGGGTCCCTTCTAGGAGTCTGTCTAGTAGTACAAATTATCACAGGACTATTTCTAGCAATACACTACACCTCTGATACCACAACAGCATTTTCATCTGTAACTCATATCTGCCGAGATGTAAACTACGGTTGATTGATTCGATATCTTCATGCAAACGGAGCCTCAATATTCTTCATGTGCCTATTTATACACGTAGGACGAGGAATCTACTACGGATCCTACACCTTTACAGAAACCTGAAATATTGGCATTATTCTACTGTTCGCCGTAATAGCAACTGCATTTATAGGATATGTGCTTCCCTGAGGACAAATATCCTTCTGAGGAGCCACAGTTATTACTAACCTTCTCTCAGCAGTTCCCTACATCGGTACTAATTTAGTAGAAT [...]
+Seq9 ATGACAAACATCCGAAAAACACACCCCCTACTTAAAATTATTAATAACGCATTCATTGACCTACCAGCCCCATCCAACATTTCATCATGATGAAACTTCGGGTCTTTACTAGGAATCTGCTTAATCATCCAAATCATCACAGGACTTTTCCTAGCCATACATTACACCTCAGACACCTCAACAGCATTCTCATCTGTTACCCATATTTGCCGAGATGTAAACTACGGTTGACTTATTCGCTATCTTCATGCAAACGGAGCCTCCATATTCTTTGTCTGCTTATTTATACATGTTGGACGAGGAATCTATTACGGATCTTATACCTACATAGAAACATGAAATATCGGCATCATTCTACTGTTCGCCGTAATAGCAACTGCATTTATAGGATATGTACTTCCATGAGGACAAATATCTTTTTGAGGGGCCACAGTCATTACCAACCTACTATCAGCTATCCCTTACATTGGCACTAACTTAGTAGAAT [...]
+Seq10 ATGACTAACATTCGAAAAACTCACCCACTGATAAAAATTGTAAACAACGCATTTATCGACCTCCCAGCCCCATCAAACATTTCATCATGATGAAACTTTGGCTCCCTACTAGGCATCTGCCTAATCCTGCAAATCTTAACAGGCCTATTCCTAGCGATACACTACACATCCGACACAACAACAGCATTCTCCTCTGTCGCCCACATTTGCCGAGACGTCAATTATGGCTGAATCATCCGATACATACACGCAAACGGAGCATCAATATTTTTTATCTGCCTATTCATACACGTAGGACGAGGCCTCTACTATGGGTCATATACCTTCCTAGAAACATGAAACGTCGGAGTAATCCTCCTATTTACAACAATAGCCACAGCATTTATAGGCTATGTCCTGCCATGAGGACAAATATCATTCTGAGGAGCAACAGTCATCACCAACCTTCTCTCAGCAATCCCATATATCGGCACAGACCTGGTCGAA [...]
+Seq11 ATGACAAACATCCGAAAAATTCATCCCCTAATAAAAACCATTAACCACTCCTTCATTGATCTCCCCGCACCATCCAACATCTCATCATGATGAAACTTCGGCTCTCTACTAGGAATTTGCTTAATAGTACAAATCATCACAGGTCTATTCTTAGCCATACATTATACATCAGACACAACAACAGCATTTTCATCAGTAACCCATATCTGCCGAGACGTAAATTATGGATGACTAATCCGATATATACATGCAAACGGAGCCTCAATGTTCTTCATCTGCTTATTCCTTCATGTAGGACGAGGAATATACTATGGATCTTATACATTCCTAGAAACATGAAACATCGGAGTGATTTTATTATTTACAGTCATAGCCACTGCATTCATAGGATATGTCCTTCCATGAGGACAAATATCATTCTGAGGGGCCACAGTAATTACAAACTTACTTTCAGCCATCCCATATATTGGCACAATCCTAGTAGAA [...]
+Seq12 ATGACAAACATCCGAAAAACTCATCCCTTAATAAAAATTATTAATCATTCATTCATTGATCTTCCCGCACCATCCAACATCTCATCATGATGAAATTTCGGCTCCCTATTAGGAATCTGCCTAACAGTACAAATTGCCACAGGCCTATTTCTAGCCATACATTATACATCAGATACAACAACAGCATTCTCATCAGTAGCCCACATTTGCCGAGACGTAAATTACGGATGATTTATCCGATATATACATGCAAACGGAGCTTCCATATTTTTCATATGCCTATTCCTCCACGTAGGACGAGGAATATATTACGGATCCTACACATTTCTAGAAACATGAAATATCGGAGTAATTCTTCTATTTGCAGTTATAGCCACTGCATTCATAGGATATGTCCTTCCATGAGGACAAATATCATTTTGAGGAGCTACAGTAATTACAAATCTACTCTCAGCCATCCCATACATTGGCAGTACTTTAGTAGAG [...]
+Seq13 ATGACAAACATCCGAAAAACCCATCCCCTATTCAAAATTATTAATCACTCATTCATTGACCTTCCAGCCCCATCCAACATCTCATCATGATGAAACTTCGGTTCACTTCTTGGAATCTGCTTAATAGTCCAAATTTTAACTGGCTTATTCTTAGCTATACACTACACCTCCGACACACTAACAGCATTCTCATCAGTAACCCACATCTGTCGAGACGTTAATTACGGCTGATTGGTACGATATATACACGCAAATGGAGCCTCAATATTCTTCATCTGCCTATTTATACACGTAGGACGAGGTATATACTACGGATCATACACATTTTTAGAAACATGAAACATCGGGGTAATTCTTTTATTTACAGTAATAGCTACCGCATTTATAGGTTATGTACTCCCATGAGGACAAATATCATTTTGAGGGGCAACAGTAATTACAAACTTACTATCTGCCATTCCATACATTGGAACTACTTTAGTAGAA [...]
+Seq14 ATGATCAACATCCGAAAAACTCATCCATTAGTTAAAATTATCAACAACTCATTCATTGACCTTCCAACACCATCAAACATTTCAACATGATGGAACTTTGGGTCCCTGTTAGGAGTGTGTCTGATCTTGCAAATCTTAACAGGCTTATTTCTAGCCATACACTATACATCAGATACAGCTACAGCCTTTTCATCAGTCGCACACATTTGTCGAGACGTCAACTATGGGTGATTTATCCGATATATACATGCCAATGGGGCCTCTATATTTTTTATCTGCCTATTTATACACGTAGGGCGAGGCTTATACTATGGATCATACCTATTTCCAGAGACATGGAATATCGGAATTATTCTCCTACTTACAATTATAGCCACCGCATTTATAGGATACGTCCTACCCTGAGGCCAAATGTCCTTCTGAGGAGCGACTGTCATCACCAACCTACTATCGGCCATTCCCTACATCGGAACGAACCTAGTAGAA [...]
+Seq15 ATGAAAATTTTACGGAAAAATCACCCGCTACTTAAAATTGTTAATCATTCATTTATTGACCTCCCAACCCCATCTAACATCTCATCTTGATGGAATTTCGGGTCACTACTCGGTGTGTGCCTAGTAATCCAAATTCTGACCGGCCTATTCCTAGCCATACACTACACATCAGACACAACCACAGCATTCTCCTCAGTTGCCCACATTTGCCGAGATGTAAACTACGGATGATTAATTCGCTACCTTCACGCTAACGGAGCCTCCATATTCTTTATCTGCCTTTTCATCCACGTAGGCCGAGGAATCTACTACGGCTCCTATGTCCTCTCAGAAACCTGAAACATCGGTATCATCCTGTTCCTTACAACTATAGCAACAGCATTCGTAGGGTATGTTCTACCGTGGGGACAAATATCCTTCTGAGGAGCTACCGTAATCACAAACCTCCTCTCAGCAATCCCATACATCGGAAGCACCCTTGTTGAA [...]
+Seq16 ATGAAAATTTTACGGAAAAACCACCCGCTACTTAAAATTGTTAATCACTCATTTATTGACCTCCCAACCCCATCCAACATCTCATCTTGATGAAATTTTGGGTCACTACTCGGTGTATGCCTAATAATTCAAATTCTGACTGGCCTATTCCTAGCCATACACTACACATCAGACACAACCACAGCATTCTCTTCAGTTGCCCACATTTGCCGAGACGTGAACTACGGATGATTAATCCGCTACCTCCACGCCAACGGAGCCTCCATATTCTTTATCTGCCTTTTTATCCACGTAGGTCGAGGAATCTACTACGGCTCCTATGTCCTCTCAGAAACCTGAAACATCGGCATCATCCTATTCCTTACAACTATGGCAACAGCATTCGTAGGGTATGTACTACCATGAGGACAAATATCTTTCTGAGGGGCTACTGTAATCACAAATCTCCTCTCAGCAATCCCCTACATCGGAAGCACCCTTGTTGAA [...]
+Seq17 ATGAAAATTTTACGGAAAAATCACCCACTACTTAAAATTGTTAATCACTCATTCATTGACCTACCAACCCCATCTAGCATCTCGTCTTGATGGAATTTTGGGTCACTACTTGGTGTGTGTCTGATAATTCAAATTCTGACCGGCCTATTCCTAGCCATACACTACACATCAGACACAACCACAGCATTCTCTTCAGTAGCTCACATCTGTCGAGATGTAAACTACGGATGATTAATCCGCTACCTACATGCTAACGGGGCTTCCATATTCTTTATCTGTCTCTTCATCCACGTAGGCCGAGGGATCTATTACGGTTCCTATGTCCTCTCAGAAACTTGAAACATCGGTATCATTCTATTTCTTACAACTATAGCAACAGCATTCGTAGGCTATGTGTTACCATGAGGACAAATATCTTTCTGAGGGGCCACTGTAATCACAAATCTCCTCTCAGCAATCCCCTACATCGGAAGCACCCTTGTTGAA [...]
+Seq18 ATGAAAATTTTACGTAAAAATCACCCACTACTCAAAATTATAAATCACTCATTCATTGATCTGCCAGCTCCATCTAACATCTCATCCTGATGGAACTTTGGATCCCTACTTGGTACATGCCTAGTAATCCAAATCCTAACAGGCCTATTCTTAGCTATACACTACACATCAGACACAACCACAGCATTCTCCTCAGTAGCTCATATCTGCCGAGACGTAAACTACGGATGATTAATCCGCTACTTACACGCTAATGGAGCCTCCATATTCTTCATCTGCCTCTTCATCCACGTAGGCCGAGGAATCTACTACGGCTCCTACGTCCTTTCAAAAACTTGAAATATCGGCATTATCTTATTCCTCACAACTATAGCAACAGCATTTGTGGGGTACGTACTTCCATGAGGACAAATATCCTTCTGAGGGGCCACTGTAATTACAAACCTCCTCTCAGCCATCCCCTACATCGGAAGCACCCTAGTTGAA [...]
+Seq19 ATGAAAATTTTACGGAAAAATCACCCGCTACTCAAAATTGTTAATCACTCATTCATTGACCTACCAACTCCATCTAACATCTCATCCTGATGAAATTTTGGATCCCTACTAGGCATATGCCTAATAATCCAAATTTTAACAGGCCTATTCCTAGCCATACACTACACATCAGACACAACCACAGCATTCTCCTCAGTAGCACATATCTGCCGAGATGTAAACTACGGATGATTAATCCGCTACTTGCACGCTAATGGAGCCTCCATATTCTTTATCTGCCTCTTCATCCACGTAGGCCGAGGTATTTACTATGGTTCCTATACCCTCTCAGAAACCTGAAACATTGGCATCATCTTATTCCTCACAACTATAGCAACAGCATTTGTAGGATATGTACTCCCATGAGGACAAATATCCTTCTGGGGTGCCACCGTAATCACAAACCTCCTCTCAGCTATTCCCTACATCGGAAACACCCTAGTTGAA [...]
+Seq20 ATGAAAATCTTACGGAAAAATCATCCACTGCTTAAAATTGTTAATCACTCATTCATTGATCTACCAACTCCATCCAACATCTCATCCTGATGGAATTTTGGATCTCTTCTAGGAATATGCTTAGTAATCCAAATTCTAACAGGCCTATTCCTAGCCATACACTACACATCAGACACAACCACAGCATTCTCCTCAGTAGCCCATATCTGCCGAGATGTAAACTACGGATGGCTGATCCGCTACTTACACGCCAACGGGGCCTCCATATTCTTTATCTGCCTTTTCATCCACGTAGGCCGAGGGATTTACTACGGCTCCTACGTCCTCTCAGAAACCTGAAACATCGGCATCATCTTACTTCTCACAACCATAGCAACAGCATTTGTGGGATACGTACTCCCATGAGGACAAATATCTTTTTGAGGGGCTACCGTAATCACAAACCTTCTTTCAGCCATCCCATACATCGGAAACACCCTAGTTGAA [...]
+Seq21 ATGAAAATTTTACGAAAAAATCACCCATTATTCAAAATTATTAATCACTCATTCATTGACCTACCAACCCCATCCAATATCTCATCCTGATGGAACTTTGGGTCTCTACTCGGTATGTGCTTAATAATCCAAATTCTAACTGGCTTATTCCTAGCCATACACTACACATCAGACACAACCACAGCATTCTCTTCAGTAGCCCACATCTGCCGAGACGTGAACTACGGATGACTAATCCGCTACTTACACGCTAATGGAGCCTCTATATTCTTCATCTGCCTTTTCATCCACGTAGGCCGAGGAATCTACTATGGTTCCTATGTCCTCTCAGAAACTTGAAACATTGGCATTATCTTATTCCTCACAACTATAGCTACAGCGTTCGTGGGGTATGTACTTCCATGGGGACAAATATCCTTCTGAGGAGCCACCGTAATTACAAATCTCCTCTCAGCAATCCCCTACATCGGAAGCACATTAGTTGAA [...]
+Seq22 ATGAAAATCTTACGAAAAAATCATCCACTACTCAAAATTATTAATCATTCATTTATTGACCTACCAGCCCCATCTAACATCTCATCCTGATGGAACTTTGGGTCTCTACTCGGTGTATGCCTAATAATCCAAATTCTAACCGGCCTATTCCTAGCCATACACTACACATCAGACACAACCACAGCATTCTCTTCAGTAGCCCATATCTGTCGAGACGTAAATTACGGGTGATTAATCCGTTACCTACACGCTAATGGTGCCTCTATATTCTTCATCTGCCTTTTCATTCATGTAGGTCGAGGGATCTACTATGGCTCTTATGTACTCTCAGAAACTTGAAATATCGGCATTATCCTATTCCTCACAACTATAGCCACAGCATTCGTAGGGTATGTTCTTCCATGAGGACAAATATCTTTCTGAGGGGCCACTGTAATCACAAATCTCCTCTCAGCAATCCCCTACATCGGAAACACCCTAGTTGAA [...]
+Seq23 ATGAAAGTCTTACGAAAAAATCACCCACTACTCAAAATTGTTAATCACTCATTTATCGATCTACCAACCCCATCTAACATCTCATCCTGATGGAATTTCGGGTCCCTACTAGGCACATGCCTAGTAATCCAAATTCTAACAGGCCTATTCCTAGCCATACACTACACGTCAGATACAACCACAGCATTCTCCTCAGTAGCCCACATCTGCCGAGATGTAAACTACGGATGATTAATCCGCTACTTACACGCTAACGGAGCCTCTATATTCTTTATCTGCCTCTTCATCCATGTAGGCCGAGGGATTTACTACGGCTCCTACATCCTCTCAGAAACCTGAAACATTGGCATCATCTTGTTTCTCACAACTATAGCAACAGCATTTGTAGGGTATGTACTTCCATGAGGACAAATATCTTTCTGAGGGGCCACTGTAATCACAAATCTCCTTTCAGCTATCCCCTACATTGGAAACACCTTAGTTGAA [...]
+Seq24 ATGAAAATTTTACGTAAAACTCACCCACTACTTAAAATTGTTAACCACTCATTCATTGACCTACCCACCCCATCTAACATCTCATCCTGATGAAACTTTGGATCCCTACTAGGCATGTGCCTAGTAATTCAAATTCTAACAGGCCTATTCCTAGCTATACACTACACATCAGACACAGCCACAGCATTTTCTTCAGTTGCCCACATCTGTCGAGATGTAAATTACGGATGATTAATCCGTTATCTACACGCCAACGGAGCTTCCATATTCTTCATCTGCCTTTTCATTCATGTAGGACGAGGAATCTACTATGGCTCCTATGTCCTTTCAGAAACCTGAAACATTGGAATTATCCTACTGCTAACTACTATAGCAACAGCATTTGTAGGATATGTTCTACCATGGGGACAAATATCATTCTGAGGCGCTACCGTAATCACAAACCTTCTCTCAGCAATTCCTTACATCGGAAATACCCTAGTTGAA [...]
+Seq25 ATGAAAATCTTACGAAAAAATCACCCACTACTCAAAATTATTAATCACTCATTCATTGATCTTCCAACTCCATCTAACATCTCATCCTGATGGAATTTCGGATCCCTACTAGGCATATGCCTAATGATCCAAATTCTAACAGGCCTATTCCTAGCCATACACTATACATCAGACACAACCACAGCATTCTCCTCAGTAGCCCACATTTGCCGAGATGTAAACTACGGATGATTAATCCGCTATCTACACGCTAACGGAGCCTCCATATTCTTTATTTGTCTCTTCATCCATGTAGGCCGAGGTATTTACTACGGCTCCTATGCCCTCTCAGAAACCTGAAACATCGGCATCATCTTATTTCTCATAACTATAGCAACAGCATTTGTAGGATATGTACTCCCATGAGGACAAATATCCTTCTGAGGGGCTACTGTAATCACAAATCTCCTTTCAGCTATCCCCTACATCGGAAGCACCTTAGTTGAA [...]
+Seq26 ATGAAAATCTTACGGAAAAATCACCCACTACTCAAAATTGTTAATCACTCATTTATTGATCTACCAACTCCATCTAACATCTCATCCTGATGAAATTTTGGGTCTTTGCTAGGTATATGCCTAGTAATCCAAATTCTAACAGGCCTATTCCTAGCCATGCACTACACATCAGACACAGCCACAGCATTCTCTTCAGTAGCCCATATCTGCCGAGACGTAAACTATGGTTGACTAATCCGCTACCTACACGCTAATGGAGCCTCTATATTCTTTATCTGCCTTTTCATCCACGTAGGCCGAGGAATCTACTACGGCTCCTATGTCCTCTCAGAAACTTGAAACATCGGCATCATCTTACTTCTCACAACCATAGCAACAGCATTCGTAGGATACGTACTTCCATGAGGACAAATATCCTTTTGAGGGGCTACTGTAATCACAAATCTCCTTTCAGCCATCCCCTACATTGGAAGCACCCTAGTCGAA [...]
+Seq27 ATGACAATTATACGAAAAACCCACCCGCTACTTAAAATTATTAACCACTCATTTATTGATCTCCCTACCCCCTCCAACATTTCATCTTGATGGAACTTTGGCTCACTTTTAGGTATTTGCCTAATCATTCAAATTTTAACTGGCCTCTTCCTGGCCATACACTACACATCCGACACAGCCACAGCATTCTCCTCCGTCACCCACATCTGCCGAGACGTAAACTATGGCTGGCTCATCCGTTATATACACGCCAACGGAGCATCCATATTTTTTATTTGCCTATTCATTCACGTAGGACGAGGAATCTACTACGGCTCCTACATGCTCTCAGAAACCTGAAACATTGGCATCATCCTACTCCTAACCACAATAGCCACAGCATTCGTAGGCTATGTTCTCCCATGAGGGCAAATATCCTTCTGAGGCGCCACAGTAATCACAAATTTACTATCAGCAATCCCCTATATCGGAACAACTCTAGTTGAA [...]
+Seq28 ATGAAAAT-TTACGAAAAAATCACCCATTATTCAAAATTATTAACCACTCATTCATTGACCTGCCAACCCCATCCAATATCTCATCCTGATGGAACTTTGGATCTCTACTCGGTATGTGCTTAATAATCCAAATTCTAACTGGCTTATTCCTAGCCATACACTACACATCAGACACAACCACAGCATTCTCTTCAGTAGCCCACATCTGCCGAGATGTAAACTACGGATGACTAATCCGCTACTTACACGCTAATGGAGCCTCTATATTCTTCATCTGCCTTTTCATCCACGTAGGCCGAGGAATCTACTATGGTTCCTATGTCCTCTCAGAAACTTGAAACATCGGCATTATCTTATTCCTCACAACTATAGCTACAGCATTCGTAGGGTATGTACTTCCATGGGGACAAATATCCTTCTGAGGAGCCACCGTAATTACAAACCTCCTCTCAGCAATCCCCTACATCGGAAGCACATTAGTTGAA [...]
+Seq29 ATGAAAATCTTACGGAAAAATCACCCGCTACTTAAAATTGTTAATCACTCATTTATTGATCTACCAACTCCATCCAACATCTCATCTTGATGGAACTTTGGGTCACTACTTGGTTTATGCCTAATAATCCAAATTCTGACCGGCCTATTCCTAGCTATACACTACACATCAGACACAACCACAGCATTCTCTTCAGTAGCCCACATTTGCCGAGATGTAAACTACGGATGATTAATCCGCTATCTACACGCTAACGGAGCTTCTATATTCTTTATCTGCCTTTTCATCCACGTAGGCCGAGGAATCTACTACGGCTCCTATGTCCTCTCAGAAACCTGAAACATCGGTATCATTCTATTCCTTACAACCATAGCAACAGCATTCGTAGGATATGTACTACCATGAGGACAAATATCTTTCTGAGGGGCTACTGTAATTACAAACCTCCTTTCAGCAATCCCCTACATCGGAAACACCCTTGTGGAA [...]
+Seq30 ATGAAAATTATACGAAAGAATCACCCCCTACTTAAAATTATTAACCACTCATTCATCGACCTACCAACCCCGTCCAACATCTCATCATGATGAAACTTTGGGTCCCTACTAGGTGCCTGCCTAATTATCCAAATCTTAACGGGCCTCTTTCTAGCCATACACTACACTTCAGATACAACCACAGCATTCTCCTCAGTAGCCCACATTTGCCGAGACGTAAATTACGGGTGATTAATTCGCTATCTACACGCCAACGGAGCCTCCATATTCTTCATCTGCCTATCCATCCACGCCGGCCGAGGAATTTACTACGGCTCCTACGTCCTTTCAGAAACCTGAAACATCGGTATCATCTTATTCCTTACAACCATAGCAACAGCATTTGTAGGTTATGTGCTTCCATGAGGACAAATATCCTTCTGAGGCGCTACCGTAATCACTAACCTTCTCTCAGCAATCCCCTACATCGGAAGCACTCTATTTGAA [...]
+Seq31 ATGACAATCATACGAAAAAACCACCCTTTACTTAAAATCATTAATCACTCGTTTATTGACCTGCCCACCCCTTCCAACATTTCATCCTGATGGAACTTCGGCTCACTCCTTGGCATTTGCTTAATAATTCAAATTTTAACTGGCCTCTTCCTAGCCATACATTATACGTCCGATACAGCTACAGCATTTTCCTCCGTCACCCATATCTGCCGAGACGTAAATTACGGATGACTTATCCGCTACTTACATGCCAATGGGGCATCTATATTTTTTATCTGCCTATTTATTCATGTAGGACGAGGTATCTACTACGGCTCCTACATACTTTCAGAAACATGAAACATCGGAATTATCCTATTCCTAACCACAATAGCCACAGCATTTGTAGGCTATGTTCTTCCATGGGGACAGATATCTTTCTGAGGGGCCACAGTAATTACAAACTTACTCTCAGCAATTCCCTATATTGGAACCTCTCTAGTTGAA [...]
+Seq32 ATGAAAATTCTACGGAAAAATCACCCACTACTTAAAATTGTTAATCACTCATTCATTGACCTACCAACCCCATCCAATATCTCATCCTGATGGAATTTTGGATCGCTACTCGGCGTATGCCTGATAATCCAAATTCTAACCGGTCTATTTCTAGCCATACACTACACATCAGACACAACCACAGCATTCTCTTCAGTAGCCCACATTTGCCGAGATGTAAACTACGGATGATTAATCCGCTATCTACACGCTAATGGAGCCTCCATATTCTTCATCTGTCTTTTTATTCATGTAGGTCGAGGAATCTACTACGGCTCTTATGTCCTCTCAGAAACCTGAAACATCGGCATCATTTTATTCCTCACAACTATAGCAACAGCATTCGTAGGATATGTATTACCATGAGGACAAATATCTTTCTGAGGAGCTACTGTAATCACAAATCTCCTTTCAGCGATTCCCTACATCGGAAGCACCCTTGTCGAA [...]
+Seq33 ATGAAAATTTTACGGAAAAACCACCCACTACTCAAAATTATTAATCACTCATTTATTGACCTACCAACTCCATCTAACATCTCATCCTGGTGAAATTTTGGATCCCTACTAGGCATATGCCTAGTAATCCAAATTCTAACAGGCCTATTCCTAGCCATACACTATACATCAGACACAACCACAGCATTCTCCTCAGTAGCCCACATCTGCCGAGATGTAAATTACGGATGATTAATCCGCTATCTACACGCCAATGGAGCTTCTATATTCTTTATCTGCCTCTTCATCCATGTAGGCCGAGGTATTTACTACGGCTCCTATGTCCTCTCAGAAACCTGAAACATCGGCATCATCTTATTCCTCACAACTATAGCAACAGCATTCGTAGGATATGTACTACCATGAGGACAAATGTCTTTCTGAGGAGCCACTGTAATTACAAATCTCCTTTCAGCCATTCCCTACATCGGAAGCACCCTAGTTGAA [...]
+Seq34 ATGAAAATTTTACGAAAAAATCACCCCCTACTCAAAATTATTAATCACTCGTTCATCGACTTACCAACCCCATCCAACATCTCATCCTGATGAAATTTTGGATCCCTACTTGCCCTATGCCTAGCCATCCAAATCCTCACAGGCCTATTTCTAGCCATACATTACACATCAGACACAACCACAGCATTCTCCTCAGTAGCCCACATCTGTCGAGATGTAAATTACGGATGATTAATCCGCTATCTACATGCTAACGGAGCCTCCATATTCTTCATCTGCCTTTTCATCCACGTGGGCCGAGGGATTTATTACGGCTCATATATCCTCTCAGAAACCTGAAACATCGGTATCATTCTATTCCTTACAACTATAGCAACTGCATTCGTAGGATATGTCCTCCCATGGGGACAAATATCTTTCTGAGGAGCCACTGTAATTACTAATCTCCTCTCAGCTATTCCTTACATCGGAAATACCCTAGTAGAA [...]
+Seq35 ATGAAAATTTTACGGAAAAATCACCCGCTACTTAAAATTGTAAACCACTCATTTATTGACCTACCAACCCCATCTAATATTTCATCCTGATGAAATTTTGGGTCCCTACTCGGCGTATGCTTAATTATTCAAATCCTAACCGGTTTATTCCTAGCCATACACTATACATCAGACACAACCACAGCATTCTCTTCAGTAGCCCACATTTGCCGAGATGTAAACTACGGATGATTAATCCGCTATCTACACGCTAACGGAGCCTCCATATTCTTCATCTGTCTTTTCATTCACGTGGGTCGAGGAATCTATTATGGTTCCTATATCCTCTCAGAAACCTGAAACATCGGCATCATTCTATTCCTTACAACTATAGCAACAGCATTTGTAGGATATGTACTACCATGAGGACAGATATCCTTTTGAGGAGCTACCGTAATCACGAATCTTCTATCAGCAATTCCCTACATCGGAAACACCCTTGTTGAA [...]
+Seq36 ATGAAAATTTTACGGAAGAATCACCCGCTACTCAAAATTGTTAATCATTCATTTATCGACCTTCCAACTCCATCGAACATCTCATCCTGATGAAATTTTGGATCCCTACTAGGCATATGCCTAATAATCCAAATTCTAACAGGCCTATTCCTAGCCATACACTACACATCAGACACAACCACAGCATTCTCCTCAGTAGCCCATATCTGCCGAGATGTAAATTACGGATGATTAATCCGCTACTTGCACGCTAATGGAGCCTCCATATTCTTTATCTGCCTCTTCATCCACGTAGGCCGAGGTATTTACTATGGTTCCTATGTCCTCTCAGAAACCTGAAACATCGGCATCATCTTATTCCTCACAACTATAGCAACAGCATTCGTGGGGTATGTACTCCCATGAGGACAAATATCCTTCTGAGGTGCCACCGTAATCACAAACCTCCTCTCAGCCATCCCCTACATCGGAAACACCCTAGTTGAA [...]
+Seq37 ATGAAAATTTTACGGAAAAACCACCCACTACTCAAAATTATTAATCACTCATTCATTGACTTACCAACTCCATCTAACATCTCATCCTGATGAAATTTCGGATCCCTACTAGGCATATGCTTAGTGATCCAAATTCTAACAGGCCTGTTCCTAGCCATACACTATACATCCGACACAACTACAGCATTCTCCTCAGTAGCCCATATCTGCCGAGATGTAAACTACGGATGACTAATCCGCTACTTACACGCTAACGGAGCCTCTATATTCTTCATCTGCCTCTTCATCCATGTAGGCCGAGGTATTTACTACGGCTCCTATGTCCTCTCAGAAACTTGAAACATCGGCATCATCTTATTCCTCACAACTATAGCAACAGCATTCGTAGGATATGTATTACCATGAGGACAAATGTCCTTCTGAGGGGCCACTGTAATCACAAACCTCCTTTCAGCCATCCCATACATCGGAACCACCCTAGTTGAA [...]
+Seq38 ATGAAAATTTTACGAAAAAATCACCCATTACTCAAAATTATTAATCACTCATTCATTGACCTACCAACCCCATCCAATATCTCATCCTGATGGAACTTTGGGTCTCTACTCGGTATATGCTTAATAATTCAAATTCTAACTGGCTTATTCCTAGCCATACACTACACATCAGACACAACCACAGCATTCTCTTCAGTAGCCCACATCTGCCGAGATGTAAACTACGGATGACTAATCCGCTACTTACACGCTAATGGAGCCTCTATATTCTTCATCTGCCTTTTCATCCACGTAGGCCGAGGAATCTACTATGGTTCCTATGTCCTCTCAGAAACTTGAAACATCGGCATTATCTTATTCCTCACAACTATAGCTACAGCGTTCGTGGGGTATGTACTTCCATGAGGACAAATATCCTTCTGAGGAGCCACCGTAATTACAAATCTCCTCTCAGCAATCCCCTACATCGGAAGCACACTAGTCGAA [...]
+Seq39 ATGAAAATTTTACGGAAAAATCACCCGCTACTTAAAATTGTAAATCACTCATTCATTGACTTACCAACCCCATCCAACATCTCATCTTGATGAAACTTTGGGTCACTACTCGGTGTATGCCTAATAATCCAAATTCTGACCGGCCTATTCCTAGCCATACACTACACATCAGACACAACCACAGCATTCTCTTCAGTAGCCCACATTTGCCGAGATGTAAACTACGGATGATTAATCCGCTATCTACACGCTAACGGAGCTTCCATATTCTTTATCTGCCTTTTCATCCATGTAGGCCGAGGAATCTATTACGGCTCCTATGTCCTCTCAGAAACCTGAAACGTCGGTATCATCCTATTCCTCACAACTATAGCAACAGCATTCGTAGGGTACGTGTTACCATGAGGACAAATATCTTTCTGAGGAGCTACCGTAATTACAAACCTCCTCTCAGCAATCCCCTACATCGGAAGCACCCTCGTCGAA [...]
+Seq40 ATGAAAATTTTACGAAAAAATCACCCATTACTCAAAATTATTAATCACTCATTCATTGACCTGCCAACCCCATCCAATATCTCATCCTGATGGAACTTTGGGTCTCTACTCGGTATGTGCTTAATAATTCAAATTCTAACTGGCTTATTCCTAGCCATACACTACACATCAGACACAACCACAGCATTCTCTTCAGTAGCCCACATCTGCCGAGATGTAAACTACGGATGACTAATCCGCTACTTACACGCTAATGGAGCCTCTATATTCTTCATCTGCCTTTTCATCCACGTAGGCCGAGGAATCTACTATGGTTCCTATGTCCTCTCAGAAACTTGAAACATCGGCATTATCTTATTCCTCACAACTATAGCTACAGCGTTCGTGGGGTATGTACTTCCATGAGGACAAATATCCTTCTGAGGAGCCACCGTAATTACAAATCTCCTCTCAGCAATCCCCTACATCGGAAGCACACTAGTCGAA [...]
+Seq41 ATGACAATCATACGAAAAAACCACCCTTTACTTAAAATCATTAATCACTCGTTTATTGACCTGCCCACCCCTTCCAACATTTCATCCTGATGGAACTTCGGCTCACTCCTTGGCATTTGCTTAATAATTCAAATTTTAACTGGCCTCTTCCTAGCCATACATTATACGTCCGATACAGCTACAGCATTCTCCTCCGTCACCCATATCTGCCGAGACGTAAATTACGGATGACTTATCCGCTACTTACATGCCAATGGGGCATCTATATTTTTTATCTGCCTATTTATTCATGTAGGACGAGGTATCTACTACGGCTCCTACATACTCTCAGAAACCTGAAACATCGGAATTATCCTATTCCTAACCACAATAGCCACAGCATTCGTAGGCTATGTTCTTCCATGAGGACAGATATCTTTCTGAGGAGCCACAGTAATTACAAACTTGCTCTCAGCAATTCCTTATATTGGAACCTCTCTAGTTGAA [...]
+Seq42 ATGACCAACATCCGAAAGACCCACCCACTAATAAAAATTATTAACAATGCATTCATTGACCTCCCTGCCCCATCAAACATCTCATCGTGATGAAATTTTGGCTCCCTTCTAGGCATCTGCCTAATCCTACAGATCCTAACAGGACTATTTCTAGCAATACACTACACATCTGATACAACAACAGCATTCTCCTCTGTCACCCACATTTGCCGAGACGTCAACTATGGCTGAATCATCCGATATATACACGCAAACGGAGCCTCAATATTTTTTATCTGCCTATTCCTACATGTAGGACGAGGCCTATATTACGGATCCTACGCCTTCCTAGAAACATGAAACGTCGGAGTAATCCTTTTATTCGCAACAATGGCCACAGCATTTATGGGCTACGTTCTGCCATGAGGACAAATATCATTCTGAGGGGCAACAGTCATCACTAATCTCCTCTCAGCAATCCCATATATTGGCACAGACCTAGTAGAA [...]
+Seq43 ATGACCAACATACGAAAAACTCATCCTTTAATAAAAATCATTAACGAGTCTTTCATTGACCTTCCTACCCCATCTAACATCTCTGCATGATGGAACTTCGGCTCTCTTTTAGGATTATGCCTTGTAATTCAGATTCTCACAGGACTTTTCCTAGCCATACATTACACCTCCGACACCACAACAGCCTTCTCATCAGTCACCCATATCTGCCGAGACGTAAACTACGGATGATTAATCCGATACATACATGCTAACGGAGCTTCAATATTCTTCATCTGCCTCTTCCTCCACGTAGGACGAGGCCTGTACTATGGGTCATACACTTTCATTGAAACCTGAAACATTGGAGTACTATTATTATTCACTGTAATAGCAACAGCCTTTATAGGCTACGTCTTACCATGAGGACAAATATCCTTCTGAGGAGCCACAGTAATTACAAACCTCCTATCCGCTATCCCCTATATCGGCACAACCCTGGTAGAA [...]
+Seq44 ATGACAACCCCCCGCAAAACACATCCACTAGCAAAAATCATTAACAACTCATTCATTGATCTCCCCACACCATCCAACATCTCCGCCTGATGAAATTTCGGCTCACTCCTAGGTATTTGCCTGATTATCCAAATTACTACAGGTCTATTCTTAGCCATACACTACACACCAGACACCTCAACTGCCTTCTCCTCAGTCGCCCACATCACCCGAGACGTCAACTACGGCTGAATAATCCGCTACCTACACGCCAATGGCGCCTCCATATTCTTCATCTGCCTCTTCCTCCACATTGGCCGAGGCCTATACTATGGATCATTCCTTTTTCTGAAGACCTGAAACGTCGGTATTATCCTCCTACTCACAACCATAGCCACAGCATTCATAGGCTATGTCCTCCCATGGGGCCAAATATCATTCTGAGGGGCCACAGTAATTACAAACCTTCTATCAGCCATCCCATACATCGGATCTGACCTCGTACAA [...]
+Seq45 ATGACTACCCCCCGCAAGACACATCCACTAACAAAAATCATTAACAACTCATTCATTGATCTCCCCACACCATCCAACATTTCCGCCTGATGAAATTTCGGCTCACTCCTAGGTATTTGCCTAATTATCCAAATCACTACAGGTCTATTCCTAGCCATACATTATACACCAGACACTTCAACTGCCTTCTCCTCGGTCGCCCACATCACCCGAGACGTCAACTACGGCTGAATAATCCGCTACCTACACGCCAACGGCGCTTCCATATTCTTCATCTGCCTATTCCTCCACATTGGCCGAGGCTTATATTACGGGTCATTCCTTTTTCTGAAGACCTGAAACGTCGGTATTATCCTCCTACTCACAACCATAGCCACAGCATTCATAGGCTACGTCCTCCCATGAGGCCAAATATCATTCTGAGGGGCCACAGTAATTACAAACCTTCTATCAGCCATCCCATACATCGGATCTGACCTCGTACAA [...]
+Seq46 ATGACTACCCCCCGCAAAACTCACCCACTAGCAAAAATCATCAACAATTCATTCATTGACCTCCCTACACCATCCAACATCTCCGCCTGGTGAAATTTCGGCTCACTCCTAGGTATTTGCCTAATTATTCAAATCACTACAGGTCTATTCTTAGCCATACACTATACACCAGACACTTCAACCGCCTTCTCCTCAGTCGCCCACATCACCCGAGACGTCAACTATGGCTGAATAATCCGCTACCTACATGCCAACGGCGCCTCCATATTCTTTATCTGCCTCTTTCTCCACATTGGCCGAGGCTTATATTACGGATCATTCCTTTTTCTGGAGACCTGAAACGTCGGTATTATCCTCCTACTCACAACCATAGCCACAGCATTCATAGGCTATGTCCTCCCATGAGGCCAAATATCATTCTGAGGGGCCACAGTAATTACAAACCTTCTGTCAGCCATTCCATATATCGGGTCTGACCTCGTACAA [...]
+Seq47 ATGACTACTCCCCGCAAAACACATCCACTAGCAAAAATCATCAACAACTCATTCATTGATCTCCCTACACCATCCAACATCTCCGCCTGATGAAATTTCGGCTCACTCCTAGGTATTTGCCTAATTATTCAAATCACTACAGGTCTATTCTTAGCCATACACTATACACCAGACACTTCAACTGCCTTCTCCTCAGTCGCCCACATCACCCGAGACGTCAACTACGGCTGAATAATCCGCTACCTACACGCCAATGGCGCCTCCATATTCTTCATCTGCCTCTTCCTTCACATTGGCCGAGGCCTATATTATGGATCATTCCTTTTTCTGAAGACCTGAAACGTCGGTATTATCCTTCTACTCACAACTATAGCCACAGCATTCATAGGCTATGTCCTCCCATGAGGCCAAATATCATTCTGAGGGGCCACAGTAATTACAAACCTCCTATCAGCCATCCCATACATCGGACCTGACCTCGTACAA [...]
+Seq48 ATGACTACCCCCCGCAAAACTCACCCACTAGCAAAAATCATCAACAATTCATTCATTGACCTCCCTACACCATCCAACATCTCCGCCTGGTGAAATTTCGGCTCACTCCTAGGTATTTGCCTAATTATTCAAATCACTACAGGTCTATTCTTAGCCATACACTATACACCAGACACTTCAACCGCCTTCTCCTCAGTCGCCCACATCACCCGAGACGTCAACTATGGCTGAATAATCCGCTATCTACATGCCAACGGCGCCTCCATATTCTTTATCTGCCTCTTTCTCCACATTGGCCGAGGCTTATATTACGGATCATTCCTTTTTCTGGAGACCTGGAACGTCGGTATTATCCTCCTACTCACAACCATAGCCACAGCATTCATAGGCTATGTCCTCCCATGAGGCCAAATATCATTCTGAGGGGCCACAGTAATTACAAACCTTCTGTCAGCCATTCCATATATCGGGTCTGACCTCGTACAA [...]
+Seq49 ATGACTACCCCCCGCAAAACTCACCCACTAGCAAAAATCATCAACAATTCATTCATTGACCTCCCTACACCATCCAACATCTCCGCCTGATGAAATTTCGGCTCACTCCTAGGCATTTGCCTCATTATTCAAATTACTACAGGCCTATTCTTAGCCATACACTATACACCAGATACTTCAACCGCCTTCTCTTCAGTCGCTCACATCACCCGAGACGTCAACTATGGCTGAATAATCCGCTACCTACACGCCAATGGCGCCTCCATATTCTTTATCTGTCTCTTTCTCCACATTGGCCGAGGCTTATATTACGGATCATTCCTTTTTCTGGAGACCTGAAACGTCGGTATTATCCTCCTACTCACAACCATAGCCACAGCATTCATAGGCTATGTCCTCCCATGAGGCCAAATATCATTCTGAGGCGCCACAGTAATTACAAACCTTCTGTCAGCCATCCCATATATCGGATCTGACCTTGTACAA [...]
diff --git a/testData/49.model b/testData/49.model
new file mode 100644
index 0000000..c586c71
--- /dev/null
+++ b/testData/49.model
@@ -0,0 +1,4 @@
+DNA, gene1 = 1-300
+DNA, gene2 = 301-900
+DNA, gene3 = 901-1100
+DNA, gene4 = 1101-1200
diff --git a/testData/49.tree b/testData/49.tree
new file mode 100644
index 0000000..ce35154
--- /dev/null
+++ b/testData/49.tree
@@ -0,0 +1 @@
+(Seq14,(((Seq10,Seq42),((Seq47,(Seq44,Seq45)),(Seq49,(Seq48,Seq46)))),(((((((Seq2,((Seq6,Seq3),Seq7)),Seq4),Seq5),(Seq9,Seq8)),(Seq13,(Seq11,Seq12))),((Seq27,(Seq31,Seq41)),((((Seq22,((Seq38,Seq40),(Seq28,Seq21))),(((Seq34,Seq30),((Seq20,Seq26),(((Seq19,Seq36),(Seq37,Seq33)),(Seq25,Seq23)))),(Seq17,(Seq29,(((Seq16,Seq15),Seq39),(Seq35,Seq32)))))),Seq18),Seq24))),Seq43)),Seq1);
diff --git a/versionHeader/version.h b/versionHeader/version.h
new file mode 100644
index 0000000..b5973b3
--- /dev/null
+++ b/versionHeader/version.h
@@ -0,0 +1,4 @@
+#define programName "ExaML"
+#define programVersion "3.0.18"
+#define programVersionInt 3018
+#define programDate "February 14 2017"
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/examl.git