[med-svn] [eagle] 02/03: Imported Upstream version 2.3
Dylan Aïssi
bob.dybian-guest at moszumanska.debian.org
Sun Sep 18 21:06:56 UTC 2016
This is an automated email from the git hooks/post-receive script.
bob.dybian-guest pushed a commit to branch master
in repository eagle.
commit 87154188af91b9947100620d920e0070c9723000
Author: Dylan Aïssi <bob.dybian at gmail.com>
Date: Sun Sep 18 23:05:45 2016 +0200
Imported Upstream version 2.3
---
.gitignore | 2 +
README.md | 7 +
example/EUR_test.bed | 400 ++++
example/EUR_test.bim | 2000 ++++++++++++++++++
example/EUR_test.fam | 379 ++++
example/EUR_test.vcf.gz | Bin 0 -> 193569 bytes
example/example.log | 194 ++
example/example_ref.log | 79 +
example/example_vcf.log | 177 ++
example/phased.haps.gz | Bin 0 -> 194442 bytes
example/phased.sample | 381 ++++
example/phased.vcf.gz | Bin 0 -> 200389 bytes
example/ref.bcf | Bin 0 -> 59136 bytes
example/ref.bcf.csi | Bin 0 -> 1185 bytes
example/run_example.sh | 19 +
example/run_example_ref.sh | 9 +
example/run_example_vcf.sh | 17 +
example/target.phased.vcf.gz | Bin 0 -> 21049 bytes
example/target.vcf.gz | Bin 0 -> 20835 bytes
example/target.vcf.gz.tbi | Bin 0 -> 1275 bytes
src/COPYING | 675 ++++++
src/DipTreePBWT.cpp | 611 ++++++
src/DipTreePBWT.hpp | 169 ++
src/Eagle.cpp | 3576 ++++++++++++++++++++++++++++++++
src/Eagle.hpp | 209 ++
src/EagleImpMiss.cpp | 286 +++
src/EagleMain.cpp | 620 ++++++
src/EaglePBWT.cpp | 744 +++++++
src/EagleParams.cpp | 379 ++++
src/EagleParams.hpp | 83 +
src/FileUtils.cpp | 215 ++
src/FileUtils.hpp | 94 +
src/GenoData.cpp | 845 ++++++++
src/GenoData.hpp | 140 ++
src/HapHedge.cpp | 599 ++++++
src/HapHedge.hpp | 133 ++
src/LapackConst.hpp | 76 +
src/Makefile | 107 +
src/MapInterpolater.cpp | 83 +
src/MapInterpolater.hpp | 40 +
src/MemoryUtils.cpp | 40 +
src/MemoryUtils.hpp | 45 +
src/NumericUtils.cpp | 108 +
src/NumericUtils.hpp | 52 +
src/StaticMultimap.cpp | 114 +
src/StaticMultimap.hpp | 56 +
src/StringUtils.cpp | 151 ++
src/StringUtils.hpp | 45 +
src/SyncedVcfData.cpp | 520 +++++
src/SyncedVcfData.hpp | 76 +
src/Timer.cpp | 46 +
src/Timer.hpp | 34 +
src/Types.hpp | 31 +
src/Version.hpp | 25 +
tables/README.txt | 3 +
tables/genetic_map_hg19_example.txt.gz | Bin 0 -> 718229 bytes
56 files changed, 14694 insertions(+)
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..e31a722
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,2 @@
+*.o
+src/eagle
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..164d5ee
--- /dev/null
+++ b/README.md
@@ -0,0 +1,7 @@
+# Eagle
+
+This repository is for developers of the Eagle haplotype phasing software, which is open-source (GNU GPLv3).
+
+Most users will wish to download release tarballs (containing compiled executables and full genetic map tables) from the main Eagle website:
+
+http://data.broadinstitute.org/alkesgroup/Eagle/
diff --git a/example/EUR_test.bed b/example/EUR_test.bed
new file mode 100644
index 0000000..06b53d6
--- /dev/null
+++ b/example/EUR_test.bed
@@ -0,0 +1,400 @@
[400 added lines of binary data from example/EUR_test.bed omitted; the content is not representable as text]
+���:��;:����
+��㺮����:��પ��:
#��������;���������˳��ꎫ��>�����ﺺ�������>���.2*���ᆴ
+���:��;:����
+��㺮����:��પ��*(#���������;���������˳����:�����ﺺ�������>���.2*���ᆴ����:��;:������������:��પ��*#���������;���������˳��ﮫ��:�����ﺺ�������>���.2*���N����:��;:������������:�������������������������������������������������������������������������������������������?�������������������������������������������������������������������������������������������?�����������������������������������������������������������������������������/���������������:���������������������������������������� [...]
+�;
+,��
+ʻ�(�(�,�;����������������������������������������������������������������������������������������������?����ί�����������������������������������������������������������������������������*���������������������������������������λ����������/�������������������������ᄎ����?����>�?����������������������������������������������������������������������������������������������?�����������������������;�����.��>��������;������;���?��?�����������������������ᆲ����������/���������������������������� [...]
+����ʿ����
��?�#��;�������裣��.�����?�ˮ�ˬ����.��.��>먬:���������������������������������������������������������������������������������������������?���������������������������������������������������:������迫������������������/���������:����������������������������������������������������������������������������������������������?�����*����ﯿ������������������.����������������2/���������������������뻲�������������+��;�����?�������������������������������������������;�������������������������� [...]
+>뾣�����㸰��ί���������
+�
��������
+������3����������������;�����긬��������.����������������������������������������������������������������������������������������������?����������������������������������������������������������������������������������������������?����������������������������������������������������������������������������������������������?���������������������������������������������������������������������������������������������������������������������������������������������������������������������� [...]
+���(��>�����»��λ���;�����«�;�����: +�/Ȳ� �����+��".� ������;��
����?��������ꣻ������Ϯ����?�����;��ϋ��������Ϫ;������*���/�3닺�����������.�
+���������������������������������������������������������������������������������������������?����������������������������������������������������������������������������������������.����ᄏ������������������ϯ����>����뿸������������껯������+����3뫺����?�������
+����������������������������������������������������������������������������������������������?��?#����������+������������:+�����⪿3�������;�����룯������/���+����������ί�����������?;���������������������������������������������������������������������������������������������?����������������������������������������������������������������������������������������������?���������������������������������������������������������������������������������������������?�������������������������ﯮ�� [...]
+�������믯���������?����������������ʻ��>����
�쿯>�8�������뿿�����ί�뿾��������������3����������������������������������������������;���;��������뾼�����>�������������껿���;���������������������������˺�����������Ͽ�����������������������̿������������������>��쪏����+�2�������쿏��;#��(���.����£��>����ꌻ�.�:�����������(ﻏ���ˋ�8���������:��쪏����+�2�������쿏��;#��(���.����£��>����ꌻ�.�:��(��������(ﻏ���ˋ�8���������:���������������������������˺�����������ϻ�����������������������̿������������������?���� [...]
+몫º��� ��/ ��:��++���(����,��ʈ�숻��������*�𫺲���������������������������˺���������������?��������������������ο�����������������?���������������������������������������������������������������������������������������������?�������������>��������������������������������������������������������������������������?����������������������������������������������������������������������������������������������?����������������������������������������������������������������������������������� [...]
+�����������������?�������?�����������ハ����������?�.��8�㾊���2.�믾������#���������������������������������������������������������������������������������������������?�������뿿���������������������������������������������������;����������������������������;����������������������������������������������������������������������������������������������?�ꮬ躺ʸ�<������.������
«�
+���*̲��(�2�믨��������;��
+�>����(ʊ���ȫ���?�"��+�㬪�*�꯬�*�ʺ�<���
���������>����*��� �2�뿨����Ϋ��?��:�>����(ʪ���ȫ���?�"��+�⬾�*���������������������������������������������������������������������������������������������?����������������������������������������������������������������������������������������������?���������������������?����;�������/�?�����������������������������������������<����������������������������������������������������������������������������������������������?������������������������� [...]
+��������������:�3����������;���������,�����"�����������������������/3������������������������Ͽ������?������������㾮�������������������������������������������/�����������������������������������������������/����������������������������������������������?����������������������������������+������������������������������������/�����ϯ�������*��.���������������������������������������������������������������������������������������������?�������������������������������������������������������� [...]
+;�*�#�+���3��+����������������+꿿������������������������������������������������������������������������?����������������������������������������������������������������������������������������������?��������������������������������/����������?軮�������;���������������������?�:�/��������/����������;�������������?�������/�����<����>ຊ�����/3��/�ꯈ�����������"?�:�+�/������/���������������������������������������������������������������������������������������������;��������������������������� [...]
+.�3����2��긫�� .�:�(��<ʢ�� ��
+�+��̢��������
���(�#*��*�+�ꈠ?�8��"���λ��8���
+�ή������ ���*��+�"��"ʣ���:��*�?������������������������>���������������<���������������������������������ﯪ����������������?�����������������������>���?��ᅫ����������������������������������������;�������������?����������������������������������������������������������������������������������������������?�뾺����������������������������������.������������������κ��Ͽ���>������������������;����:��������ϫ������ﯿ�����������������Ϋ�3�����������������������������:�����;���?������������������� [...]
+��#+��;3��ꫨꪊ�ꎪ�(��⯈�ʸ����.��"�*����+��*:*,�"�*�8������/����� #���./*���*��+���������������������?����/�����������������������������������������������������������������+/꼺��訯��̬����+���������+��?��븯뮿������?#����������˿�*����*����슂���꺪����*,����������������������������������������������������������������������������������������������?����������������������������������������������������������������������������������������������/���������������������������������������������������������� [...]
+��뮿��������;��������ˬ��/�
�������������?���������Ï�;����������������������������������������������������������������������������������������������?��*���달�ˏ�訿�;��ʊ��
+�2��+���뢣�+������*0��ﮢ����"�
+(.��
+���*����������<:,.
+"�����ë��+�ꏸ������������諳���#.����®�����;�����,��ê���;��8�������
+�*��?�>>�뻪�
������.���������������8����?���>���辏�?������"����������긪����������+���8�������.����������������������������������������������������������������������������������������?�����������>�����������뻾�������뿯������������������������������������;������������������;����������������������������������������������������������������������������������������������;�������������������������������������������������������������������������������������������?�
+�2*�:��;�����/����*���*���2����/?����˰���3����<�Ί;������
+�����>�ﻠ��0���ʿ���*裫#���������������������������������������������������������������������������������������������;+��������������>�����������������������������������/���������/�����������������������:�������?��������������������������������������������������������������������������������������?+��������������>�������������������������������������/���������������������������������:��������������������������������������������������������뮿��������������������������������>+��������������>� [...]
+������⨾�>�����������뺾ﻫ�������쯨������>���*>�˻�����
���������說��*����*+����������������������������������������������������������������������������������������������?����������������������������������������������������������������������������������������������?�����������꾿�����������������˿����ξ������������������������������?���������������������/�������������������������������������������������������������������������������������������?��������������������������������˿������������������� [...]
+�
+*��ˈ.����>+�;*���<;;��>>
+�������������.:��������������������������������������������?�������������������������������������������������?����������������������������������������������������������������������������������������������?�������������������������������������������������������������������������������������������?����������������������������������������������������������������������������������������������?��������������������������������������������?�������������������������������������?�����������+���� [...]
+����?����*����/��㼋���>
,�ʋ������芪:�#ʨ� �*.��ꃺ
�,�Ȋ*?�<�:��(�
+���:��*����..�����*�����?����+�����뺬����
+?�8��닋�迋>航��̨��ʫ�����,�,������������8�믾���?������������*�".���������㾾�����������������������������������*����������8���?����������������������������������������������������������������������������������������������?���������������������������������������������������������������������������������������������>����������������������������������������������������������������������������������������������?����������ﻺ>.�/�����������몯���������:3�������������������;���������ϻ���ξ���ϋ>�����/� [...]
+�>��:ꪃ�����/>����/�������������������뻮�������ϯ3��.�����������Ϯ��Ϫ������/�+ʪ���몢>��:ꪋ�����>����������������������������������������������������������������������������������������������?���������������������������������������������������?��������������������������������������?����������������������������������������������������������������������������������������������;��������.���������������>�>>������������������������?㻣����������������������?�����������;������������������������������ [...]
+
�����"��㫺���3�0����"��
+㺸�����,����ᆲ��*/�(���+�/������>������ ��?;��������������������������������������������迨������������������������:������;����?����������������������������������������������������������������������������������?���������.�:��3+��+��*�������
+������˪��;(
� �2��*����볯�뮾(�����������������ˎ�ʻ�����������;������������������������������������������������������������������������������������������?��/������������������������������������������������������������������������������������������?��/������������������������������������������������������������������������������������������?��/��ί���������#�����뮾��ʺ���8������/� ��2�(����������辫�����������/��.?������.��;�������������>��������㿿�����������*���������ȿ��������������"������� [...]
+3���
�����
+����뻿��+���������Ϊ���<�.�/��������������3�����*��>�������ʺ>��:�𮬳��?
���+����
+���+��+�;�����/"�*>��:�
+��?��
���
���ʎ��Ϭ�:�������響����"������������8+��
�������
+�������������������>�/뿿��������������;�����+�������������>��������?����+�����
+�����������뼺�*�������
��車��������������벯ꫂ������ꮾ���������+��?�����������������������˯��î����.��+>��+;��?��
����
���ʎʾϬ�:���� ���響����"������������8+��
�������
+�������������������>����������������������������������������������������������������������������������������������?�������;��������>������ꂿ�������*����������/��>�������������������/�˯�������?����#���������������������?�������������������������������?�������������������������������������?�
+�;��*3������:�:�8���ʨ�/������� 쾢����(2����/�*<�������* ������
�#�.������( ���+��� �������;����������������ꂿ������>�������������/�?��������������������/�ϯ������������?�
�;��.;������:�:�8���˨�/������� ����(3����/��<�﮻����; �����ꮺ�����/������*����;�������������������������������������������������������������������������������������������/���������������������������������ϻ������������������������������������������������>������.������+�������������Ψ��ˬ��;.������+��.����������������� [...]
+뻮��������//����������:�,����������;����������������������������������������������������������������������������������������������/����������������������������������������������������������������������������������������������>���������������������ᆰ���������˿�������������������������������������������������>�+�ϯ��>��:���+���辎��/�.��ʿ����������*"�*:�� 꾯���;:����Ȫ���+���/�������
+ώ�:ﻺ�8��:�.��������������������������������������������/?�.�����������������������������������:����ﺾ�?����������������������������������������������������������������������������������������������/����������������������������������������������������������������������������������������������/+���������"����,��;����+��������**��/�ⲣꀢ��������/�?��.����닺:�����«��꺪���.����������������������������������������������������������������������������������������������?������������������������������ [...]
+�ﻫ뻪��*��/�⺣ꀢ��������/�?��.����ꋺ:�����£��꺨���.����������������������������������������������������������������������������������������������?����������������������꾻�����������������������������������?��/��������˾����������;�/����������������������.���������������������������������������������������������������?��������������������������������������������?������������������������������������������������?����������������������������������㿯����������������?���?�/���>���������ʿ����:�?������ [...]
+���(껯+?���"����.�#�����˯���
+ʨ:����뾨�������
��˯�;������������?����������������迿�:���������������������뿿�����������������������������?�:��+�������뻺���:*�î�诎�?���/�������»����
+.�⮏�ꮿ�����;�ʪ����»������.� ���8�����������������������������������������������������������������������������������������������?�:��+�������뻺���:*�î�诎�?���/�������»����
+.�⮏�ꮿ�����;�ʪ����«������.� ���8����>����������̺�������2
+����˻������#����?�2���������������������������������������?����������������������������ಪ���볯�����3����?�:�����������������������������������������?����������������������������ྺ���볯�����3����?�:�����������������������������������������?�:��+�������뻺���:*�î�诎�?���/�������»����
+.�⮏�ꮿ�����;�ʪ����»������.� ���8�����������������������������������������������������������������������������������������������?�:��+�������뻺���:*�î�诎�?���/�������»����
+.�⮏�ꮿ�����;�ʪ����»������.� ���8�����������������������������������������������������������������������������������������������?�:��+�������뻺���:*�î�诎�?���/�������»����
+.�⮏�ꮿ�����;�ʪ����»������.� ���8��:��+�������뻺���:*�î�诎�?���/�������»����
+.�⮏�ꮿ�����;�ʪ����»������.� ���8��:��+�������뻺���:*�î�诎�?���/�������»����
+.�⮏�ꮿ�����;�ʪ����»������.� ���8����������������������������������������������������������������?��������������/�����믯?����������������������������������������������������������������������������������������������?���������������������������������������������������������������?��������������/�����믯?������������������������������꿿������������������������������������������������������?���������������������������������������������������������������?��������������/�����믯?�:��+�������뻺���:* [...]
+.�⮏�ꮿ�����;�ʪ����»������.� ���8��:��+�������뻺���:*�î�诎�?���/�������»����
+.�⮏�ꮿ�����;�ʪ����»������.� ���8��:諨�
+���ꎨ�����<�������2�
+���(���+?���"*���.�"�ꮲ����˯��?Ȁ:�����*�>����������*�;�������������������������꾻���껯?����2����>�;����������������������������������������;���������������������������������������������������������������?��������������/�����믯?����������������������������������������������������������������������������������������������?�;��+�������뻺���:��î�诎�?���/����������
+.�⮏�ꮿ�����;�ʪ��ú�ꮂ�������.� ������������������������������������꿿������������������������������������������������������?������������������������������꿿������������������������������������������������������?������������������������������꿿������������������������������������������������������?���������������������������������������������������������������������������������������������?��������������ᄎ������뻮����������������������������������뿿�?���?����;������/��������>��������������ᄎ�� [...]
+��ﳪ�������������������;��������8�*�����.���������������������������������������������������������������������������������������������?����������������������������������������������������������������?�����������������������������/����������������������������������������������������������������?�����������������������������/�:���������ʯ��0#���쿯�?���*�*����������
+
�ﳺ⮋�������������˯��� �����*����˪/����������������������������������������������������������������������������������������������?����������������������������������������������������������������������������������������������?����������������������������������������������������������������������������������������������?����������������������������������������������������������������������������������������������?������������������������������������������������������������������������������ [...]
+
�ﳺ⮯*����������ꊰ��뮿�>� �������ꎼ˪/�������������������?�������������������������������������������?��������������������������/�겻��+���>�������ꎿ�����讣��,¯�??���2����>�;�������������������/ꮺ���Ͽ������3���������������������������������������������������������������������������������������������?�겻��+���>�������ꎿ�����讫���,꿻?����2����>�;����������������������������������3�벻��+����>��������������讫���,��?�늯2����>�;����������������������������������3�벻��+���>��������������讫���,꿻?����2��� [...]
+�/���������������������뿯�����������������������Ͽ��������뾿���������������������������������>�������������������������������������������������������������������������������������������?���������������������Ͽ���������������������������������뿿�����������ί������������������?����������������������������������������������������������������������������������������?*��������������>�"���
+꯸..��/���꺻;>����
+골�ࣸ#�����˯Ȯ��������
�����>"���Ȩ������?����������������������������������������������������������������������������������������������;��������"����(��쯪�����������>��?���.��������>��;��ꪎ�����.�����>���/��讫3�;���8���/��������"����(��쮨�����ﻮ>��3���.��������:��"��ꪊ����.���,���
�.:��/��誣3�;���8���+����������������������������������������������������������������������������������������������?������������,�����������������>����������������>�������ꎿ��������곋�����/�����3�;���:���?��������������� [...]
+���������������������8��"��2���3���Ϋè+�
�������+�˲
� <����"�달�����>�*Ϡ
+�;��"���;��;������:�������.����������>����������������������������8�˾��?����뫿���¸�����������>>��;������:�������.����������>����������������������������8�˾��?����뫿���¸�����������>>�������모�2����������?��쯯��"���;���ίê��#��/����̫����� ,�����㫫��������Ϊ��⫲�(���::�������;����;/����*����0��#�����쬬�
+����+�.�+��������"������+����ﺈ�뮿����������?�����������������������ﯿ����������ϯ�������/�����������"������������������������?������������������������������������������������������������������������� [...]
+�
+�����.,�����������*���>�������������+�������ú�(>�� ��#�:���β�,�( ��?��
+�������늼�����������������+�������������������������������>������/�>�������,�(�ꃿ��
�������늼�����������������+�������������������������������.������/�>�������,�(�ꃿ��
��������+��>����8������;����2.������������/?��ʲ
��������.>���?������������������:�������������������������������������?���������;����������������������������������������������?����������������������������������������������������������������������������������������������?������������������������"������������?���������� [...]
+*�;��
+�ꀺ"��+긲��
+������8�(���������+���(��� �˫����뎻���:������;�����#��������컨���+��������誺����������.������������믊2����ʏ
+��������"��/��?��.��㣪���+�:�믪Ώ����������
+����
�8����㣯�������;�� �(��+�����ʮ�;*���.+���������������������������?������������������������������������������������������������������?����������������������������������������������������������������������������������������������?��������#��,��?����㣪����;���
+��컫��/��
+�>��
�8����ã��������;�� �,����ˮ�;2���.+���������������������������?������������������������������������������������������������������?���������������������������?������������������������������������������������������������������?����������������������������������������������������������������������������������������������?��������"��*��?��.��㣪���+�:���ί����������
+����
�8����ワ
+������;�� �(��+�����ʮ�;(���.#����������������������������������������������������������������������������������������������?���������������������������������������������������������������������������������������������?����������������������������������������������������������������������������������������������?��������"��.��?��.��㣪���+�:���ί����������
+����
.�8����㣯�������;�� �(��+�����ʮ�;*���.+����������������������������������������������������������������������������������������������?��������#��.��?�����㣪����;������������
+����
.�8����㣯�������;�� �,����ˮ�;:���.+����������������������������������������������������������������������������������������������?��������#��*��?�����㣪����;������������
+����
.�8����ヮ
+������;�� �,�⫎����ˮ�;8���.#��������#��*��?�����㣪����;������������
+����
.�8����ヮ
+������;�� �,�⫎����ˮ�;8���.#�������������������������?���ꪾ���������������������������������������������������+���<��������#��*��?�����㣪����;������������
+����
.�8����ワ
+������;�� �,�⫎����ˮ�;8���.#���������������������������?������������������������������������������������������������������?����������������������������������������������������������������������������������������������?��������.���2�?��.��볪���+���Ϯ//��������
+����
����������������� ����*�����.ꎼ+�곣,+�*�.0.�3���ϳ>�������������?���Ϻ�.�Ⱦ�;/���긺�#���/*>>����.��������뾊.��2�⻮?�*�.0.쳬��ϳ>�������������?���Ϻ�.�Ⱦ�;/���긺�+���/*>>���ﮰ�������뾊/��2�⻮?�*�.0.�3���ϳ>�������������?���Ϻ�.�Ⱦ�;/���긺�#���/*>>����.��������뾊.��2�⻮?�*�>8.�3���ϳ>�������������?���Ϻ�.�Ⱥ�;/���긺�����.*>>����.��������뾊*��2�?��?���
+������뫬���?�������;��⾯�������������꾼+���������+�뾪
�����������������������?������������������������������뿿���?��������������?�������������������������������������;���������������������������������������������������������������������������������������������?�����������������������������������?�ί�������������;��>�����������������������;�����������>����������������������������������������������������������������������������������������������?��?���
+��﮿�������?��������?����������������������>+�����������������������������������?����������������������������������������������������������������������������������������������?����������������������������������������������������������������������������������������������+����������������������������������������������������������������������������������������������/����������������/��������*���������������������������������/�����?�������������������˻>�������������������������������������� [...]
+ʢ�ʂ�����;+�����˺*>����(���.������+�"���:��#���������/,�⮊��".�Ί�:�㋫(�����88���������������������������������������������������������������������������������������������?���˫����:��;�����;�ﲻ����#�����.��?�(���
.���:����+��*����.����:�����0����������ꋯ��+�������?����<�;�+���������Ȼ����?���
����>����;���������;���ஸ��������������:���ʫ�:����3�(�;
*˰��������;���;�2���
+��
+�<�*��*�
+�"��ή����+ⲫ���"� ς������� ��"�����������������������������<�����뻿�������/��˫������:��꯫����������������������
+����������������������������������������������������������������������������������������������?����������������������������������������������������������������������������������������������?����������������������������������������������������������������������������������������������+����:�.ʺ���Ψ�����>/?�+�;>����</�����/?�"ø�
+���+�����:���/���.����������"⣿��(�*�*�?����������������������������������������������������������������������������������������������+����������������������������������������������������������������������������������������������+����������������������������������������������������������������������������������������������+���ʫ꺺���;�.(�;++����������˻����?�:��.+����<����
��.�ʪ������;����������ʻ�������8"������������������������������������������������������������������������������������� [...]
+���+���������/���.�����������⣿����.�+�?������*��;�*(*:�*�������»����>�*���+����8����.������/��*���.���������;�������,:����������������������������������������������������������������������������������������������?�
+(�:3��/�*������*��<2� �����?
���.>�:�������<�ˀ����22�
+3�븪��
�+����?�
���;��� �����*�.����������������������������������������������������������������������������������������������?�
+쫻�.�>�.������+�������⯾�*�(�� ���2�ʿ���.À�*
겢�;��>�뫨+��������/��࣮�:������>���?�.�>�/������;������������,��(���?��������.���*�꺮�����������+�㻫������/������:������>�
+쫻�.�>�.������+�������⯾���,�� ���2������.���*
겢�;��>����+��������/��࣮�:������>����������������������������������������������������������Ͽ����������������������������������?����������������������������������������������������������������������������������������������>����������������������������������������������������������Ͽ����������������������������������?�������>��������;���������������(���������/���+�꺮���������������?�����/��������������?������������������������������������������ [...]
+��;�����������������������������?��>����������������������������������������������������/����������������������������������������������������������������������������������������������?������������������������ᄒ�����?��>��������?�����������������������������/����������Ͽ���/����������������������������������������������������������������������������������������������?������������������������ᄒ�����?��>��������?�����������������������������/��������������/������������������������ᄒ�����?��>�� [...]
+3�
+��<��겺�
��뮢����*������:��+�����������0�����*Ϡ����¨�"�������:����������������������������������������������������������������������������������������������?������������������������������������������;�����������������;�?�����������������������������
+���������������/���>�������������������ﺮ����ˮ��������������������������������������/���������������/���>�������������������ﺮ����ˮ��������������������������������������/�����������������?���>�����������������������������������������������������������������������?��������������������������������������������������������������������������������������������/��������������������������������������������������������������������������������������������/.������������/���;������ﯯ�����>����:����ᆱ�� [...]
+볊*���3��+�+����ʫ�/˲⬺�����"#��+./*�耪�������/��������/������������������������??����?��������������������������������������������������������������?����������������������������������������������������������������������������������������������?�������������������������������������������?����������������������������������������������/�������>����������������������������������?���������������������������������������������/�+���������ꪪ��뿿�ﮪ�����*�讫��:���+��������
��+?�+������/���*��,.?���.� [...]
+���˪���:��
���*0��
� (+����ʊ���3?���/��,�8��ꮪ��ʊ��*#�ꮻ����/2����/�שׁ�*����
+�˪���:ꈂ
���*0��
� (+����ʊ���3?���/��,�8
+�ꮪ��ʊ�����������������������������������������������������������������������������������������������?����������������������������������������������������������������������������������������������?����������������������������������������������������������������������������������������������?����������������������������������������������������������������������������������������������?����������������������������������������������������������������������������������������������+���������� [...]
+���∣��<��;��.�/��"�
.���(*����<���� ���,�
��
��������������������������������������������������������������������>��캿�����;���/��;�"
+����ʪ��싸,�?��+�+�0�����������⸪����?������ꨲ����?��:�,�����*���;��>��+��.+���*�*����ί���^�����;ʻ�������������������������������>��?������:שׂ���������������������.�*��芿ȯ���묮�����;��������믮���������������먫��.��:����:שׂ����몮�������꿺�,�"
+����ª��슸,�+��+�+�0����������>�*����>�����ꨲ.�
[binary content of example/EUR_test.bed omitted; not representable as text]
\ No newline at end of file
diff --git a/example/EUR_test.bim b/example/EUR_test.bim
new file mode 100644
index 0000000..8ccfd54
--- /dev/null
+++ b/example/EUR_test.bim
@@ -0,0 +1,2000 @@
+21 rs11702480 0.415634 38347375 G A
+21 rs7280358 0.415683 38349787 A C
+21 rs7282108 0.415721 38352192 A C
+21 rs58296537 0.415776 38358682 G C
+21 rs150853915 0.415796 38361458 T C
+21 rs115711809 0.415803 38362498 G A
+21 rs2409872 0.415806 38362985 G A
+21 rs11910597 0.415955 38384946 G T
+21 rs3761360 0.415964 38386216 G A
+21 rs2266592 0.416021 38394733 C A
+21 rs148101413 0.41606 38400451 A G
+21 rs148136816 0.416239 38416034 G A
+21 rs762372 0.416247 38417361 A G
+21 rs141970591 0.416252 38420970 G A
+21 rs12185821 0.416253 38422733 C T
+21 rs79801167 0.417267 38442191 A G
+21 rs28503645 0.417269 38443674 A G
+21 rs4817843 0.417321 38447040 A C
+21 rs60447195 0.417321 38448600 T A
+21 rs8131246 0.417321 38451926 G A
+21 rs13050226 0.417321 38457708 C T
+21 rs8130846 0.417327 38480242 A G
+21 rs117771271 0.417327 38482304 A G
+21 rs2154535 0.417327 38488908 G A
+21 rs74420981 0.41735 38495693 G A
+21 rs141610402 0.417356 38500731 T C
+21 rs9985082 0.417356 38501021 A G
+21 rs75144049 0.417357 38501185 C T
+21 rs2154537 0.417357 38502012 G C
+21 rs190527289 0.417378 38516221 G T
+21 rs1053808 0.417378 38525356 C T
+21 rs147582059 0.41738 38528855 A C
+21 rs2156077 0.417384 38539357 C T
+21 rs111570643 0.417384 38541778 G A
+21 rs2835645 0.417385 38546382 T C
+21 rs2156078 0.417385 38546473 T C
+21 rs8126491 0.417385 38551219 T C
+21 rs144627088 0.417385 38554329 G A
+21 rs7276711 0.417389 38578189 G A
+21 rs9976569 0.417409 38580810 C G
+21 rs11911729 0.417418 38592416 C A
+21 rs62226497 0.417421 38594601 G A
+21 rs1128922 0.417427 38596677 A G
+21 rs74791694 0.417468 38603836 G C
+21 rs73220516 0.417494 38608373 G A
+21 rs138006613 0.417536 38615603 C T
+21 rs2051399 0.41758 38627522 C T
+21 rs1080847 0.417588 38632880 G C
+21 rs9974208 0.417602 38644934 T C
+21 rs11701507 0.417768 38667835 C T
+21 rs9984095 0.418018 38678407 T C
+21 rs2840356 0.418066 38685113 G A
+21 rs9980746 0.418074 38686115 G A
+21 rs7278090 0.418148 38696481 G T
+21 rs73203904 0.418156 38697623 C T
+21 rs11911733 0.418182 38701247 T A
+21 rs2065304 0.418192 38702722 A G
+21 rs73903586 0.418754 38710971 C T
+21 rs8128060 0.419644 38727637 C G
+21 rs11701585 0.41965 38729691 C T
+21 rs75575279 0.419663 38732621 T A
+21 rs73214091 0.419671 38741739 C G
+21 rs76460412 0.419673 38749343 A G
+21 rs73216425 0.419683 38768439 G A
+21 rs192246294 0.419705 38770446 A T
+21 rs117029806 0.419707 38776680 A G
+21 rs182536099 0.419718 38788076 T C
+21 rs56232482 0.419731 38800069 G C
+21 rs2835735 0.419747 38802816 G A
+21 rs2835736 0.419747 38802873 C G
+21 rs28630928 0.419748 38803827 T A
+21 rs73218435 0.419758 38815456 T C
+21 rs139446358 0.419838 38854224 A G
+21 rs78436698 0.419844 38860312 G A
+21 rs138004108 0.419858 38873581 C G
+21 rs2835774 0.41986 38875232 A T
+21 rs2236687 0.419885 38889729 C T
+21 rs17229459 0.419895 38891587 C T
+21 rs11701836 0.419916 38892250 A G
+21 rs2835788 0.420008 38906071 C G
+21 rs2835791 0.420037 38910567 T G
+21 rs7282810 0.420054 38917463 T C
+21 rs73220452 0.420104 38922719 A T
+21 rs7281380 0.420277 38930922 C T
+21 rs2248669 0.420373 38940103 C G
+21 rs4817871 0.420379 38943410 G C
+21 rs58743101 0.420431 38957645 C T
+21 rs62224260 0.42046 38959689 A G
+21 rs113285258 0.420491 38961827 G A
+21 rs79486053 0.42054 38965342 G A
+21 rs9981372 0.420552 38966201 C T
+21 rs2835826 0.42087 38971126 G C
+21 rs7279695 0.420872 38971410 G A
+21 rs73906081 0.42094 38976159 A G
+21 rs9979022 0.421047 38982827 G A
+21 rs8129759 0.421052 38983111 C T
+21 rs73220491 0.421126 38988323 T G
+21 rs702859 0.42116 38997701 A G
+21 rs857975 0.421192 39001613 G T
+21 rs1787330 0.421195 39002787 A G
+21 rs2835864 0.421306 39023431 T A
+21 rs2248732 0.421311 39025049 A C
+21 rs56682341 0.421311 39025611 T C
+21 rs731329 0.421321 39029086 C T
+21 rs2835876 0.421322 39029255 T C
+21 rs1709819 0.421425 39036115 C A
+21 rs1709817 0.421425 39036349 A T
+21 rs8128422 0.421456 39040796 A G
+21 rs1709826 0.421475 39049473 A G
+21 rs724491 0.421489 39051786 G A
+21 rs55680031 0.421507 39056538 G A
+21 rs150761798 0.42151 39058801 T C
+21 rs860797 0.421523 39066804 C T
+21 rs858003 0.421533 39072984 G A
+21 rs112944150 0.421598 39084803 C A
+21 rs59350504 0.422229 39105534 C T
+21 rs67336869 0.422229 39105626 G A
+21 rs702865 0.422298 39118742 G A
+21 rs2835939 0.422365 39128848 T C
+21 rs188059434 0.422375 39129456 A T
+21 rs80289738 0.422403 39132130 G A
+21 rs2835951 0.422447 39145172 A G
+21 rs762145 0.422493 39146318 C T
+21 rs9680139 0.42257 39147099 G A
+21 rs861416 0.42311 39155095 C T
+21 rs2835977 0.423495 39182197 G A
+21 rs2835989 0.423617 39192552 A G
+21 rs3787839 0.423694 39202292 G A
+21 rs3787853 0.423904 39216661 T C
+21 rs3787854 0.423916 39220245 C T
+21 rs764165 0.424712 39231190 T C
+21 rs143926103 0.42472 39233266 C T
+21 rs2836028 0.424783 39240050 C T
+21 rs117154416 0.424785 39240260 C A
+21 rs8126718 0.424806 39242964 G A
+21 rs2409943 0.424815 39244171 T C
+21 rs2836029 0.426903 39254227 C A
+21 rs2836030 0.427667 39255727 A G
+21 rs56236575 0.427778 39268944 C T
+21 rs928766 0.427781 39269275 A G
+21 rs1537104 0.427797 39270929 A G
+21 rs2249899 0.427829 39276106 G A
+21 rs9976923 0.428474 39313380 T C
+21 rs60418075 0.428482 39314486 C T
+21 rs11088414 0.428514 39319456 G A
+21 rs4817905 0.430106 39329715 A G
+21 rs11910719 0.430174 39332588 C T
+21 rs1539902 0.430292 39337209 G A
+21 rs13052997 0.430295 39337307 C T
+21 rs8130416 0.430566 39344228 T G
+21 rs2836074 0.430632 39346333 C T
+21 rs2836076 0.430697 39350002 A G
+21 rs59729477 0.430852 39353242 C G
+21 rs2836079 0.431004 39355035 T C
+21 rs62222024 0.4315 39357533 T C
+21 rs2836085 0.431709 39359907 C T
+21 rs7284062 0.43182 39360803 C T
+21 rs2836096 0.43182 39363949 T C
+21 rs17815267 0.43182 39365472 G A
+21 rs1028999 0.431831 39368038 C A
+21 rs1029002 0.431835 39368177 T C
+21 rs17815279 0.431836 39368387 A G
+21 rs74870234 0.431836 39373325 C T
+21 rs974975 0.431837 39378169 G T
+21 rs139277783 0.432017 39388349 G T
+21 rs11702812 0.432022 39394979 C T
+21 rs2032091 0.432073 39417372 T C
+21 rs117443437 0.432095 39428600 C G
+21 rs74509656 0.432106 39438181 T A
+21 rs78246489 0.432109 39439742 T C
+21 rs113297965 0.432408 39452801 T C
+21 rs62223479 0.432442 39454239 C A
+21 rs61640919 0.432506 39485041 T C
+21 rs76954756 0.432533 39491202 T G
+21 rs2211855 0.432537 39494820 G A
+21 rs60876887 0.432539 39496307 C T
+21 rs62223504 0.43254 39497188 G T
+21 rs2211858 0.432544 39501464 C G
+21 rs62223513 0.43258 39527833 C T
+21 rs2836178 0.432599 39542974 G A
+21 rs7277351 0.432619 39556816 A C
+21 rs77058278 0.43262 39557079 A C
+21 rs13052748 0.43262 39557088 A G
+21 rs79904375 0.432624 39562929 T A
+21 rs114519939 0.432637 39565906 T A
+21 rs11700531 0.432638 39574717 G A
+21 rs71316628 0.432638 39576125 T A
+21 rs62223523 0.432666 39595228 T C
+21 rs7279619 0.432669 39598059 G A
+21 rs982807 0.432687 39602696 A C
+21 rs9984963 0.432793 39630944 T C
+21 rs2836251 0.432876 39633192 T G
+21 rs2836255 0.432923 39634196 A G
+21 rs2836258 0.432923 39635924 A C
+21 rs12329791 0.432949 39646635 C T
+21 rs113766510 0.432952 39646932 C T
+21 rs3804020 0.432953 39647125 A G
+21 rs7279943 0.432957 39647601 G A
+21 rs2836277 0.432962 39652400 G A
+21 rs2836278 0.432962 39652840 G A
+21 rs77033880 0.432963 39653641 C A
+21 rs13046434 0.432963 39654038 A G
+21 rs2836287 0.432976 39659057 C A
+21 rs9984636 0.43304 39687171 T G
+21 rs2211866 0.433055 39688107 G A
+21 rs4817938 0.433087 39690119 T C
+21 rs2836306 0.433205 39697436 C A
+21 rs2836307 0.433219 39698055 C T
+21 rs73211968 0.43326 39700260 A G
+21 rs56121533 0.433261 39700292 C T
+21 rs76808559 0.433325 39705041 C T
+21 rs2836309 0.433365 39707909 G A
+21 rs9976478 0.433416 39710855 T A
+21 rs17284924 0.433421 39714817 C A
+21 rs2836319 0.433422 39716601 G T
+21 rs75725139 0.433423 39717457 T A
+21 rs73423993 0.433495 39723120 G A
+21 rs6517462 0.433937 39730913 T C
+21 rs12152039 0.434724 39736972 G T
+21 rs2836345 0.434732 39738813 C A
+21 rs730854 0.434814 39749486 A C
+21 rs112785916 0.434817 39749692 T A
+21 rs117647402 0.434906 39764401 C G
+21 rs2836364 0.434919 39767874 C T
+21 rs3787889 0.434979 39770856 A T
+21 rs2156074 0.435145 39780646 G A
+21 rs2186345 0.435149 39781688 C T
+21 rs112392766 0.435345 39790518 G A
+21 rs1571703 0.435472 39794835 T G
+21 rs115717119 0.43552 39802736 A G
+21 rs56180262 0.435556 39805503 G A
+21 rs9305648 0.436465 39822906 C G
+21 rs2836421 0.437986 39835286 C T
+21 rs34825969 0.438048 39846142 G A
+21 rs11701454 0.438052 39847563 G A
+21 rs2836431 0.438059 39849917 C T
+21 rs76979970 0.438069 39853485 A G
+21 rs77106233 0.438079 39856897 T G
+21 rs2212933 0.43808 39856991 C A
+21 rs2836437 0.438129 39864584 A G
+21 rs73215941 0.438333 39879758 T A
+21 rs2155718 0.438419 39891105 G A
+21 rs10854385 0.438688 39909266 G A
+21 rs2836491 0.43882 39913016 A G
+21 rs9980821 0.438892 39921759 C G
+21 rs12482226 0.438897 39924114 A T
+21 rs12483227 0.438901 39925723 A G
+21 rs111569767 0.438907 39927840 T C
+21 rs4817950 0.438915 39930483 C T
+21 rs117013656 0.43903 39940290 C T
+21 rs8130680 0.4393 39948136 A G
+21 rs3827208 0.439341 39950587 G C
+21 rs79595200 0.439357 39951582 C T
+21 rs2836541 0.4395 39954413 T C
+21 rs71316649 0.440191 39958087 C G
+21 rs77282337 0.440822 39963158 G C
+21 rs13051259 0.440839 39964867 C T
+21 rs2836556 0.440887 39973770 A G
+21 rs57461137 0.440889 39975299 T C
+21 rs112776799 0.440889 39976107 G A
+21 rs2836568 0.440912 39977435 A G
+21 rs73217979 0.44092 39978443 G T
+21 rs73432179 0.440921 39980219 A G
+21 rs60914381 0.440921 39980456 C G
+21 rs73217988 0.440928 39986737 C T
+21 rs16996585 0.440936 39990729 T C
+21 rs460439 0.441044 39999894 T C
+21 rs464484 0.441075 40003117 T C
+21 rs460214 0.441093 40006081 T C
+21 rs11702424 0.441112 40008675 G A
+21 rs464519 0.441315 40024182 A G
+21 rs2836597 0.441318 40024424 G A
+21 rs1041780 0.441393 40044145 C T
+21 rs2836621 0.441435 40054149 C T
+21 rs139129346 0.441477 40061429 A G
+21 rs73222057 0.441479 40062563 G A
+21 rs6517477 0.441607 40072376 T C
+21 rs4817959 0.442089 40080202 C T
+21 rs80335778 0.44238 40086457 T C
+21 rs2836642 0.442423 40092424 G A
+21 rs11911341 0.442427 40093506 G A
+21 rs73205549 0.444697 40124746 C T
+21 rs118061384 0.4447 40125274 A G
+21 rs1209912 0.444929 40149642 A G
+21 rs56392700 0.445085 40159943 A G
+21 rs1209935 0.445094 40163327 C T
+21 rs59492909 0.445095 40163715 A T
+21 rs714781 0.445171 40178381 C T
+21 rs76490364 0.445207 40187028 T C
+21 rs2836694 0.445385 40203826 G C
+21 rs2836703 0.445415 40210162 T G
+21 rs1534824 0.445418 40210483 G A
+21 rs1534825 0.445419 40210516 G A
+21 rs468529 0.445522 40221741 T C
+21 rs17816027 0.445523 40222549 C T
+21 rs75765540 0.445585 40227927 G C
+21 rs425931 0.445632 40232326 C T
+21 rs7279660 0.445734 40236810 A G
+21 rs468637 0.445741 40239269 G A
+21 rs75571264 0.445789 40247834 G C
+21 rs414845 0.445807 40254465 T C
+21 rs62217843 0.44582 40260612 T A
+21 rs9982210 0.447267 40288511 T C
+21 rs7279192 0.447412 40293084 T G
+21 rs7279414 0.447418 40293421 A G
+21 rs11088457 0.447471 40296275 G A
+21 rs1888477 0.447492 40297536 T C
+21 rs10854391 0.447542 40301253 C T
+21 rs2142106 0.447556 40302129 A C
+21 rs2836786 0.448701 40332176 A G
+21 rs190174633 0.448709 40335993 C T
+21 rs79567064 0.44872 40342350 G T
+21 rs191872249 0.448732 40344466 C T
+21 rs11088461 0.448876 40354480 G A
+21 rs79902787 0.449218 40364553 A G
+21 rs17285812 0.449783 40382167 C T
+21 rs11701235 0.450137 40384135 C T
+21 rs55754456 0.450236 40385208 C T
+21 rs7275748 0.450279 40385736 G A
+21 rs8134572 0.451299 40396227 T G
+21 rs35474192 0.451378 40403534 G A
+21 rs1882776 0.451412 40405512 G A
+21 rs2242932 0.451435 40408664 T C
+21 rs1882774 0.451456 40413733 G A
+21 rs148351416 0.451497 40422362 C G
+21 rs4279007 0.451548 40424954 A G
+21 rs445593 0.451647 40448382 G A
+21 rs74641112 0.451664 40449829 G A
+21 rs4817986 0.452718 40465512 G T
+21 rs67603741 0.45298 40473620 A G
+21 rs2037922 0.45473 40486507 G A
+21 rs1554930 0.454747 40488359 A T
+21 rs2836891 0.454753 40489027 T C
+21 rs376521 0.454811 40491526 G A
+21 rs4314079 0.454872 40494979 G A
+21 rs1013129 0.454879 40498109 G A
+21 rs909181 0.45493 40499111 C G
+21 rs11700449 0.455436 40522419 A T
+21 rs56151678 0.455446 40526973 T G
+21 rs8132295 0.455446 40527774 C T
+21 rs148624590 0.455446 40529500 C A
+21 rs8134843 0.455454 40542522 G C
+21 rs2836930 0.455454 40545656 A G
+21 rs8129850 0.455456 40549805 A G
+21 rs73221192 0.455462 40554561 G A
+21 rs76075760 0.455463 40555609 T C
+21 rs112150498 0.455463 40555711 C G
+21 rs13050584 0.455466 40574090 T C
+21 rs148938553 0.455466 40578832 T C
+21 rs6517529 0.455467 40584598 T C
+21 rs142893816 0.455469 40586908 C T
+21 rs2898370 0.45547 40588948 T C
+21 rs2836958 0.455495 40623526 T C
+21 rs2836962 0.455512 40628982 T C
+21 rs75281336 0.455512 40641262 T C
+21 rs140126455 0.455512 40662217 A T
+21 rs9982111 0.455512 40665296 A G
+21 rs2056844 0.455512 40670460 G C
+21 rs8131150 0.455512 40677315 G C
+21 rs10154217 0.455512 40678195 T C
+21 rs9975562 0.455512 40678968 T C
+21 rs8128620 0.455515 40680170 G C
+21 rs8127986 0.455567 40687211 T A
+21 rs150097632 0.455567 40688518 C T
+21 rs77027816 0.455574 40701657 C G
+21 rs8130854 0.455577 40707041 A G
+21 rs185596052 0.455582 40709290 A C
+21 rs13045993 0.455672 40727966 A T
+21 rs12106311 0.45569 40738620 G A
+21 rs71316134 0.45569 40738776 C T
+21 rs190523219 0.455764 40750555 T C
+21 rs185350809 0.455764 40750563 T C
+21 rs117266211 0.455822 40763494 G A
+21 rs2837007 0.455841 40772484 T C
+21 rs2837027 0.455903 40799797 G A
+21 rs8129943 0.455903 40807057 A G
+21 rs76141934 0.455908 40812635 T C
+21 rs9983404 0.45592 40817127 C G
+21 rs2837035 0.455921 40818379 A G
+21 rs78655815 0.455932 40828483 G C
+21 rs7277633 0.455981 40836441 A G
+21 rs141312238 0.456115 40842297 G A
+21 rs1735136 0.456139 40844843 T A
+21 rs9975004 0.456142 40845626 T C
+21 rs7277107 0.45625 40862760 G A
+21 rs2837052 0.456257 40869963 G A
+21 rs112960656 0.456269 40875714 C T
+21 rs9636960 0.45627 40877030 C T
+21 rs2246616 0.456287 40886816 T C
+21 rs16997766 0.456349 40891157 C T
+21 rs2837061 0.456353 40891862 C T
+21 rs76031704 0.456354 40892575 C T
+21 rs77236117 0.456355 40893441 C A
+21 rs2837064 0.456356 40894230 T C
+21 rs2837067 0.456357 40894840 T G
+21 rs17408203 0.456357 40894896 A G
+21 rs118111352 0.456706 40900002 A G
+21 rs645495 0.456721 40900795 C T
+21 rs117114519 0.456728 40901172 T C
+21 rs570510 0.456733 40901431 G A
+21 rs74813064 0.456735 40901563 T C
+21 rs7279098 0.456746 40903686 C T
+21 rs574863 0.456876 40911324 C T
+21 rs74552879 0.456896 40913992 T C
+21 rs608265 0.456914 40923307 T C
+21 rs11700560 0.456919 40927194 A G
+21 rs2837078 0.456976 40938031 G A
+21 rs561166 0.457004 40941132 T G
+21 rs628045 0.457025 40942375 C T
+21 rs4818058 0.457026 40942475 C T
+21 rs617260 0.457026 40942490 T C
+21 rs599377 0.45707 40944211 G C
+21 rs77949542 0.457088 40945212 C T
+21 rs572926 0.457088 40945268 C T
+21 rs193123070 0.457112 40947330 A G
+21 rs144058957 0.457176 40957762 C T
+21 rs73358157 0.457179 40958546 A G
+21 rs578349 0.457193 40961915 T G
+21 rs116943326 0.457201 40964005 C T
+21 rs760163 0.457233 40970832 C T
+21 rs583953 0.457246 40971786 C A
+21 rs112325952 0.457255 40972378 C T
+21 rs4816636 0.459094 41026510 G T
+21 rs734413 0.459746 41032804 A G
+21 rs909182 0.459761 41033816 G A
+21 rs79728168 0.460091 41042478 C T
+21 rs9975884 0.460126 41043190 G T
+21 rs79171747 0.460145 41044980 T C
+21 rs980184 0.460227 41056338 G A
+21 rs16998145 0.460672 41074844 C T
+21 rs8128755 0.46069 41077248 A T
+21 rs56112847 0.460762 41087850 G A
+21 rs12626282 0.460797 41093854 T G
+21 rs76733558 0.460926 41095898 G A
+21 rs78695875 0.462431 41098725 A G
+21 rs4818075 0.462707 41107044 T C
+21 rs144501849 0.464298 41125951 T C
+21 rs10470189 0.464301 41126946 A G
+21 rs71316138 0.464302 41127439 A G
+21 rs4818083 0.464303 41127767 G A
+21 rs2837173 0.464312 41131437 C T
+21 rs2837183 0.464319 41135338 T C
+21 rs73217279 0.464405 41142775 A G
+21 rs2837213 0.464528 41148563 A G
+21 rs2205205 0.46453 41149964 G A
+21 rs73217299 0.464576 41154754 C G
+21 rs1118103 0.464579 41155495 G C
+21 rs73219403 0.464604 41159823 C A
+21 rs117111901 0.464623 41161542 G A
+21 rs463903 0.464632 41165677 A G
+21 rs462163 0.464656 41175808 C T
+21 rs462782 0.464723 41193094 T G
+21 rs73221209 0.464728 41196507 A G
+21 rs461019 0.464728 41197984 G A
+21 rs56374307 0.464728 41199722 G A
+21 rs55684706 0.464728 41200994 T G
+21 rs16998635 0.464728 41203015 T C
+21 rs7283816 0.464728 41204081 C G
+21 rs55728093 0.464728 41205217 G A
+21 rs67897619 0.464729 41206282 C T
+21 rs66820219 0.464729 41206344 A C
+21 rs66920289 0.464729 41206542 G A
+21 rs2837239 0.464729 41207683 G A
+21 rs55884303 0.464729 41207783 A C
+21 rs55928419 0.464729 41207829 A G
+21 rs111512893 0.464729 41208691 C T
+21 rs112903189 0.464729 41208697 A T
+21 rs1004663 0.464729 41209790 G C
+21 rs2837244 0.464729 41210034 G C
+21 rs2837246 0.464773 41215018 T C
+21 rs419536 0.464846 41225158 C T
+21 rs11910004 0.465335 41241547 C T
+21 rs71316701 0.465335 41241571 C G
+21 rs62235432 0.465354 41248127 G A
+21 rs2276520 0.465424 41258505 G A
+21 rs2299759 0.466871 41269091 G C
+21 rs2075719 0.466878 41270521 C T
+21 rs2837284 0.46689 41275401 G A
+21 rs62236564 0.466894 41275463 C T
+21 rs2837285 0.466919 41275890 G A
+21 rs2299769 0.468719 41280181 G A
+21 rs4143343 0.469531 41285481 T C
+21 rs2299789 0.469615 41291358 T G
+21 rs9977425 0.469629 41292364 T A
+21 rs9984322 0.46964 41294162 C T
+21 rs9977486 0.47081 41321735 C T
+21 rs74932549 0.470936 41334070 T G
+21 rs1123923 0.470942 41335367 T G
+21 rs736288 0.470977 41340005 T C
+21 rs1734920 0.471513 41344911 G T
+21 rs62236620 0.472124 41348304 T C
+21 rs992039 0.472142 41348521 A G
+21 rs67451115 0.472196 41353045 A G
+21 rs56205306 0.47221 41353879 C T
+21 rs760130 0.472247 41356039 C T
+21 rs111445113 0.472276 41358845 G A
+21 rs4818105 0.472348 41370397 G A
+21 rs1033340 0.47235 41370589 G T
+21 rs9975535 0.472433 41374583 T C
+21 rs78659544 0.472466 41376800 A G
+21 rs10775663 0.472506 41379496 A G
+21 rs2837369 0.472548 41382468 T C
+21 rs3901224 0.473492 41389090 A G
+21 rs28605097 0.474264 41405382 C T
+21 rs9981861 0.475553 41415044 T C
+21 rs28582765 0.475742 41418714 G T
+21 rs2837405 0.475834 41429710 T C
+21 rs13051351 0.475843 41430001 C T
+21 rs2065315 0.475929 41437081 G A
+21 rs8130946 0.476588 41453043 T C
+21 rs11701562 0.477035 41467123 G T
+21 rs78015389 0.477056 41472520 G A
+21 rs13049725 0.477093 41476880 A G
+21 rs79072855 0.477101 41484983 A T
+21 rs1235600 0.477101 41485700 T C
+21 rs2837438 0.477153 41498063 G A
+21 rs74901161 0.477249 41511118 A G
+21 rs75018081 0.47726 41511553 A T
+21 rs75988415 0.47732 41513870 C T
+21 rs145680049 0.477336 41515589 G A
+21 rs9978649 0.477357 41518160 G T
+21 rs4402848 0.477359 41519133 G A
+21 rs73225226 0.477362 41520075 T C
+21 rs10483063 0.477362 41520150 T C
+21 rs73225239 0.47738 41523183 A G
+21 rs8132673 0.477424 41525276 G T
+21 rs7275239 0.478723 41531337 T C
+21 rs67142512 0.480496 41546416 C G
+21 rs2837480 0.480501 41546955 T C
+21 rs117375757 0.480712 41555147 G C
+21 rs1882791 0.48072 41555446 G A
+21 rs73364118 0.480763 41557287 G A
+21 rs66564794 0.480768 41558058 G A
+21 rs57055280 0.480812 41565282 C A
+21 rs1882796 0.480815 41565675 G C
+21 rs113663397 0.480843 41569270 C A
+21 rs60693305 0.480851 41571250 A T
+21 rs117944090 0.480854 41571605 A G
+21 rs7278621 0.480858 41572207 A C
+21 rs2178849 0.480881 41575158 A G
+21 rs2178850 0.480882 41575227 T C
+21 rs11702147 0.480899 41577346 A G
+21 rs2837499 0.480902 41577819 C T
+21 rs8131481 0.481075 41581722 C T
+21 rs73230354 0.481118 41588385 A C
+21 rs116979427 0.481124 41589547 T C
+21 rs9982501 0.481155 41594632 G A
+21 rs2837513 0.481173 41596554 T C
+21 rs2837516 0.481179 41597421 C T
+21 rs449728 0.481194 41598581 C G
+21 rs7281487 0.481351 41613577 G A
+21 rs117210180 0.481612 41648897 G A
+21 rs117988684 0.481624 41651099 T A
+21 rs7282745 0.482415 41671653 T G
+21 rs1997542 0.482464 41676359 C A
+21 rs2837569 0.482468 41680306 G T
+21 rs10211903 0.482585 41684994 C T
+21 rs143659674 0.483666 41712347 T C
+21 rs117043406 0.483698 41727221 C T
+21 rs143243590 0.483771 41740860 T C
+21 rs2017995 0.483819 41755231 C A
+21 rs78140159 0.484123 41768402 T C
+21 rs74911059 0.484174 41774550 C A
+21 rs80268298 0.484183 41776452 C T
+21 rs66519541 0.484191 41779019 T C
+21 rs73217015 0.485308 41804377 C T
+21 rs8133661 0.485419 41805199 C T
+21 rs10439672 0.48545 41813285 T C
+21 rs4818130 0.485459 41814287 T A
+21 rs73902634 0.486136 41834322 T C
+21 rs8127921 0.486149 41839209 G A
+21 rs9983133 0.486155 41841134 T C
+21 rs2837647 0.486168 41848101 C T
+21 rs62225479 0.486169 41848643 T C
+21 rs9977414 0.48623 41855584 T C
+21 rs8131537 0.486261 41859770 C A
+21 rs2837660 0.486271 41861838 G C
+21 rs8128255 0.486292 41869508 G A
+21 rs2094876 0.486329 41882205 A G
+21 rs9980603 0.48635 41886583 T C
+21 rs144729058 0.486361 41889069 A G
+21 rs61557310 0.486391 41895763 T C
+21 rs2205133 0.486484 41899139 T A
+21 rs62225549 0.486505 41903870 G A
+21 rs76494176 0.486513 41904952 G T
+21 rs8133246 0.486534 41907203 T C
+21 rs2837694 0.486635 41910637 A G
+21 rs16999959 0.486642 41910944 T C
+21 rs55714611 0.48748 41917545 C G
+21 rs2837700 0.488132 41922017 C T
+21 rs35202506 0.488139 41925725 A G
+21 rs2837704 0.488141 41927021 C T
+21 rs4818143 0.488142 41927859 C T
+21 rs458934 0.488334 41949401 C T
+21 rs462992 0.488334 41949690 G A
+21 rs13048982 0.488334 41950029 G A
+21 rs117868113 0.488352 41955788 C T
+21 rs9978289 0.488352 41955789 A G
+21 rs465127 0.488361 41959552 A T
+21 rs455772 0.488368 41962135 G A
+21 rs460181 0.488407 41963228 G T
+21 rs2103480 0.488431 41970874 C T
+21 rs56291275 0.488441 41974464 T C
+21 rs10460679 0.488454 41977675 C T
+21 rs146539748 0.488461 41980044 G A
+21 rs6517602 0.488478 41985748 T G
+21 rs2837753 0.488554 41990749 A G
+21 rs73216202 0.488595 41991343 T A
+21 rs3827218 0.488805 41993758 A G
+21 rs35178623 0.489092 42003866 G C
+21 rs35786508 0.489117 42004353 T A
+21 rs1015993 0.489165 42005288 C T
+21 rs34651383 0.489192 42009791 G A
+21 rs8127399 0.489216 42015678 C G
+21 rs9984403 0.489224 42017949 A G
+21 rs987897 0.489236 42021805 A T
+21 rs35057352 0.489426 42025594 C T
+21 rs1537112 0.489767 42029143 C T
+21 rs8134076 0.489936 42044877 G T
+21 rs117799787 0.489939 42045580 G A
+21 rs7280280 0.490002 42055029 A T
+21 rs62223017 0.49174 42065826 C G
+21 rs11909903 0.491781 42073641 C T
+21 rs926090 0.491785 42075391 G C
+21 rs11910405 0.491791 42077795 A C
+21 rs71318535 0.491792 42077909 A G
+21 rs2026274 0.491811 42079127 C T
+21 rs71316171 0.491953 42084246 G A
+21 rs8129663 0.491957 42084630 C T
+21 rs66461687 0.49196 42084895 C A
+21 rs35035578 0.491994 42089417 C A
+21 rs13048841 0.492001 42090776 G A
+21 rs79566442 0.49203 42096412 G C
+21 rs13051235 0.4922 42103818 A T
+21 rs58114519 0.492247 42104122 A T
+21 rs8132900 0.492396 42106263 C T
+21 rs188073352 0.49251 42112106 A C
+21 rs2837804 0.492514 42115318 C T
+21 rs117532654 0.492518 42119105 C A
+21 rs2837806 0.492523 42123676 C T
+21 rs12482503 0.492609 42133903 G A
+21 rs56060434 0.49261 42133960 C T
+21 rs79676743 0.492613 42135350 A G
+21 rs148898802 0.492614 42135891 T C
+21 rs7279013 0.492618 42137851 G A
+21 rs1999329 0.492691 42140401 C T
+21 rs9808699 0.4927 42140619 A G
+21 rs77603305 0.492703 42140696 T C
+21 rs7277165 0.492866 42154583 G A
+21 rs7277205 0.492866 42154651 G A
+21 rs6517615 0.492888 42157509 T C
+21 rs7277017 0.492903 42161703 G A
+21 rs13052165 0.492929 42169368 A G
+21 rs113430952 0.495271 42194554 G A
+21 rs2837843 0.496379 42203219 T C
+21 rs34812495 0.496713 42219820 C G
+21 rs8133945 0.498224 42226166 A T
+21 rs6517627 0.498367 42230435 C T
+21 rs77947474 0.498381 42233558 T C
+21 rs741772 0.498389 42234205 T C
+21 rs2222986 0.498408 42237629 A G
+21 rs79004969 0.498418 42240732 C G
+21 rs142315268 0.498425 42242860 G A
+21 rs2094878 0.498437 42246551 C T
+21 rs7279498 0.498705 42253571 G A
+21 rs6517635 0.498874 42257498 A G
+21 rs2837867 0.499081 42265286 C A
+21 rs61184507 0.500502 42279883 T C
+21 rs11700509 0.500624 42282191 T C
+21 rs2837873 0.500838 42284292 G C
+21 rs2150431 0.50086 42291678 C T
+21 rs7280789 0.500867 42292965 G T
+21 rs10222077 0.500892 42294868 C T
+21 rs7275916 0.500921 42300859 G A
+21 rs2837885 0.500993 42311908 G T
+21 rs74751940 0.50108 42314565 C T
+21 rs59264000 0.50185 42331690 A G
+21 rs117183559 0.502076 42337481 A G
+21 rs35445942 0.502147 42351105 A C
+21 rs9975638 0.50216 42361699 G C
+21 rs8133497 0.502195 42369070 C T
+21 rs117693750 0.502219 42373262 T A
+21 rs62636184 0.503717 42384998 C T
+21 rs2837902 0.503735 42388524 G A
+21 rs12626949 0.503763 42393205 T C
+21 rs2183582 0.503817 42404017 T C
+21 rs1921982 0.504636 42439105 T C
+21 rs2776340 0.504671 42442550 A G
+21 rs61420067 0.504691 42445184 T A
+21 rs4818202 0.504691 42445300 A C
+21 rs73360506 0.504726 42449519 C T
+21 rs9982249 0.504752 42451715 G A
+21 rs8134658 0.504766 42453728 G A
+21 rs2837928 0.504784 42463764 T C
+21 rs80339162 0.504785 42468546 C T
+21 rs2837948 0.504785 42470603 A G
+21 rs9808669 0.504798 42475390 C T
+21 rs73902917 0.504812 42478282 C T
+21 rs73902918 0.504814 42478750 G A
+21 rs8126990 0.504837 42483632 T G
+21 rs58292227 0.504839 42484001 C A
+21 rs8129429 0.504954 42489730 A G
+21 rs2776341 0.504977 42490274 A C
+21 rs116866410 0.506057 42514815 A G
+21 rs1001453 0.506067 42515277 G A
+21 rs62217915 0.506122 42519037 A C
+21 rs112077699 0.506146 42520773 G T
+21 rs13047965 0.506195 42524342 G A
+21 rs9305726 0.506208 42525272 G A
+21 rs9305727 0.50621 42525411 A G
+21 rs113945363 0.506211 42525452 A G
+21 rs914176 0.506249 42528192 T G
+21 rs77270082 0.506297 42531725 C T
+21 rs146646159 0.506386 42539814 G C
+21 rs4818217 0.507326 42546059 G C
+21 rs55854621 0.509783 42563352 G A
+21 rs9980345 0.509813 42564184 C T
+21 rs71318561 0.510274 42572435 G A
+21 rs138837252 0.510427 42576164 C T
+21 rs1888518 0.510529 42579521 T C
+21 rs7510366 0.51061 42581927 C T
+21 rs35939063 0.510639 42584621 A G
+21 rs113982325 0.510772 42589425 A G
+21 rs34616176 0.510974 42595176 T C
+21 rs28629220 0.51101 42597387 T C
+21 rs9978837 0.511251 42605846 G C
+21 rs79062280 0.511301 42607306 G A
+21 rs17000715 0.511376 42610008 C T
+21 rs74977559 0.511467 42613255 C G
+21 rs67237909 0.511645 42616201 G A
+21 rs67525224 0.51183 42627220 G C
+21 rs4818226 0.511881 42633065 G A
+21 rs3787945 0.511907 42640022 G T
+21 rs58534150 0.511934 42640903 C T
+21 rs12483323 0.511977 42643845 C T
+21 rs144512707 0.512027 42651868 G A
+21 rs1072869 0.512784 42657548 A C
+21 rs73905316 0.513116 42668059 A T
+21 rs147323220 0.513137 42671437 C T
+21 rs11700807 0.513162 42675504 T G
+21 rs9976217 0.513183 42679042 A G
+21 rs112075738 0.513194 42680958 G T
+21 rs67755284 0.513274 42690363 G A
+21 rs2838009 0.513412 42694284 G T
+21 rs2838013 0.513437 42695161 A G
+21 rs2838014 0.513451 42697714 C G
+21 rs75055531 0.513947 42712996 A G
+21 rs73226128 0.513957 42714474 A G
+21 rs4818234 0.51397 42716275 G A
+21 rs60265870 0.513975 42716967 G A
+21 rs2838016 0.513987 42718515 C T
+21 rs188887615 0.513996 42719722 C T
+21 rs113748702 0.51638 42730082 C T
+21 rs982871 0.516393 42731475 C T
+21 rs9981317 0.516446 42737181 G A
+21 rs398206 0.51649 42742036 A C
+21 rs146454620 0.517318 42752835 C A
+21 rs143197614 0.517841 42762886 T C
+21 rs9976700 0.517918 42766607 T C
+21 rs148934070 0.518041 42772655 C T
+21 rs10222067 0.518241 42787804 G T
+21 rs80343070 0.518351 42792528 G T
+21 rs139557023 0.518471 42798318 G C
+21 rs455599 0.518559 42804679 A G
+21 rs150013352 0.51857 42809418 A C
+21 rs7276891 0.518783 42823805 T C
+21 rs2070787 0.519087 42842129 T G
+21 rs191773371 0.519127 42846868 A G
+21 rs150003718 0.519129 42847005 A G
+21 rs9636988 0.519155 42850045 T C
+21 rs61735789 0.519178 42852435 G A
+21 rs9976780 0.519249 42855235 T C
+21 rs146865760 0.519285 42856216 C T
+21 rs4818240 0.519316 42857302 C T
+21 rs139305247 0.519345 42860655 C T
+21 rs391099 0.519371 42863779 T C
+21 rs55760462 0.519932 42868206 T C
+21 rs2187238 0.519992 42872751 T C
+21 rs73230088 0.520056 42879604 C A
+21 rs75603675 0.520058 42879909 C A
+21 rs11088551 0.520061 42880316 A G
+21 rs13046567 0.520085 42884586 T C
+21 rs28751065 0.520091 42885959 C A
+21 rs11701382 0.52013 42890643 G T
+21 rs35247320 0.52015 42892744 G A
+21 rs76044243 0.520229 42899373 G A
+21 rs2838048 0.520334 42908010 A G
+21 rs73359734 0.520374 42911619 C A
+21 rs2838058 0.521078 42924907 C A
+21 rs73359771 0.52148 42932714 G A
+21 rs12627738 0.521753 42938794 G A
+21 rs181980059 0.521755 42939151 C A
+21 rs11088555 0.521761 42939918 T C
+21 rs11702046 0.521769 42941137 C T
+21 rs28550108 0.521872 42942294 T A
+21 rs60194694 0.521906 42942620 G A
+21 rs77471525 0.522303 42946470 T A
+21 rs28875606 0.522386 42947275 C T
+21 rs146420318 0.522687 42950184 G A
+21 rs3922457 0.522755 42950842 T C
+21 rs28700940 0.522882 42952075 C T
+21 rs28580771 0.523202 42955173 G T
+21 rs2838068 0.529039 43012237 A G
+21 rs7281619 0.529185 43024918 C T
+21 rs74688298 0.529256 43026507 C T
+21 rs72613633 0.529282 43026920 G A
+21 rs73903452 0.52931 43027373 G A
+21 rs62219346 0.529391 43029809 G T
+21 rs76579998 0.529555 43034223 G A
+21 rs79348973 0.52956 43034276 A G
+21 rs78385097 0.529577 43034452 C T
+21 rs117092792 0.529941 43044789 C T
+21 rs2309095 0.530017 43047680 G A
+21 rs140081430 0.530028 43048119 G A
+21 rs62217773 0.530087 43051391 T G
+21 rs36015477 0.530209 43059169 T C
+21 rs7275423 0.530311 43060705 T A
+21 rs77935309 0.530759 43069386 C A
+21 rs113905126 0.532356 43074655 C T
+21 rs8132754 0.533362 43080711 G T
+21 rs79904875 0.533365 43080950 C T
+21 rs2255892 0.533584 43087772 C T
+21 rs80311167 0.533786 43093880 C T
+21 rs2849704 0.533901 43097354 T C
+21 rs2849705 0.533938 43098465 A C
+21 rs74721445 0.534078 43101097 C A
+21 rs75408487 0.53416 43102027 C T
+21 rs8132108 0.534514 43108664 C G
+21 rs141491431 0.534519 43109691 C A
+21 rs35108587 0.534979 43121290 C T
+21 rs34811135 0.534979 43121294 T C
+21 rs9982707 0.535691 43129518 C G
+21 rs34439510 0.535736 43130030 G A
+21 rs34230257 0.535885 43131753 G A
+21 rs4919935 0.536319 43136728 T A
+21 rs144311111 0.536563 43140837 C T
+21 rs13048502 0.538365 43167066 A C
+21 rs116954548 0.538459 43179525 T C
+21 rs117871010 0.53846 43180218 T C
+21 rs3787956 0.538461 43180883 G A
+21 rs35744208 0.538474 43182401 C A
+21 rs73371629 0.538492 43183371 G A
+21 rs4919944 0.538538 43185765 C A
+21 rs6586239 0.53856 43187167 G C
+21 rs35879470 0.538937 43198648 G A
+21 rs13049138 0.539151 43202443 C T
+21 rs186053893 0.539168 43202702 C T
+21 rs113036430 0.539194 43203116 G C
+21 rs12482924 0.539238 43211684 C T
+21 rs3737427 0.539374 43223255 G A
+21 rs2298687 0.539384 43224683 G A
+21 rs150080619 0.539456 43234116 G A
+21 rs74592564 0.539461 43234763 T C
+21 rs28477638 0.539504 43239972 G A
+21 rs28708536 0.539546 43244905 G A
+21 rs28738914 0.539549 43245242 A G
+21 rs188390156 0.539721 43264913 T C
+21 rs79379629 0.539756 43267379 G C
+21 rs8130749 0.539802 43270327 T A
+21 rs8129271 0.539803 43270524 G C
+21 rs79091252 0.539894 43277787 T C
+21 rs140798254 0.540044 43288025 G T
+21 rs3746898 0.540082 43291053 A G
+21 rs7275618 0.540101 43292621 T C
+21 rs35109371 0.540103 43292852 C T
+21 rs11702265 0.540134 43297227 C T
+21 rs35840201 0.540248 43303558 A T
+21 rs9984118 0.540477 43309706 C T
+21 rs9974785 0.54058 43312993 G A
+21 rs170135 0.540631 43314604 G A
+21 rs172637 0.540853 43318871 G C
+21 rs7283940 0.541726 43325145 G C
+21 rs12483229 0.542018 43338933 A T
+21 rs9977215 0.542114 43343190 C T
+21 rs365163 0.542582 43345207 A G
+21 rs116569272 0.542687 43352399 G A
+21 rs518035 0.542912 43358595 C T
+21 rs9325617 0.542993 43359769 T C
+21 rs56796252 0.54395 43375811 A G
+21 rs576808 0.543958 43376453 A G
+21 rs4920041 0.543977 43378083 T C
+21 rs13049363 0.544002 43380231 A T
+21 rs146963847 0.544077 43386085 A G
+21 rs7276077 0.544687 43394871 C A
+21 rs7283480 0.544689 43395064 A T
+21 rs9980268 0.544704 43396664 T A
+21 rs73221473 0.544816 43401611 T C
+21 rs151076584 0.544886 43425908 G T
+21 rs141659792 0.544895 43427170 A G
+21 rs9981167 0.545031 43434243 C T
+21 rs76414588 0.545086 43435343 G T
+21 rs220202 0.545245 43438781 C T
+21 rs2839439 0.545258 43439562 G A
+21 rs220208 0.545261 43439679 T C
+21 rs117995466 0.545261 43439681 A C
+21 rs73221486 0.546566 43445366 A C
+21 rs411725 0.546934 43453953 G C
+21 rs220235 0.546949 43454252 T C
+21 rs1788356 0.546959 43454440 T C
+21 rs186659716 0.547591 43457314 C T
+21 rs189352333 0.549862 43462007 G A
+21 rs78248586 0.549898 43465959 C T
+21 rs1869770 0.549936 43470113 C T
+21 rs220252 0.549955 43472438 G A
+21 rs220253 0.54998 43475879 T C
+21 rs220260 0.550036 43480111 C G
+21 rs117201265 0.550104 43484207 T C
+21 rs2124341 0.550831 43497858 T C
+21 rs220291 0.550865 43498042 T G
+21 rs149896652 0.551739 43503563 C T
+21 rs220305 0.551883 43511179 G A
+21 rs220314 0.552034 43519442 T C
+21 rs220318 0.552039 43520106 C T
+21 rs187025147 0.552096 43529173 G C
+21 rs220130 0.55217 43531808 A C
+21 rs61137659 0.552186 43532456 C T
+21 rs73223546 0.552361 43536262 T C
+21 rs35523457 0.553148 43554337 G A
+21 rs11911271 0.553328 43555491 T C
+21 rs3819142 0.553518 43557698 A C
+21 rs73373683 0.553616 43564848 G A
+21 rs4283513 0.553619 43566409 G T
+21 rs113909092 0.553621 43566848 C T
+21 rs9978854 0.553624 43567731 G A
+21 rs148354929 0.553625 43568086 G C
+21 rs62215138 0.553625 43568112 C T
+21 rs220184 0.553641 43571418 G A
+21 rs113385692 0.553691 43595643 A T
+21 rs144535424 0.553703 43599517 T A
+21 rs117900671 0.553715 43604140 G A
+21 rs221952 0.553804 43609733 C T
+21 rs113871782 0.55384 43613970 G A
+21 rs8126796 0.553851 43615752 G A
+21 rs141210811 0.553875 43625640 G A
+21 rs1840593 0.553886 43626673 G A
+21 rs4148088 0.553906 43628935 C G
+21 rs4148089 0.553914 43630172 C T
+21 rs146978161 0.553921 43631270 G A
+21 rs145656134 0.553932 43633045 C T
+21 rs2234715 0.553969 43638855 A G
+21 rs4148098 0.553998 43639553 T A
+21 rs9976024 0.554098 43641657 G A
+21 rs915846 0.555322 43652782 C G
+21 rs113560275 0.555392 43661477 G A
+21 rs7283699 0.55543 43663318 G A
+21 rs2839478 0.555454 43667082 G A
+21 rs9979980 0.555635 43671220 C T
+21 rs113669242 0.555769 43673375 A G
+21 rs4148120 0.555774 43673477 G C
+21 rs225391 0.555927 43684668 G A
+21 rs6586299 0.5563 43698073 T G
+21 rs4148123 0.556323 43698251 G A
+21 rs144038443 0.556427 43699616 C T
+21 rs2839482 0.556452 43700432 A G
+21 rs3788005 0.556505 43702023 C T
+21 rs56337741 0.557595 43716554 G A
+21 rs4148140 0.557612 43718611 C T
+21 rs1023156 0.557622 43720525 G A
+21 rs117158840 0.557627 43721660 C T
+21 rs11910395 0.557632 43722586 C T
+21 rs6586306 0.55764 43723284 A G
+21 rs73364691 0.557647 43723636 T C
+21 rs138410757 0.557761 43726857 C A
+21 rs73225454 0.55929 43731068 C T
+21 rs78811301 0.559425 43740523 G A
+21 rs56097231 0.559426 43740637 G A
+21 rs142488053 0.559592 43756572 A T
+21 rs225325 0.559646 43762840 G A
+21 rs1867032 0.559654 43763376 G A
+21 rs225336 0.55972 43767414 A T
+21 rs3814896 0.559783 43771711 A G
+21 rs111696381 0.559784 43771843 A C
+21 rs172825 0.559794 43773253 G C
+21 rs116937309 0.5598 43775213 T C
+21 rs225343 0.559808 43775780 C A
+21 rs178740 0.559866 43777672 T C
+21 rs9978991 0.559899 43781594 T C
+21 rs143991203 0.5599 43781709 G A
+21 rs117091392 0.55993 43784414 C T
+21 rs12626334 0.560438 43791233 C T
+21 rs225430 0.560857 43801712 C A
+21 rs17114826 0.560872 43804413 A G
+21 rs2839501 0.560892 43805637 C T
+21 rs2839504 0.561935 43808004 A C
+21 rs117552882 0.563312 43814544 C G
+21 rs11203200 0.563312 43814575 G A
+21 rs11909615 0.563316 43815042 A G
+21 rs225319 0.563339 43815773 A G
+21 rs117267633 0.563841 43819281 G A
+21 rs112745380 0.56385 43822174 C T
+21 rs11203201 0.563866 43823726 C G
+21 rs9974289 0.56388 43827562 G A
+21 rs77546520 0.56391 43840089 T G
+21 rs3788013 0.563911 43841328 C A
+21 rs11912007 0.563935 43849054 C G
+21 rs11910025 0.563935 43849109 A G
+21 rs115641947 0.563942 43850905 C T
+21 rs11910326 0.564257 43853459 G A
+21 rs2276236 0.564311 43862255 C A
+21 rs112764046 0.564378 43865293 C T
+21 rs3827231 0.564463 43866166 G C
+21 rs11700872 0.565031 43882741 C G
+21 rs58969530 0.565053 43885402 T G
+21 rs9974527 0.565055 43885551 G A
+21 rs56000893 0.565078 43887092 C T
+21 rs55912114 0.565084 43887469 G A
+21 rs78900688 0.565488 43895896 C T
+21 rs2839535 0.565851 43905327 T C
+21 rs59775135 0.565865 43912021 G A
+21 rs9984443 0.56591 43929514 T A
+21 rs111991358 0.565932 43934148 T G
+21 rs113450922 0.565949 43936246 G C
+21 rs12482752 0.5664 43957256 G A
+21 rs9976600 0.566443 43970875 C T
+21 rs118137491 0.566444 43970988 T C
+21 rs142595043 0.566454 43978149 T G
+21 rs117428310 0.566462 43980347 A G
+21 rs143204768 0.5667 43986436 C T
+21 rs62213023 0.566772 43995327 C T
+21 rs117965939 0.566775 43995935 A G
+21 rs451954 0.566775 43995954 G A
+21 rs17767630 0.566777 43999418 G T
+21 rs17178345 0.566783 44000861 A G
+21 rs2849721 0.566792 44003148 C G
+21 rs388831 0.5668 44004149 T C
+21 rs112822782 0.566922 44016037 C A
+21 rs410698 0.567149 44021566 G C
+21 rs7282743 0.567668 44025148 A G
+21 rs440431 0.568333 44032712 C G
+21 rs66519542 0.568512 44037888 C T
+21 rs9976560 0.568523 44038275 C G
+21 rs36058049 0.56854 44041193 G A
+21 rs73229525 0.568589 44047969 T C
+21 rs143994722 0.568659 44052979 G A
+21 rs408967 0.568696 44056499 G T
+21 rs76732039 0.568709 44059261 T A
+21 rs1539969 0.568718 44061352 A C
+21 rs404714 0.568718 44061357 C T
+21 rs12233328 0.568725 44063008 T G
+21 rs36098823 0.569203 44075537 G T
+21 rs61639224 0.569222 44076382 G T
+21 rs9981767 0.569319 44079875 C A
+21 rs71320536 0.569321 44080009 C T
+21 rs139069801 0.569386 44084914 C A
+21 rs2269136 0.569575 44093942 G A
+21 rs71320544 0.569703 44102020 A G
+21 rs35729571 0.569898 44110493 C T
+21 rs756577 0.569917 44118323 C T
+21 rs13049942 0.569932 44120767 G C
+21 rs8130322 0.569962 44126130 C T
+21 rs71320552 0.57001 44143109 G C
+21 rs148256699 0.570037 44146829 C G
+21 rs56760862 0.570039 44147511 A G
+21 rs962287 0.570097 44155009 T G
+21 rs75905062 0.570332 44158447 G A
+21 rs13052351 0.570768 44163087 T C
+21 rs13047677 0.57092 44168188 A C
+21 rs2269163 0.571341 44181240 C G
+21 rs66484525 0.571371 44183133 T G
+21 rs113090704 0.571376 44184429 T A
+21 rs6586346 0.571429 44189983 C A
+21 rs75918224 0.571478 44191402 C G
+21 rs35179485 0.571772 44198015 G A
+21 rs9653817 0.571848 44200676 A G
+21 rs7283656 0.571887 44202023 T C
+21 rs12053696 0.572043 44207846 A G
+21 rs35628508 0.572271 44227260 T C
+21 rs7277934 0.572294 44229483 C T
+21 rs77802091 0.572322 44238499 T C
+21 rs112978626 0.57236 44251944 A G
+21 rs73231680 0.572365 44254621 T C
+21 rs2156317 0.572367 44257918 C T
+21 rs76271836 0.572371 44263321 G A
+21 rs35709881 0.572372 44264824 G A
+21 rs60406170 0.572487 44269860 A C
+21 rs77095994 0.572488 44270931 G A
+21 rs884153 0.572489 44271703 A G
+21 rs464854 0.572494 44275033 G T
+21 rs1236634 0.572521 44278105 A G
+21 rs62219107 0.572525 44283830 C T
+21 rs7280803 0.572526 44283938 T C
+21 rs7280082 0.572526 44284142 G A
+21 rs7281227 0.572526 44284179 C T
+21 rs13052506 0.572526 44284204 C A
+21 rs13048885 0.572529 44284794 T C
+21 rs111485799 0.57253 44287486 G C
+21 rs13051639 0.57253 44290674 G C
+21 rs151246629 0.57253 44290743 C T
+21 rs8134864 0.57253 44296362 C T
+21 rs12626259 0.57253 44297041 A G
+21 rs2839596 0.57253 44300978 A G
+21 rs57692532 0.57253 44305283 C T
+21 rs140279568 0.57253 44306258 G C
+21 rs11910684 0.572531 44310537 T G
+21 rs112235529 0.572533 44312171 G A
+21 rs6586259 0.572537 44315266 G A
+21 rs7277472 0.572537 44315388 T C
+21 rs8134328 0.572585 44333836 G C
+21 rs11911418 0.572586 44334837 G A
+21 rs34726798 0.572586 44335098 C T
+21 rs11555158 0.57259 44337914 A G
+21 rs80147563 0.572591 44341258 G A
+21 rs6586265 0.572592 44345180 T C
+21 rs73905787 0.572592 44345836 C G
+21 rs57834162 0.572593 44348810 T C
+21 rs12483651 0.572594 44350895 G A
+21 rs71320577 0.573234 44365432 C T
+21 rs6586276 0.573342 44369959 G A
+21 rs6586277 0.573348 44370227 T C
+21 rs9975683 0.573348 44370235 T C
+21 rs6586278 0.573349 44370290 C T
+21 rs62217611 0.573358 44370806 T C
+21 rs73367664 0.573408 44375641 C T
+21 rs140570802 0.573441 44382994 C T
+21 rs191379760 0.573441 44382999 C T
+21 rs11702023 0.573485 44386247 T C
+21 rs145521368 0.573571 44393836 G A
+21 rs71316190 0.573591 44398601 T C
+21 rs71316191 0.573591 44398624 C T
+21 rs4920025 0.573598 44401479 A G
+21 rs77434287 0.573599 44401810 C T
+21 rs71320589 0.573599 44402047 T C
+21 rs35348720 0.5736 44402247 C T
+21 rs1672137 0.573602 44403363 C G
+21 rs34209438 0.573604 44407068 C T
+21 rs2401150 0.573604 44407305 G A
+21 rs28506838 0.573604 44408453 G C
+21 rs139835089 0.573605 44411755 G A
+21 rs149717110 0.573606 44412398 G A
+21 rs12481921 0.573606 44414088 C T
+21 rs9982622 0.573607 44417564 A G
+21 rs56041312 0.573622 44430588 C T
+21 rs1788479 0.573633 44434836 C A
+21 rs77084451 0.573684 44451876 G A
+21 rs234742 0.573693 44454301 T C
+21 rs1672127 0.573693 44460184 G A
+21 rs35568502 0.573694 44463724 T C
+21 rs71320595 0.573697 44464223 C T
+21 rs12483705 0.573709 44466913 A T
+21 rs146347214 0.573714 44467056 C T
+21 rs8127973 0.573953 44474949 C T
+21 rs760124 0.573963 44475953 T C
+21 rs61123777 0.574021 44483748 T A
+21 rs1788466 0.574022 44483773 G A
+21 rs234706 0.574031 44485350 G A
+21 rs2851391 0.574045 44487404 C T
+21 rs71322506 0.574103 44491293 T C
+21 rs1788467 0.574238 44496062 G C
+21 rs117641509 0.574387 44500820 C T
+21 rs138673287 0.574401 44500991 T C
+21 rs59602964 0.574613 44503300 G A
+21 rs3972 0.575281 44506201 C T
+21 rs117627258 0.575633 44512726 G A
+21 rs1672139 0.575717 44534948 G C
+21 rs1789958 0.575724 44535681 C T
+21 rs77909915 0.575734 44536776 G A
+21 rs115916816 0.575973 44561267 C T
+21 rs9653810 0.575989 44562966 A G
+21 rs2839637 0.575999 44563995 C T
+21 rs112545620 0.576339 44569929 T C
+21 rs9980126 0.576393 44571239 A G
+21 rs73374287 0.576678 44579936 T C
+21 rs872331 0.57706 44589215 T C
+21 rs62217721 0.578437 44595039 G A
+21 rs11909053 0.579334 44606790 G A
+21 rs2839649 0.579344 44607103 A G
+21 rs28691163 0.579381 44608227 G A
+21 rs12482670 0.579396 44608660 T G
+21 rs150718026 0.579449 44612256 C G
+21 rs13046255 0.579457 44617110 C T
+21 rs9984575 0.58423 44712163 A G
+21 rs76917707 0.584234 44712356 A G
+21 rs142317129 0.584235 44712415 C A
+21 rs9976679 0.584328 44717003 C T
+21 rs73909407 0.584345 44717781 C T
+21 rs2051405 0.584428 44720146 T C
+21 rs79094191 0.584491 44720890 T C
+21 rs8130169 0.584542 44721475 A G
+21 rs501228 0.584738 44730807 G A
+21 rs74333834 0.586272 44737463 C G
+21 rs4819336 0.58709 44747209 G T
+21 rs2838268 0.587267 44759436 C G
+21 rs2838269 0.58728 44761179 T C
+21 rs2838270 0.587284 44761738 G T
+21 rs9978496 0.587365 44764709 A G
+21 rs476659 0.58741 44766698 C T
+21 rs478570 0.587414 44766920 C T
+21 rs140701167 0.587463 44770052 C T
+21 rs73378255 0.587967 44773295 C T
+21 rs8130681 0.588641 44774853 C A
+21 rs2838282 0.588959 44776695 T C
+21 rs564779 0.589148 44785422 T C
+21 rs668058 0.589157 44786539 G A
+21 rs653435 0.589165 44787553 T C
+21 rs13049305 0.589186 44790028 T C
+21 rs150414296 0.589255 44792762 C A
+21 rs78957366 0.589344 44800830 C T
+21 rs2838293 0.589412 44806314 A G
+21 rs57315329 0.589443 44808713 C T
+21 rs143422847 0.590324 44818058 C T
+21 rs118058929 0.591318 44825423 G A
+21 rs857554 0.591324 44825447 G C
+21 rs59394421 0.591369 44825998 G A
+21 rs74871846 0.591656 44833788 G A
+21 rs117381526 0.591767 44838466 T C
+21 rs118116454 0.591924 44855291 C G
+21 rs35374062 0.592012 44868191 C G
+21 rs12626999 0.592013 44868414 T C
+21 rs11088981 0.59202 44870150 C T
+21 rs17004555 0.592049 44872714 C T
+21 rs463194 0.59219 44877993 A C
+21 rs162348 0.59252 44913747 C T
+21 rs162373 0.592646 44927107 C A
+21 rs79242594 0.592648 44932167 C A
+21 rs75836229 0.592658 44948684 T G
+21 rs7278003 0.592681 44966069 C T
+21 rs76276964 0.592682 44967919 C T
+21 rs77563572 0.592685 44987057 C A
+21 rs13046391 0.592685 44989148 A G
+21 rs2838319 0.592687 44991901 G A
+21 rs183422213 0.59269 45002270 G A
+21 rs73223085 0.592693 45008660 T C
+21 rs78010493 0.592696 45010333 A G
+21 rs2838328 0.592711 45023234 C G
+21 rs2838332 0.592717 45035007 A G
+21 rs2329442 0.592717 45041790 C T
+21 rs144031409 0.592719 45049173 G A
+21 rs1454648 0.592753 45053815 T C
+21 rs9984002 0.592753 45061501 C T
+21 rs2246610 0.592753 45064600 C T
+21 rs2838337 0.592757 45072621 C A
+21 rs116094934 0.59313 45128618 A G
+21 rs111604761 0.59343 45130460 G T
+21 rs10432970 0.594264 45135138 T G
+21 rs4819301 0.594502 45137629 G A
+21 rs13049890 0.594507 45137692 G A
+21 rs62229179 0.59455 45138930 T G
+21 rs11089094 0.594621 45147564 T C
+21 rs9974754 0.59466 45151874 G A
+21 rs2276528 0.594668 45152500 G A
+21 rs79898513 0.594707 45156443 G T
+21 rs73375207 0.594726 45160862 C G
+21 rs140684849 0.594726 45160913 C T
+21 rs8134507 0.59478 45172379 G A
+21 rs117675264 0.594893 45186754 G A
+21 rs74451301 0.594905 45188278 G A
+21 rs140987444 0.594945 45189912 G A
+21 rs73224949 0.595009 45192052 T G
+21 rs59461641 0.595254 45198863 C T
+21 rs13050410 0.595258 45201758 G A
+21 rs55830074 0.595263 45205668 T C
+21 rs2276244 0.595267 45209442 C T
+21 rs2276245 0.595268 45209714 C T
+21 rs2838365 0.59527 45211731 T C
+21 rs62230080 0.595276 45214365 C T
+21 rs2838385 0.595298 45223060 C T
+21 rs2070539 0.595298 45223930 C T
+21 rs9979356 0.595323 45230974 T C
+21 rs11909278 0.595333 45234894 G A
+21 rs1573334 0.59534 45237494 C T
+21 rs1136072 0.595344 45238353 C A
+21 rs11701446 0.595364 45241812 C G
+21 rs139618545 0.595367 45242323 A G
+21 rs8128936 0.595374 45243213 A G
+21 rs144723570 0.595382 45244466 C T
+21 rs4419312 0.595467 45247635 C T
+21 rs2838417 0.595517 45252901 G A
+21 rs2838419 0.595537 45254167 T C
+21 rs11089101 0.595586 45256598 A G
+21 rs4819332 0.595935 45274239 C T
+21 rs762404 0.596051 45295157 A G
+21 rs116847040 0.596057 45298875 G C
+21 rs56242868 0.596061 45301914 G A
+21 rs143858978 0.596086 45317541 G A
+21 rs3788075 0.596089 45318772 T G
+21 rs73371115 0.596089 45319024 T C
+21 rs62228708 0.596093 45320988 T G
+21 rs60986841 0.596094 45322869 G A
+21 rs1006757 0.596151 45341732 C G
+21 rs77817180 0.596231 45351146 G A
+21 rs9983066 0.596243 45355828 C T
+21 rs2838445 0.596381 45362489 G C
+21 rs150098700 0.596399 45367549 C A
+21 rs2838447 0.596612 45370888 A G
+21 rs76291130 0.596826 45392742 G A
+21 rs9647232 0.59684 45393449 G A
+21 rs2838456 0.596858 45396968 A G
+21 rs2070544 0.596865 45398488 T G
+21 rs73906701 0.596939 45405546 G C
+21 rs62229690 0.596958 45408363 G A
+21 rs73373152 0.596976 45410655 G A
+21 rs11702855 0.596999 45413262 G A
+21 rs148908362 0.59764 45431661 C G
+21 rs2838470 0.597664 45437183 G A
+21 rs2299812 0.597666 45438890 C T
+21 rs2238711 0.597679 45453956 A G
+21 rs56273547 0.59768 45454894 T C
+21 rs2247400 0.597682 45456458 C T
+21 rs9306172 0.597703 45475489 G C
+21 rs56340987 0.597703 45477446 C T
+21 rs2838481 0.597704 45487181 G C
+21 rs2255922 0.597724 45492795 A G
+21 rs55794203 0.597724 45500511 C T
+21 rs11550376 0.597746 45527291 G A
+21 rs2855653 0.597769 45540143 A G
+21 rs148012748 0.597779 45543949 T A
+21 rs2242953 0.597779 45545334 A G
+21 rs9978109 0.597779 45547913 C A
+21 rs35538274 0.597787 45565837 C T
+21 rs4819383 0.597818 45573361 A G
+21 rs4818885 0.597862 45580513 G A
+21 rs7283007 0.597893 45584376 T C
+21 rs11702363 0.5979 45585556 A G
+21 rs11702859 0.5979 45585633 G A
+21 rs2838511 0.597928 45589980 G A
+21 rs111443462 0.599038 45606570 A T
+21 rs73228974 0.599157 45609110 G A
+21 rs743476 0.599204 45610369 A G
+21 rs4456788 0.599321 45616324 A G
+21 rs73907113 0.599342 45619519 A G
+21 rs2329709 0.599355 45621560 G A
+21 rs56178904 0.599375 45624551 C T
+21 rs7282137 0.599376 45624667 T C
+21 rs34560460 0.599402 45628546 A T
+21 rs77074020 0.599416 45630591 A G
+21 rs74330716 0.599429 45632229 G C
+21 rs8128765 0.599431 45632427 T A
+21 rs181793439 0.599501 45636665 C T
+21 rs150177720 0.599523 45637648 C G
+21 rs9983276 0.599535 45638288 T C
+21 rs3804033 0.599596 45648488 G C
+21 rs9974245 0.599602 45649499 G A
+21 rs7281454 0.599812 45653759 A G
+21 rs141439063 0.599877 45654774 C A
+21 rs73366529 0.599965 45659025 G A
+21 rs58940818 0.600054 45664784 C A
+21 rs3788111 0.600105 45668171 G T
+21 rs140400131 0.600142 45670601 C T
+21 rs138247506 0.600171 45672692 G T
+21 rs56354854 0.6002 45674950 A T
+21 rs2276248 0.600246 45679258 T C
+21 rs2838540 0.600314 45687581 G A
+21 rs145070872 0.60055 45701294 G A
+21 rs151083841 0.60097 45707609 G A
+21 rs878081 0.601014 45708277 C T
+21 rs150352051 0.601454 45715127 C T
+21 rs2256817 0.601475 45715386 G A
+21 rs9974092 0.601489 45717247 G A
+21 rs9976073 0.601519 45721710 T C
+21 rs56168965 0.601593 45724853 C G
+21 rs140395625 0.601615 45731238 C T
+21 rs2838547 0.601615 45731458 G C
+21 rs143022969 0.601797 45744812 G A
+21 rs3761392 0.601908 45752176 C A
+21 rs66916924 0.60227 45756069 T C
+21 rs111119177 0.602515 45758633 C G
+21 rs13313851 0.602817 45777750 G T
+21 rs9974187 0.602821 45778702 G C
+21 rs28472722 0.602825 45779774 C G
+21 rs4818917 0.602869 45799280 T C
+21 rs11911248 0.602872 45803097 T C
+21 rs141359423 0.602922 45815899 C G
+21 rs189210114 0.602951 45823874 C T
+21 rs1618355 0.60296 45826462 A C
+21 rs1785441 0.602963 45826952 T A
+21 rs8127837 0.602971 45830061 G A
+21 rs2238726 0.603843 45853851 C A
+21 rs45593631 0.60397 45854740 C T
+21 rs72497623 0.604121 45855850 T C
+21 rs7278129 0.604195 45856398 C G
+21 rs4818921 0.604295 45865167 T C
+21 rs145375948 0.604381 45883271 G A
+21 rs73233025 0.604517 45894462 T C
+21 rs73233027 0.604564 45897827 T G
+21 rs2838573 0.604603 45902806 C T
+21 rs79239574 0.604623 45905559 G T
+21 rs78535846 0.604926 45931184 G A
+21 rs73377700 0.605479 45942240 T C
+21 rs75697303 0.605572 45951926 C T
+21 rs2329834 0.607372 45970812 A G
+21 rs478967 0.607373 45971023 G T
+21 rs2838589 0.607733 45977640 A G
+21 rs117347650 0.607776 45979336 G A
+21 rs233254 0.607781 45980477 T C
+21 rs233259 0.607791 45982923 T G
+21 rs233324 0.607801 45985337 T C
+21 rs233275 0.607832 45990446 G A
+21 rs462711 0.607925 45998317 T C
+21 rs420894 0.60793 45998933 G A
+21 rs412062 0.60793 45998935 A G
+21 rs430791 0.60793 45998946 C T
+21 rs149326833 0.607941 46000367 G T
+21 rs455562 0.607955 46002088 T G
+21 rs465930 0.60797 46003900 G A
+21 rs455413 0.607975 46004507 T C
+21 rs1785470 0.608449 46008875 C G
+21 rs233298 0.608592 46010984 G C
+21 rs233309 0.608641 46012182 G A
+21 rs11702525 0.608799 46019183 C A
+21 rs12481814 0.608803 46019672 G A
+21 rs377573 0.608805 46020070 T C
+21 rs76187588 0.608806 46020218 T C
+21 rs77769051 0.608826 46022653 T C
+21 rs1211129 0.608836 46023401 T C
+21 rs415753 0.608849 46023649 T C
+21 rs138752361 0.609228 46032838 G A
+21 rs445214 0.609273 46035495 G T
+21 rs28542851 0.610845 46048657 A G
+21 rs2838602 0.610879 46057393 A T
+21 rs1785474 0.610883 46058229 T A
+21 rs28368348 0.610883 46058244 C A
+21 rs28491945 0.610903 46062481 C A
+21 rs2070574 0.610931 46068532 G A
+21 rs2838614 0.610977 46075307 T C
+21 rs2186933 0.610979 46075590 C A
+21 rs13049382 0.610983 46076186 C T
+21 rs13048478 0.610987 46076751 T G
+21 rs7275203 0.611016 46079413 A G
+21 rs12482870 0.61102 46081379 A G
+21 rs2329840 0.611028 46085733 C T
+21 rs12481809 0.611029 46086407 C T
+21 rs12483730 0.611029 46086457 G A
+21 rs73234805 0.611039 46091856 T A
+21 rs8129089 0.61104 46092107 G A
+21 rs73234825 0.611045 46094784 G T
+21 rs2838625 0.611048 46096731 G A
+21 rs56019874 0.611102 46098517 A C
+21 rs5013903 0.61114 46104481 C T
+21 rs28681623 0.61115 46105373 C T
+21 rs4818963 0.611192 46112862 A G
+21 rs56829302 0.611216 46116447 C T
+21 rs17285466 0.611345 46123526 A T
+21 rs998420 0.611346 46123553 C G
+21 rs143610061 0.611355 46123680 C T
+21 rs2838651 0.6114 46124496 C T
+21 rs9978932 0.611402 46124546 A G
+21 rs7283606 0.611431 46125952 C T
+21 rs11701538 0.611756 46133097 T C
+21 rs235378 0.612083 46143974 C T
+21 rs13050653 0.612363 46153100 G C
+21 rs62222688 0.612424 46160684 T A
+21 rs111642518 0.612435 46161808 G A
+21 rs11909699 0.612505 46168364 G T
+21 rs35656745 0.612536 46169347 G A
+21 rs892517 0.612551 46169818 A C
+21 rs690260 0.612631 46173391 A G
+21 rs35353717 0.613332 46175328 C T
+21 rs1107053 0.613425 46175545 T A
+21 rs117596340 0.614297 46180158 C G
+21 rs2838673 0.6143 46180309 G A
+21 rs13048108 0.614319 46183642 G A
+21 rs3788138 0.614339 46187408 C T
+21 rs73232943 0.614362 46196103 A G
+21 rs2838684 0.614377 46204511 A G
+21 rs73232950 0.61439 46208128 G C
+21 rs79119995 0.614469 46225312 C G
+21 rs9977465 0.6145 46232267 T A
+21 rs11702064 0.614511 46234963 C T
+21 rs57205597 0.614519 46236675 A G
+21 rs73232966 0.614534 46239305 T C
+21 rs235338 0.614665 46245921 A T
+21 rs76496919 0.614747 46255802 G A
+21 rs235258 0.614783 46259884 T C
+21 rs2838703 0.614819 46262733 G A
+21 rs235307 0.614846 46264963 A G
+21 rs741951 0.614892 46269526 G A
+21 rs235312 0.614897 46269753 G A
+21 rs55927826 0.614921 46276686 A G
+21 rs183516 0.614941 46278923 C T
+21 rs235324 0.614951 46279505 C T
+21 rs117115743 0.614978 46288837 A C
+21 rs235373 0.614995 46293692 T C
+21 rs381406 0.615022 46304355 G T
+21 rs35280729 0.615314 46313140 C T
+21 rs56257457 0.615485 46317892 A T
+21 rs35963032 0.615792 46321226 C T
+21 rs3746973 0.616058 46328026 G A
+21 rs58314825 0.616119 46329792 T C
+21 rs9976299 0.617049 46338651 C T
+21 rs6518205 0.617372 46340154 C T
+21 rs3761397 0.617739 46342629 G T
+21 rs2014191 0.618524 46350534 C G
+21 rs9974172 0.618791 46354055 G A
+21 rs62214492 0.618795 46356146 C T
+21 rs4818990 0.618813 46357359 G T
+21 rs13046102 0.618925 46362383 A G
+21 rs4818748 0.618933 46364223 C T
+21 rs139062668 0.618945 46368198 C G
+21 rs149956285 0.618953 46370980 G A
+21 rs7276130 0.619032 46392916 G A
+21 rs7282533 0.619041 46396057 G C
+21 rs1985483 0.619043 46396906 T C
+21 rs1556320 0.619073 46406997 C T
+21 rs149554289 0.61926 46414549 T C
+21 rs75222517 0.619415 46417650 C T
+21 rs117648510 0.619491 46419136 A C
+21 rs73226686 0.620016 46428521 C T
+21 rs73226689 0.620036 46428873 A G
+21 rs4819013 0.620297 46433670 C T
+21 rs2329947 0.620523 46439001 T C
+21 rs2329949 0.620661 46443848 G A
+21 rs12381201 0.620769 46449035 C T
+21 rs186684322 0.620894 46453979 G A
+21 rs6518210 0.620984 46456733 A G
+21 rs13052629 0.620985 46456779 T C
+21 rs56290629 0.621071 46470325 G A
+21 rs4818755 0.621096 46475502 G A
+21 rs75359347 0.621734 46484076 G A
+21 rs77386304 0.621824 46487635 C G
+21 rs2838762 0.621828 46487831 C A
+21 rs187170538 0.621917 46493897 T G
+21 rs3788155 0.621954 46501493 C G
+21 rs729895 0.621993 46515421 C T
+21 rs2838785 0.62203 46529291 G C
+21 rs390504 0.622034 46530945 A G
+21 rs62216599 0.622045 46535138 G C
+21 rs2003273 0.622048 46536586 C T
+21 rs2838788 0.622064 46542374 T C
+21 rs731248 0.622064 46542544 C A
+21 rs142629702 0.622141 46557422 C T
+21 rs117525576 0.622166 46560489 T C
+21 rs116869330 0.622227 46564075 G A
+21 rs420085 0.622286 46572572 G A
+21 rs76628070 0.622291 46573818 T G
+21 rs13048685 0.622291 46573848 T C
+21 rs4819031 0.622305 46578164 G T
+21 rs421716 0.622359 46590422 C T
+21 rs399185 0.622359 46590501 C A
+21 rs11700834 0.622397 46608609 C T
+21 rs2838806 0.622407 46618601 T G
+21 rs112747596 0.622407 46619393 A T
+21 rs1556318 0.622407 46619840 G T
+21 rs2838811 0.622408 46624341 C T
+21 rs57020111 0.622488 46640676 T C
+21 rs144154112 0.622853 46658083 G C
+21 rs2838835 0.622857 46665834 G A
+21 rs2838843 0.622869 46669284 T C
+21 rs7283915 0.622891 46671288 T C
+21 rs113537754 0.622896 46673810 G A
+21 rs35871601 0.622899 46675214 C T
+21 rs1006887 0.622928 46685592 G C
+21 rs76010831 0.62293 46688795 C G
+21 rs79293787 0.622931 46693055 A G
+21 rs914215 0.622934 46698256 T C
+21 rs141775504 0.622935 46699098 A C
+21 rs74961332 0.622939 46702822 A G
+21 rs190868720 0.622943 46707475 G A
+21 rs118190587 0.623088 46717495 C T
+21 rs113412669 0.623115 46720590 A G
+21 rs17004765 0.623185 46729491 G A
+21 rs76886536 0.623574 46738929 C T
+21 rs145189729 0.62418 46743772 C T
+21 rs117376569 0.624342 46747372 G C
+21 rs60997234 0.624353 46747951 G A
+21 rs1109754 0.624427 46753704 C T
+21 rs117462680 0.624446 46755620 C T
+21 rs957794 0.624477 46758766 G C
+21 rs4819064 0.625078 46762213 A C
+21 rs73907436 0.625153 46762722 A G
+21 rs111329446 0.625521 46776382 G A
+21 rs117874286 0.625566 46777638 C T
+21 rs4819070 0.625677 46782383 G A
+21 rs62214459 0.625699 46786997 A G
+21 rs4819075 0.625706 46789984 T A
+21 rs4819086 0.625971 46806537 G A
+21 rs7276817 0.626436 46814171 G A
+21 rs6518240 0.626561 46816144 A G
+21 rs4819097 0.626817 46819149 C A
+21 rs2330105 0.627046 46820786 A G
+21 rs185195057 0.627081 46821036 C A
+21 rs2838907 0.628446 46834311 G C
+21 rs115080577 0.629339 46850015 G A
+21 rs17338076 0.629357 46850351 G A
+21 rs8126556 0.629581 46855397 A G
+21 rs11701363 0.629585 46859163 C T
+21 rs60375909 0.629587 46860905 C T
+21 rs114451752 0.62959 46863586 C T
+21 rs78438877 0.629591 46864316 C T
+21 rs9977280 0.629607 46879338 A G
+21 rs9979599 0.62973 46899672 G A
+21 rs11089006 0.629785 46906246 G C
+21 rs116274237 0.62985 46910352 C G
+21 rs62216329 0.629865 46911276 G T
+21 rs3753020 0.630744 46924805 C G
+21 rs9680189 0.630906 46926659 A G
+21 rs116416605 0.630911 46927262 G A
+21 rs2838955 0.630984 46934518 T C
+21 rs11702537 0.631043 46955883 A G
+21 rs3788203 0.631094 46958586 G T
+21 rs9976727 0.631134 46959179 A G
+21 rs79701880 0.631226 46960598 C T
+21 rs9984031 0.631887 46976828 A T
+21 rs12481898 0.632001 46988540 T A
+21 rs8132738 0.632001 46988804 C T
+21 rs116925823 0.632019 46997131 A C
+21 rs73910125 0.632024 46999466 T C
+21 rs58240688 0.632033 47003853 G A
+21 rs62214345 0.632034 47004163 C T
+21 rs9637201 0.635987 47027636 G A
+21 rs117188250 0.636698 47033608 T C
+21 rs11701249 0.636789 47050714 G A
+21 rs74681567 0.636808 47054985 T C
+21 rs115727620 0.63681 47055991 T C
+21 rs57990579 0.636819 47068170 T G
+21 rs117850327 0.636822 47073646 G A
+21 rs62215031 0.636822 47075406 T C
+21 rs7275796 0.636844 47103998 C T
+21 rs9306138 0.636845 47106428 T A
+21 rs78288391 0.636849 47114310 A G
+21 rs4818795 0.63685 47116489 G T
+21 rs6518255 0.636853 47121578 G A
+21 rs34142541 0.636855 47124281 G A
+21 rs17004795 0.636856 47124912 C G
+21 rs35943749 0.636857 47126309 G A
+21 rs28625750 0.636861 47131175 T C
+21 rs2183596 0.636878 47175451 A G
+21 rs9647230 0.636879 47177064 T C
+21 rs117564991 0.636885 47182627 A T
+21 rs71324444 0.636917 47198301 G A
+21 rs117275582 0.636924 47209325 C A
+21 rs59785950 0.636944 47225742 G A
+21 rs116943168 0.636947 47228172 G A
+21 rs1810871 0.637041 47243947 C G
+21 rs55912352 0.637159 47273127 G A
+21 rs2839026 0.637232 47288665 C G
+21 rs2839030 0.637242 47292007 C A
+21 rs73911030 0.637262 47300216 G C
+21 rs8132062 0.637322 47312804 G A
+21 rs9980645 0.637352 47318224 C T
+21 rs185415278 0.637353 47318510 C T
+21 rs2839054 0.63737 47328449 G A
+21 rs2075908 0.637379 47334276 T A
+21 rs55959976 0.637386 47338611 T A
+21 rs9975216 0.63739 47341590 A G
+21 rs3788239 0.637424 47351173 A G
+21 rs2776402 0.637445 47360761 T C
+21 rs442400 0.637544 47368993 A G
+21 rs426378 0.637562 47369637 A C
+21 rs118113146 0.637626 47371937 T C
+21 rs9977119 0.63764 47372437 G T
+21 rs55972491 0.637698 47374604 G A
+21 rs9978646 0.63772 47376507 G C
+21 rs2150457 0.637723 47376898 T C
+21 rs71324472 0.637761 47383831 C T
+21 rs2150459 0.637764 47384407 T C
+21 rs8129765 0.637782 47388151 G A
+21 rs143116902 0.637814 47394988 G A
+21 rs117775279 0.637827 47397828 G A
+21 rs71324475 0.637859 47404423 G A
+21 rs2277814 0.637897 47409503 G A
+21 rs143704669 0.637984 47413300 A G
+21 rs148827769 0.638176 47422741 G A
+21 rs36067332 0.638261 47427701 G C
+21 rs36002647 0.638306 47430611 A G
+21 rs11089036 0.638308 47430973 T C
+21 rs11700564 0.63838 47437318 T G
+21 rs56304129 0.638388 47437627 G A
+21 rs113786116 0.638424 47439037 C T
+21 rs8131409 0.638496 47442820 C T
+21 rs2243985 0.63852 47450886 G T
+21 rs62211789 0.638528 47453804 C G
+21 rs13053084 0.638567 47466505 G A
+21 rs112396326 0.639035 47472761 C T
+21 rs6518266 0.639049 47473588 G A
+21 rs182215039 0.639072 47474854 T C
+21 rs71324492 0.639121 47477691 G T
+21 rs8127643 0.639173 47480747 G A
+21 rs28519583 0.639253 47485382 A G
+21 rs8126721 0.639283 47487151 T A
+21 rs13052100 0.639309 47488626 A G
+21 rs12481743 0.63975 47490767 A G
+21 rs2839101 0.642313 47498034 C G
+21 rs8126974 0.642385 47498238 A G
+21 rs73380474 0.642823 47499482 C T
+21 rs73380476 0.643059 47500150 T C
+21 rs2839104 0.643286 47501636 A G
+21 rs79537646 0.643319 47501987 C T
+21 rs73159674 0.64339 47503055 C G
+21 rs4305353 0.643453 47507689 C A
+21 rs9975177 0.643646 47517568 G C
+21 rs9975268 0.643689 47524956 T C
+21 rs79894477 0.643715 47530604 C G
+21 rs117154313 0.643721 47531859 G A
+21 rs78822624 0.643724 47532500 C T
+21 rs13050660 0.643796 47546244 C T
+21 rs17272947 0.643797 47546334 T C
+21 rs66514606 0.643854 47550377 G T
+21 rs2839121 0.643869 47550931 C G
+21 rs61314705 0.644177 47562560 T C
+21 rs9974226 0.644345 47569031 G A
+21 rs17004505 0.644412 47571209 T C
+21 rs147874211 0.644818 47575748 C G
+21 rs113525276 0.644839 47577371 G A
+21 rs145907964 0.644843 47580610 G A
+21 rs3827274 0.644845 47581655 C T
+21 rs114783660 0.644846 47582664 C T
+21 rs12329865 0.644847 47583506 C T
+21 rs62214046 0.64485 47585997 T C
+21 rs8134875 0.644852 47587940 G A
+21 rs62214052 0.644857 47591744 C T
+21 rs75003466 0.644858 47592249 A G
+21 rs73144730 0.644895 47599668 G A
+21 rs79937763 0.644962 47605510 G A
+21 rs183470806 0.644978 47606955 C T
+21 rs56173768 0.644984 47608877 C A
+21 rs9979525 0.645005 47616818 C T
+21 rs11909555 0.645005 47617489 T G
+21 rs115295783 0.645005 47618203 G C
+21 rs2839146 0.645017 47630550 C T
+21 rs78276120 0.645023 47631784 C G
+21 rs76489504 0.645032 47636557 G A
+21 rs2280956 0.645038 47642323 C G
+21 rs139806871 0.645051 47644999 T C
+21 rs1060609 0.645076 47662446 G C
+21 rs17183130 0.645085 47671917 C T
+21 rs17183123 0.645085 47671961 G C
+21 rs2839173 0.645095 47676711 G A
+21 rs8131960 0.64512 47687808 C G
+21 rs2839184 0.645121 47689013 G A
+21 rs17182664 0.645146 47697650 A G
+21 rs17368547 0.645146 47697867 A C
+21 rs117360371 0.64515 47702187 T C
+21 rs17176177 0.645151 47702616 G C
+21 rs13047784 0.645151 47702843 C T
+21 rs7275639 0.645157 47708333 T C
+21 rs118135197 0.645157 47708662 T C
+21 rs13051845 0.645162 47710684 A G
+21 rs13052101 0.645163 47710884 A G
+21 rs117442381 0.645312 47715208 C T
+21 rs189875351 0.645335 47717143 G T
+21 rs914252 0.645337 47719091 A G
+21 rs73146707 0.645337 47719481 T C
+21 rs62224180 0.645344 47726332 G C
+21 rs62224181 0.645344 47726523 A T
+21 rs56107738 0.645345 47727402 G A
+21 rs118098334 0.645354 47736303 G A
+21 rs4819236 0.645356 47739146 C T
+21 rs76945616 0.645356 47739165 G A
+21 rs77558558 0.645359 47743954 C T
+21 rs61735822 0.645365 47754563 A G
+21 rs114224027 0.645372 47766154 C T
+21 rs79850630 0.645372 47767482 C T
+21 rs17297961 0.645385 47776845 G A
+21 rs11089062 0.645386 47778712 T C
+21 rs75183852 0.645394 47788686 G A
+21 rs3788258 0.645395 47789854 G A
+21 rs150720703 0.645405 47803787 G A
+21 rs9989933 0.645408 47808253 G A
+21 rs881924 0.645411 47812009 A T
+21 rs148342136 0.645411 47812997 G A
+21 rs75764723 0.645412 47813647 C T
+21 rs77952827 0.645417 47822781 C T
+21 rs2066932 0.645417 47823229 G T
+21 rs61735543 0.645426 47831522 G A
+21 rs184420466 0.645427 47831758 C T
+21 rs145606525 0.645431 47833530 G T
+21 rs112037421 0.645431 47834185 C T
+21 rs882402 0.645431 47834673 G A
+21 rs73379371 0.645434 47840345 A C
+21 rs61735815 0.645435 47841933 T C
+21 rs74752283 0.645437 47847318 C T
+21 rs2073377 0.645456 47860217 C T
+21 rs2073379 0.645464 47863025 C T
+21 rs2839264 0.645467 47866227 C T
+21 rs11909560 0.645471 47869878 T A
+21 rs62224255 0.645483 47881023 C T
+21 rs4276098 0.645489 47886065 A G
+21 rs12106325 0.645494 47889397 G A
+21 rs2839277 0.645514 47895601 A G
+21 rs2839279 0.645516 47896540 C A
+21 rs73148706 0.645518 47897458 G T
+21 rs150890844 0.645555 47922702 T C
+21 rs146494332 0.64556 47932241 T C
+21 rs140821238 0.64556 47932466 T C
+21 rs79188938 0.645575 47940444 A G
+21 rs2096507 0.645575 47941916 G A
+21 rs116897734 0.645603 47957957 G A
+21 rs1008549 0.645604 47959698 C T
+21 rs150180949 0.645676 47965205 C T
+21 rs4818845 0.645692 47966016 C G
+21 rs11910506 0.645702 47966504 C T
+21 rs73152864 0.645707 47966791 G A
+21 rs79100421 0.645789 47969962 T C
+21 rs12627695 0.645797 47970214 G A
+21 rs143364701 0.645844 47973501 G A
+21 rs2839327 0.64587 47982652 A G
+21 rs146953731 0.645884 47987412 G A
+21 rs2226381 0.645897 47996061 G A
+21 rs12482040 0.645899 47996490 A G
+21 rs144028811 0.645912 48001388 G A
+21 rs2877165 0.645912 48001534 G C
+21 rs4818847 0.645912 48004563 T A
+21 rs9980109 0.645913 48008577 C T
+21 rs60829612 0.645914 48013752 C T
+21 rs2839347 0.645921 48016948 G A
+21 rs74578526 0.646027 48026630 C T
+21 rs12152107 0.646039 48027880 C A
+21 rs9979500 0.646051 48029698 C A
+21 rs60955061 0.64606 48031557 T C
+21 rs9980033 0.646089 48038639 T C
+21 rs77151118 0.646091 48039219 C T
+21 rs9980532 0.646103 48042270 G A
+21 rs145324003 0.64613 48050879 T C
+21 rs77626284 0.646142 48054609 T C
+21 rs2256208 0.646191 48062091 A G
+21 rs7276092 0.646193 48063735 C G
+21 rs7279368 0.646196 48065999 A G
+21 rs8127804 0.6462 48068946 G A
+21 rs117530130 0.646227 48075976 A G
+21 rs117282473 0.646256 48084177 G C
+21 rs118032579 0.646259 48085140 G A
+21 rs79913394 0.646279 48088571 A G
+21 rs118189563 0.646515 48099610 T C
+22 rs62224621 0.000581 16060639 C T
+22 rs2508062 0.001318 16079795 T C
+22 rs2713394 0.003835 16140991 C A
+22 rs4819397 0.010433 16228619 C T
+22 rs73877828 0.015306 16287155 G T
+22 rs142725898 0.016599 16302423 C T
+22 rs115323858 0.020387 16346811 G A
+22 rs77828151 0.021076 16420712 G A
+22 rs62222525 0.021094 16433771 T C
+22 rs139659485 0.021216 16486205 C G
+22 rs3949130 0.021229 16488635 C A
+22 rs137964151 0.021349 16534156 G A
+22 rs149248677 0.021382 16547075 G A
+22 rs4552291 0.021388 16549683 A G
+22 rs8190080 0.021495 16590536 A T
+22 rs142040194 0.021532 16600056 G A
+22 rs10154459 0.021698 16631953 G T
+22 rs138896054 0.021822 16687476 T A
+22 rs183011531 0.021837 16695882 G T
+22 rs4419320 0.022123 16860496 T C
+22 rs4350847 0.022123 16860752 A G
+22 rs131519 0.022298 16868143 C T
+22 rs131526 0.022329 16869617 G A
+22 rs131554 0.022421 16874715 C T
+22 rs116999918 0.022434 16875489 G T
+22 rs78489 0.022446 16876216 G A
+22 rs131567 0.022486 16880072 G A
+22 rs12162616 0.022499 16882562 C A
+22 rs3954522 0.02252 16886618 A G
+22 rs1807483 0.02253 16888577 G A
+22 rs147473505 0.02255 16892549 C G
+22 rs5994034 0.022558 16894090 C T
+22 rs3888500 0.022569 16895990 T A
+22 rs9605160 0.022572 16896468 C T
+22 rs150876251 0.022574 16896773 C T
+22 rs147803974 0.022576 16897060 T A
+22 rs5746922 0.022594 16899526 C T
+22 rs144438039 0.022597 16899951 A T
+22 rs142320851 0.022613 16902099 A T
+22 rs12106623 0.022623 16903138 G A
+22 rs5994092 0.022626 16903536 C T
+22 rs5748797 0.022717 16911828 A C
+22 rs186984098 0.022931 16920118 A T
+22 rs2096680 0.022945 16933929 C T
+22 rs145990184 0.022946 16935176 G A
+22 rs3954561 0.022947 16935528 A T
+22 rs9605244 0.022947 16935609 C A
+22 rs4010381 0.022947 16935866 G A
+22 rs148021587 0.022951 16939232 C T
+22 rs4010371 0.022952 16940305 G A
+22 rs146134165 0.023081 17003990 A C
+22 rs11089213 0.023093 17009174 C T
+22 rs11089214 0.023096 17010387 C T
+22 rs11089215 0.023098 17011221 C T
+22 rs144727258 0.023101 17012861 C T
+22 rs139007749 0.023121 17021691 A C
+22 rs9306210 0.023138 17029410 A C
+22 rs5993081 0.023141 17030614 C T
+22 rs2379981 0.023142 17030792 A G
+22 rs5747680 0.023173 17035072 G C
+22 rs148159902 0.023441 17058526 A T
+22 rs738842 0.023488 17060166 T C
+22 rs9605917 0.023601 17064010 A G
+22 rs5993540 0.024249 17075467 C T
+22 rs138291450 0.024335 17079196 G T
+22 rs2070502 0.024395 17084762 A T
+22 rs140287588 0.024694 17092247 A G
+22 rs9604959 0.024738 17099107 C T
+22 rs4008579 0.024767 17106549 G A
+22 rs9606047 0.024777 17114304 G A
+22 rs9606061 0.024778 17116233 G T
+22 rs9605006 0.024788 17130402 C G
+22 rs5992472 0.024789 17132490 A G
+22 rs9618674 0.02479 17133465 G C
+22 rs11703273 0.024794 17140023 G C
+22 rs145735629 0.024798 17145314 C T
+22 rs151001144 0.024798 17145351 C T
+22 rs4819849 0.024808 17152611 A G
+22 rs141852043 0.025236 17178924 A T
+22 rs79553865 0.025944 17194326 T C
+22 rs2845380 0.025975 17203103 G A
+22 rs62227176 0.025992 17211331 C A
+22 rs181600904 0.026009 17218745 G A
+22 rs2845348 0.026013 17220001 T C
+22 rs112413654 0.026015 17220864 A T
+22 rs139765179 0.026025 17223271 G C
+22 rs5748586 0.026037 17225692 T G
+22 rs2845350 0.026047 17226065 A G
+22 rs113555274 0.026319 17233590 C T
+22 rs137861991 0.026422 17244420 T A
+22 rs115008433 0.027101 17276470 A G
+22 rs62227212 0.027108 17278495 A G
+22 rs5748656 0.027144 17288200 T C
+22 rs113701576 0.027192 17298641 T G
+22 rs79319248 0.027321 17311685 T G
+22 rs165645 0.027447 17317346 T C
+22 rs165670 0.027465 17318682 T C
+22 rs165925 0.027579 17333872 T C
+22 rs165740 0.0276 17338687 G C
+22 rs165826 0.027604 17339612 C T
+22 rs16981786 0.027606 17340077 A G
+22 rs112842438 0.027607 17340365 C G
+22 rs12171022 0.028194 17350670 G A
+22 rs9606494 0.028747 17357880 G T
+22 rs62235933 0.02903 17363636 G T
+22 rs35217916 0.029439 17372088 C T
+22 rs117743589 0.029445 17372255 C T
+22 rs4239863 0.029708 17379570 C A
+22 rs5994072 0.029756 17380905 C A
+22 rs9617933 0.029758 17380976 T G
+22 rs5748727 0.02993 17385750 A T
+22 rs148020872 0.030006 17388591 G A
+22 rs115783727 0.030067 17394799 G A
+22 rs13433675 0.030231 17402459 C G
+22 rs60588169 0.030238 17403036 G A
+22 rs759079 0.030268 17405698 A T
+22 rs117630245 0.030515 17412411 T C
+22 rs5994097 0.031146 17414103 T C
+22 rs5746937 0.031166 17414998 A G
+22 rs17806382 0.031177 17415490 G A
+22 rs12170139 0.031185 17416454 A G
+22 rs118166666 0.031303 17436608 A T
+22 rs142018998 0.031313 17439144 C T
+22 rs4819542 0.031317 17439826 G A
+22 rs1024732 0.031356 17443381 G A
+22 rs1541529 0.031365 17446157 G T
+22 rs145847126 0.031402 17457849 C G
+22 rs11704838 0.031416 17461914 G A
+22 rs11703645 0.031421 17464047 G A
+22 rs5746955 0.031436 17472032 A T
+22 rs117381747 0.031456 17480927 G T
+22 rs11704927 0.031464 17483953 C G
+22 rs35364127 0.031466 17484578 A C
+22 rs73147679 0.031476 17488172 C T
+22 rs192498589 0.032963 17516047 A G
+22 rs7284327 0.032964 17516282 G A
+22 rs62236167 0.032969 17517198 C T
+22 rs2845396 0.032985 17521093 T C
+22 rs2845389 0.033009 17527495 C G
+22 rs881623 0.03301 17529583 C T
+22 rs2845386 0.03301 17531288 A G
+22 rs1005209 0.033014 17537530 T C
+22 rs2845407 0.033014 17537830 G A
+22 rs112010533 0.033029 17550694 A G
+22 rs5994156 0.033046 17556720 T C
+22 rs2270243 0.033101 17566504 T C
+22 rs9606607 0.03333 17568467 C T
+22 rs5746991 0.033504 17569017 A T
+22 rs917867 0.034063 17570785 G A
+22 rs12172072 0.037761 17584667 A C
+22 rs2241046 0.037837 17586471 T C
+22 rs55654777 0.038084 17599035 G A
+22 rs5747000 0.038203 17608507 A G
+22 rs5748882 0.038228 17611179 C A
+22 rs5994176 0.038295 17618164 C G
+22 rs77243639 0.038424 17635527 G C
+22 rs4819972 0.038431 17635858 A G
+22 rs5748920 0.038508 17644725 T C
+22 rs8139868 0.038628 17653367 C T
+22 rs143804144 0.038671 17656518 A G
+22 rs7289697 0.038747 17661922 C G
+22 rs9617964 0.038773 17664346 C A
+22 rs5994192 0.038776 17664788 G A
+22 rs9306252 0.038825 17666280 G T
+22 rs3788273 0.038853 17666702 C T
+22 rs9617966 0.039062 17669469 T C
+22 rs75626246 0.039696 17676932 C T
+22 rs78315277 0.039815 17679377 G C
+22 rs12166983 0.039852 17685677 A T
+22 rs362129 0.039881 17690409 G A
+22 rs5747023 0.039886 17691166 C T
+22 rs111511446 0.039983 17692978 G A
+22 rs5748950 0.040654 17696942 C T
+22 rs5748959 0.040866 17699642 T C
+22 rs113385391 0.040869 17699865 G A
+22 rs8136533 0.040875 17700284 C T
+22 rs737969 0.040895 17702046 G A
+22 rs5994208 0.040938 17707618 G A
+22 rs111586752 0.043706 17720854 T C
+22 rs62236468 0.045284 17728038 C T
+22 rs74423436 0.046259 17733865 T A
+22 rs5747044 0.047007 17736507 C T
+22 rs4819988 0.048388 17743581 C T
+22 rs73153442 0.04867 17750190 C T
+22 rs4819991 0.04868 17750681 C T
+22 rs141489572 0.048694 17751310 T C
+22 rs62236493 0.048838 17758053 C T
diff --git a/example/EUR_test.fam b/example/EUR_test.fam
new file mode 100644
index 0000000..3ec0bb6
--- /dev/null
+++ b/example/EUR_test.fam
@@ -0,0 +1,379 @@
+1 HG00096 0 0 1 1
+2 HG00097 0 0 2 1
+3 HG00099 0 0 2 1
+4 HG00100 0 0 2 1
+5 HG00101 0 0 1 1
+6 HG00102 0 0 2 1
+7 HG00103 0 0 1 1
+8 HG00104 0 0 2 1
+9 HG00106 0 0 2 1
+10 HG00108 0 0 1 1
+11 HG00109 0 0 1 1
+12 HG00110 0 0 2 1
+13 HG00111 0 0 2 1
+14 HG00112 0 0 1 1
+15 HG00113 0 0 1 1
+16 HG00114 0 0 1 1
+17 HG00116 0 0 1 1
+18 HG00117 0 0 1 1
+19 HG00118 0 0 2 1
+20 HG00119 0 0 1 1
+21 HG00120 0 0 2 1
+22 HG00121 0 0 2 1
+23 HG00122 0 0 2 1
+24 HG00123 0 0 2 1
+25 HG00124 0 0 2 1
+26 HG00125 0 0 2 1
+27 HG00126 0 0 1 1
+28 HG00127 0 0 2 1
+29 HG00128 0 0 2 1
+30 HG00129 0 0 1 1
+31 HG00130 0 0 2 1
+32 HG00131 0 0 1 1
+33 HG00133 0 0 2 1
+34 HG00134 0 0 2 1
+35 HG00135 0 0 2 1
+36 HG00136 0 0 1 1
+37 HG00137 0 0 2 1
+38 HG00138 0 0 1 1
+39 HG00139 0 0 1 1
+40 HG00140 0 0 1 1
+41 HG00141 0 0 1 1
+42 HG00142 0 0 1 1
+43 HG00143 0 0 1 1
+44 HG00146 0 0 2 1
+45 HG00148 0 0 1 1
+46 HG00149 0 0 1 1
+47 HG00150 0 0 2 1
+48 HG00151 0 0 1 1
+49 HG00152 0 0 1 1
+50 HG00154 0 0 2 1
+51 HG00155 0 0 1 1
+52 HG00156 0 0 1 1
+53 HG00158 0 0 2 1
+54 HG00159 0 0 1 1
+55 HG00160 0 0 1 1
+56 HG00171 0 0 2 1
+57 HG00173 0 0 2 1
+58 HG00174 0 0 2 1
+59 HG00176 0 0 2 1
+60 HG00177 0 0 2 1
+61 HG00178 0 0 2 1
+62 HG00179 0 0 2 1
+63 HG00180 0 0 2 1
+64 HG00182 0 0 1 1
+65 HG00183 0 0 1 1
+66 HG00185 0 0 1 1
+67 HG00186 0 0 1 1
+68 HG00187 0 0 1 1
+69 HG00188 0 0 1 1
+70 HG00189 0 0 1 1
+71 HG00190 0 0 1 1
+72 HG00231 0 0 2 1
+73 HG00232 0 0 2 1
+74 HG00233 0 0 2 1
+75 HG00234 0 0 1 1
+76 HG00235 0 0 2 1
+77 HG00236 0 0 2 1
+78 HG00237 0 0 2 1
+79 HG00238 0 0 2 1
+80 HG00239 0 0 2 1
+81 HG00240 0 0 2 1
+82 HG00242 0 0 1 1
+83 HG00243 0 0 1 1
+84 HG00244 0 0 1 1
+85 HG00245 0 0 2 1
+86 HG00246 0 0 1 1
+87 HG00247 0 0 2 1
+88 HG00249 0 0 2 1
+89 HG00250 0 0 2 1
+90 HG00251 0 0 1 1
+91 HG00252 0 0 1 1
+92 HG00253 0 0 2 1
+93 HG00254 0 0 2 1
+94 HG00255 0 0 2 1
+95 HG00256 0 0 1 1
+96 HG00257 0 0 2 1
+97 HG00258 0 0 2 1
+98 HG00259 0 0 2 1
+99 HG00260 0 0 1 1
+100 HG00261 0 0 2 1
+101 HG00262 0 0 2 1
+102 HG00263 0 0 2 1
+103 HG00264 0 0 1 1
+104 HG00265 0 0 1 1
+105 HG00266 0 0 2 1
+106 HG00267 0 0 1 1
+107 HG00268 0 0 2 1
+108 HG00269 0 0 2 1
+109 HG00270 0 0 2 1
+110 HG00271 0 0 1 1
+111 HG00272 0 0 2 1
+112 HG00273 0 0 1 1
+113 HG00274 0 0 2 1
+114 HG00275 0 0 2 1
+115 HG00276 0 0 2 1
+116 HG00277 0 0 1 1
+117 HG00278 0 0 1 1
+118 HG00280 0 0 1 1
+119 HG00281 0 0 2 1
+120 HG00282 0 0 2 1
+121 HG00284 0 0 1 1
+122 HG00285 0 0 2 1
+123 HG00306 0 0 2 1
+124 HG00309 0 0 2 1
+125 HG00310 0 0 1 1
+126 HG00311 0 0 1 1
+127 HG00312 0 0 1 1
+128 HG00313 0 0 2 1
+129 HG00315 0 0 2 1
+130 HG00318 0 0 2 1
+131 HG00319 0 0 2 1
+132 HG00320 0 0 2 1
+133 HG00321 0 0 1 1
+134 HG00323 0 0 2 1
+135 HG00324 0 0 2 1
+136 HG00325 0 0 1 1
+137 HG00326 0 0 2 1
+138 HG00327 0 0 2 1
+139 HG00328 0 0 2 1
+140 HG00329 0 0 1 1
+141 HG00330 0 0 2 1
+142 HG00331 0 0 2 1
+143 HG00332 0 0 2 1
+144 HG00334 0 0 2 1
+145 HG00335 0 0 1 1
+146 HG00336 0 0 1 1
+147 HG00337 0 0 2 1
+148 HG00338 0 0 1 1
+149 HG00339 0 0 2 1
+150 HG00341 0 0 1 1
+151 HG00342 0 0 1 1
+152 HG00343 0 0 2 1
+153 HG00344 0 0 2 1
+154 HG00345 0 0 1 1
+155 HG00346 0 0 2 1
+156 HG00349 0 0 2 1
+157 HG00350 0 0 2 1
+158 HG00351 0 0 1 1
+159 HG00353 0 0 2 1
+160 HG00355 0 0 2 1
+161 HG00356 0 0 2 1
+162 HG00357 0 0 2 1
+163 HG00358 0 0 1 1
+164 HG00359 0 0 2 1
+165 HG00360 0 0 1 1
+166 HG00361 0 0 2 1
+167 HG00362 0 0 2 1
+168 HG00364 0 0 2 1
+169 HG00366 0 0 1 1
+170 HG00367 0 0 2 1
+171 HG00369 0 0 1 1
+172 HG00372 0 0 1 1
+173 HG00373 0 0 2 1
+174 HG00375 0 0 1 1
+175 HG00376 0 0 2 1
+176 HG00377 0 0 2 1
+177 HG00378 0 0 2 1
+178 HG00381 0 0 2 1
+179 HG00382 0 0 1 1
+180 HG00383 0 0 2 1
+181 HG00384 0 0 2 1
+182 HG01334 0 0 1 1
+183 HG01515 0 0 1 1
+184 HG01516 0 0 2 1
+185 HG01518 0 0 1 1
+186 HG01519 0 0 2 1
+187 HG01521 0 0 1 1
+188 HG01522 0 0 2 1
+189 HG01617 0 0 1 1
+190 HG01618 0 0 2 1
+191 HG01619 0 0 1 1
+192 HG01620 0 0 2 1
+193 HG01623 0 0 2 1
+194 HG01624 0 0 1 1
+195 HG01625 0 0 1 1
+196 HG01626 0 0 2 1
+197 NA06984 0 0 1 1
+198 NA06986 0 0 1 1
+199 NA06989 0 0 2 1
+200 NA06994 0 0 1 1
+201 NA07000 0 0 2 1
+202 NA07037 0 0 2 1
+203 NA07048 0 0 1 1
+204 NA07051 0 0 1 1
+205 NA07056 0 0 2 1
+206 NA07347 0 0 1 1
+207 NA07357 0 0 1 1
+208 NA10847 0 0 2 1
+209 NA10851 0 0 1 1
+210 NA11829 0 0 1 1
+211 NA11830 0 0 2 1
+212 NA11831 0 0 1 1
+213 NA11843 0 0 1 1
+214 NA11892 0 0 2 1
+215 NA11893 0 0 1 1
+216 NA11894 0 0 2 1
+217 NA11919 0 0 1 1
+218 NA11920 0 0 2 1
+219 NA11930 0 0 1 1
+220 NA11931 0 0 2 1
+221 NA11932 0 0 1 1
+222 NA11933 0 0 2 1
+223 NA11992 0 0 1 1
+224 NA11993 0 0 2 1
+225 NA11994 0 0 1 1
+226 NA11995 0 0 2 1
+227 NA12003 0 0 1 1
+228 NA12004 0 0 2 1
+229 NA12006 0 0 2 1
+230 NA12043 0 0 1 1
+231 NA12044 0 0 2 1
+232 NA12045 0 0 1 1
+233 NA12046 0 0 2 1
+234 NA12058 0 0 2 1
+235 NA12144 0 0 1 1
+236 NA12154 0 0 1 1
+237 NA12155 0 0 1 1
+238 NA12249 0 0 2 1
+239 NA12272 0 0 1 1
+240 NA12273 0 0 2 1
+241 NA12275 0 0 2 1
+242 NA12282 0 0 1 1
+243 NA12283 0 0 2 1
+244 NA12286 0 0 1 1
+245 NA12287 0 0 2 1
+246 NA12340 0 0 1 1
+247 NA12341 0 0 2 1
+248 NA12342 0 0 1 1
+249 NA12347 0 0 1 1
+250 NA12348 0 0 2 1
+251 NA12383 0 0 2 1
+252 NA12399 0 0 1 1
+253 NA12400 0 0 2 1
+254 NA12413 0 0 1 1
+255 NA12489 0 0 2 1
+256 NA12546 0 0 1 1
+257 NA12716 0 0 1 1
+258 NA12717 0 0 2 1
+259 NA12718 0 0 2 1
+260 NA12748 0 0 1 1
+261 NA12749 0 0 2 1
+262 NA12750 0 0 1 1
+263 NA12751 0 0 2 1
+264 NA12761 0 0 2 1
+265 NA12763 0 0 2 1
+266 NA12775 0 0 1 1
+267 NA12777 0 0 1 1
+268 NA12778 0 0 2 1
+269 NA12812 0 0 1 1
+270 NA12814 0 0 1 1
+271 NA12815 0 0 2 1
+272 NA12827 0 0 1 1
+273 NA12829 0 0 1 1
+274 NA12830 0 0 2 1
+275 NA12842 0 0 1 1
+276 NA12843 0 0 2 1
+277 NA12872 0 0 1 1
+278 NA12873 0 0 2 1
+279 NA12874 0 0 1 1
+280 NA12889 0 0 1 1
+281 NA12890 0 0 2 1
+282 NA20502 0 0 2 1
+283 NA20503 0 0 2 1
+284 NA20504 0 0 2 1
+285 NA20505 0 0 2 1
+286 NA20506 0 0 2 1
+287 NA20507 0 0 2 1
+288 NA20508 0 0 2 1
+289 NA20509 0 0 1 1
+290 NA20510 0 0 1 1
+291 NA20512 0 0 1 1
+292 NA20513 0 0 1 1
+293 NA20515 0 0 1 1
+294 NA20516 0 0 1 1
+295 NA20517 0 0 2 1
+296 NA20518 0 0 1 1
+297 NA20519 0 0 1 1
+298 NA20520 0 0 1 1
+299 NA20521 0 0 1 1
+300 NA20522 0 0 2 1
+301 NA20524 0 0 1 1
+302 NA20525 0 0 1 1
+303 NA20527 0 0 1 1
+304 NA20528 0 0 1 1
+305 NA20529 0 0 2 1
+306 NA20530 0 0 2 1
+307 NA20531 0 0 2 1
+308 NA20532 0 0 1 1
+309 NA20533 0 0 2 1
+310 NA20534 0 0 1 1
+311 NA20535 0 0 2 1
+312 NA20536 0 0 1 1
+313 NA20537 0 0 1 1
+314 NA20538 0 0 1 1
+315 NA20539 0 0 1 1
+316 NA20540 0 0 2 1
+317 NA20541 0 0 2 1
+318 NA20542 0 0 2 1
+319 NA20543 0 0 1 1
+320 NA20544 0 0 1 1
+321 NA20581 0 0 1 1
+322 NA20582 0 0 2 1
+323 NA20585 0 0 2 1
+324 NA20586 0 0 1 1
+325 NA20588 0 0 1 1
+326 NA20589 0 0 2 1
+327 NA20752 0 0 1 1
+328 NA20753 0 0 2 1
+329 NA20754 0 0 1 1
+330 NA20755 0 0 1 1
+331 NA20756 0 0 2 1
+332 NA20757 0 0 2 1
+333 NA20758 0 0 1 1
+334 NA20759 0 0 1 1
+335 NA20760 0 0 2 1
+336 NA20761 0 0 2 1
+337 NA20765 0 0 1 1
+338 NA20766 0 0 2 1
+339 NA20768 0 0 2 1
+340 NA20769 0 0 2 1
+341 NA20770 0 0 1 1
+342 NA20771 0 0 2 1
+343 NA20772 0 0 2 1
+344 NA20773 0 0 2 1
+345 NA20774 0 0 2 1
+346 NA20775 0 0 2 1
+347 NA20778 0 0 1 1
+348 NA20783 0 0 1 1
+349 NA20785 0 0 1 1
+350 NA20786 0 0 2 1
+351 NA20787 0 0 1 1
+352 NA20790 0 0 2 1
+353 NA20792 0 0 1 1
+354 NA20795 0 0 2 1
+355 NA20796 0 0 1 1
+356 NA20797 0 0 2 1
+357 NA20798 0 0 1 1
+358 NA20799 0 0 2 1
+359 NA20800 0 0 2 1
+360 NA20801 0 0 1 1
+361 NA20802 0 0 2 1
+362 NA20803 0 0 1 1
+363 NA20804 0 0 2 1
+364 NA20805 0 0 1 1
+365 NA20806 0 0 1 1
+366 NA20807 0 0 2 1
+367 NA20808 0 0 2 1
+368 NA20809 0 0 1 1
+369 NA20810 0 0 1 1
+370 NA20811 0 0 1 1
+371 NA20812 0 0 1 1
+372 NA20813 0 0 2 1
+373 NA20814 0 0 1 1
+374 NA20815 0 0 1 1
+375 NA20816 0 0 1 1
+376 NA20818 0 0 2 1
+377 NA20819 0 0 2 1
+378 NA20826 0 0 2 1
+379 NA20828 0 0 2 1
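Note on the file above: the rows follow the six-column PLINK .fam layout (family ID, individual ID, paternal ID, maternal ID, sex code, phenotype), with sex coded 1 for male and 2 for female. The following is a minimal sketch for tallying the sex codes, not part of Eagle. The names `FamRecord`, `parse_fam_line` and `sex_counts` are made up for illustration, and the leading "+" handling only exists so rows can be pasted straight from this diff.

```python
from collections import Counter, namedtuple

# Illustrative record type; the field names are assumptions chosen to mirror
# the six PLINK .fam columns.
FamRecord = namedtuple("FamRecord", "fid iid pat mat sex pheno")

def parse_fam_line(line):
    """Parse one whitespace-delimited .fam row (a leading diff '+' is tolerated)."""
    fields = line.lstrip("+").split()
    if len(fields) != 6:
        raise ValueError("expected 6 columns, got %d: %r" % (len(fields), line))
    return FamRecord(*fields)

def sex_counts(lines):
    """Count rows per sex code ('1' = male, '2' = female), skipping blank lines."""
    return Counter(parse_fam_line(l).sex for l in lines if l.strip())

# Three rows copied from the hunk above.
rows = [
    "+1 HG00096 0 0 1 1",
    "+2 HG00097 0 0 2 1",
    "+3 HG00099 0 0 2 1",
]
counts = sex_counts(rows)
print(counts["1"], counts["2"])
```

Fields are kept as strings on purpose, since .fam IDs are not guaranteed to be numeric.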
diff --git a/example/EUR_test.vcf.gz b/example/EUR_test.vcf.gz
new file mode 100644
index 0000000..be75a22
Binary files /dev/null and b/example/EUR_test.vcf.gz differ
diff --git a/example/example.log b/example/example.log
new file mode 100644
index 0000000..394a0ed
--- /dev/null
+++ b/example/example.log
@@ -0,0 +1,194 @@
+ +-----------------------------+
+ | |
+ | Eagle v2.3 |
+ | July 22, 2016 |
+ | Po-Ru Loh |
+ | |
+ +-----------------------------+
+
+Copyright (C) 2015-2016 Harvard University.
+Distributed under the GNU GPLv3 open source license.
+
+Command line options:
+
+../eagle \
+ --bfile=EUR_test \
+ --geneticMapFile=USE_BIM \
+ --chrom=21 \
+ --outPrefix=phased \
+ --numThreads=4
+
+Setting number of threads to 4
+
+=== Reading genotype data ===
+
+Reading fam file: EUR_test.fam
+Total indivs in PLINK data: Nbed = 379
+Total indivs stored in memory: NpreQC = 379
+Reading bim file: EUR_test.bim
+Total snps in PLINK data: Mbed = 2000
+Restricting to 1813 SNPs on chrom 21 in region [bpStart,bpEnd] = [0,1e+09]
+Total SNPs stored in memory: MpreQC = 1813
+Allocating 1813 x 379 bytes to temporarily store genotypes
+Reading genotypes and performing QC filtering on snps and indivs...
+Reading bed file: EUR_test.bed
+ Expecting 190000 (+3) bytes for 379 indivs, 2000 snps
+
+Total post-QC indivs: N = 379
+Total post-QC SNPs: M = 1813
+MAF spectrum:
+ 0- 5%: 495
+ 5-10%: 290
+ 10-20%: 332
+ 20-30%: 248
+ 30-40%: 234
+ 40-50%: 214
+Physical distance range: 9752235 base pairs
+Genetic distance range: 23.0881 cM
+Average # SNPs per cM: 79
+Auto-selecting --maxBlockLen: 0.25 cM
+Number of <=(64-SNP, 0.25cM) segments: 68
+Average # SNPs per segment: 26
+Estimating LD scores using 379 indivs
+Fraction of heterozygous genotypes: 0.246308
+Typical span of default 100-het history length: 5.17 cM
+Setting --histFactor=1.00
+
+BEGINNING STEP 1
+
+Time for step 1: 0.373152
+Time for step 1 MN^2: 0.0242926
+
+Making hard calls (time: 0.0283859)
+
+
+BEGINNING STEP 2
+
+BATCH 1 OF 1
+Building hash tables
+.................................................................. (time: 0.0523882)
+
+Phasing samples 1-379
+Time for phasing batch: 0.449098
+
+Making hard calls (time: 0.029567)
+
+Time for step 2: 0.531062
+Time for step 2 MN^2: 0.0591508
+
+
+BEGINNING STEP 3 (PBWT ITERS)
+
+Auto-selecting number of PBWT iterations: setting --pbwtIters to 2
+
+
+BEGINNING PBWT ITER 1
+
+BATCH 1 OF 10
+
+Phasing samples 1-37
+Time for phasing batch: 1.75458
+
+BATCH 2 OF 10
+
+Phasing samples 38-75
+Time for phasing batch: 1.59901
+
+BATCH 3 OF 10
+
+Phasing samples 76-113
+Time for phasing batch: 1.69437
+
+BATCH 4 OF 10
+
+Phasing samples 114-151
+Time for phasing batch: 1.62289
+
+BATCH 5 OF 10
+
+Phasing samples 152-189
+Time for phasing batch: 1.59901
+
+BATCH 6 OF 10
+
+Phasing samples 190-227
+Time for phasing batch: 1.63321
+
+BATCH 7 OF 10
+
+Phasing samples 228-265
+Time for phasing batch: 1.63894
+
+BATCH 8 OF 10
+
+Phasing samples 266-303
+Time for phasing batch: 1.57722
+
+BATCH 9 OF 10
+
+Phasing samples 304-341
+Time for phasing batch: 1.56258
+
+BATCH 10 OF 10
+
+Phasing samples 342-379
+Time for phasing batch: 1.63269
+
+Time for PBWT iter 1: 16.3145
+
+BEGINNING PBWT ITER 2
+
+BATCH 1 OF 10
+
+Phasing samples 1-37
+Time for phasing batch: 2.73515
+
+BATCH 2 OF 10
+
+Phasing samples 38-75
+Time for phasing batch: 2.58695
+
+BATCH 3 OF 10
+
+Phasing samples 76-113
+Time for phasing batch: 2.70714
+
+BATCH 4 OF 10
+
+Phasing samples 114-151
+Time for phasing batch: 2.54992
+
+BATCH 5 OF 10
+
+Phasing samples 152-189
+Time for phasing batch: 2.51631
+
+BATCH 6 OF 10
+
+Phasing samples 190-227
+Time for phasing batch: 2.62562
+
+BATCH 7 OF 10
+
+Phasing samples 228-265
+Time for phasing batch: 2.60083
+
+BATCH 8 OF 10
+
+Phasing samples 266-303
+Time for phasing batch: 2.4503
+
+BATCH 9 OF 10
+
+Phasing samples 304-341
+Time for phasing batch: 2.54366
+
+BATCH 10 OF 10
+
+Phasing samples 342-379
+Time for phasing batch: 2.56976
+
+Time for PBWT iter 2: 25.8857
+Writing .haps.gz and .sample output
+Time for writing output: 0.370972
+Total elapsed time for analysis = 43.6733 sec
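Note on the log above: the line "Expecting 190000 (+3) bytes for 379 indivs, 2000 snps" is consistent with the SNP-major PLINK .bed layout, where each genotype takes 2 bits (4 genotypes per byte, padded per SNP) and the file starts with a 3-byte header. The following is a minimal sketch of that arithmetic. The function name `expected_bed_bytes` is hypothetical and is not taken from the Eagle sources.

```python
def expected_bed_bytes(n_indivs, n_snps):
    """Return (genotype_bytes, header_bytes) for a SNP-major PLINK .bed file.

    Each SNP occupies ceil(n_indivs / 4) bytes because four 2-bit genotypes
    are packed per byte and each SNP's block is padded to a whole byte.
    """
    bytes_per_snp = (n_indivs + 3) // 4  # integer ceiling of n_indivs / 4
    header_bytes = 3                     # two magic bytes plus one mode byte
    return bytes_per_snp * n_snps, header_bytes

body, header = expected_bed_bytes(379, 2000)
print(body, header)  # 190000 3
```

With 379 individuals each SNP needs 95 bytes, and 95 x 2000 gives the 190000 reported in the log.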
diff --git a/example/example_ref.log b/example/example_ref.log
new file mode 100644
index 0000000..e8fb1b7
--- /dev/null
+++ b/example/example_ref.log
@@ -0,0 +1,79 @@
+ +-----------------------------+
+ | |
+ | Eagle v2.3 |
+ | July 22, 2016 |
+ | Po-Ru Loh |
+ | |
+ +-----------------------------+
+
+Copyright (C) 2015-2016 Harvard University.
+Distributed under the GNU GPLv3 open source license.
+
+Command line options:
+
+../eagle \
+ --vcfRef=ref.bcf \
+ --vcfTarget=target.vcf.gz \
+ --geneticMapFile=../tables/genetic_map_hg19_withX.txt.gz \
+ --outPrefix=target.phased
+
+Setting number of threads to 1
+Warning: The index file is older than the data file: ref.bcf.csi
+
+Reference samples: Nref = 169
+Target samples: Ntarget = 8
+[W::vcf_parse] INFO 'AC' is not defined in the header, assuming Type=String
+[W::vcf_parse] INFO 'AN' is not defined in the header, assuming Type=String
+[W::vcf_parse] INFO 'DP' is not defined in the header, assuming Type=String
+[W::vcf_parse] INFO 'AFR_AF' is not defined in the header, assuming Type=String
+[W::vcf_parse] INFO 'EX_TARGET' is not defined in the header, assuming Type=String
+SNPs to analyze: M = 430 SNPs in both target and reference
+
+SNPs ignored: 0 SNPs in target but not reference
+ 215 SNPs in reference but not target
+ 0 multi-allelic SNPs
+ 0 monomorphic SNPs
+
+Missing rate in target genotypes: 0.00116279
+
+Filling in genetic map coordinates using reference file:
+ ../tables/genetic_map_hg19_withX.txt.gz
+Physical distance range: 3595565 base pairs
+Genetic distance range: 10.292 cM
+Average # SNPs per cM: 42
+Number of <=(64-SNP, 1cM) segments: 9
+Average # SNPs per segment: 47
+Fraction of heterozygous genotypes: 0.178114
+Typical span of default 100-het history length: 13.44 cM
+Setting --histFactor=1.00
+
+Auto-selecting number of phasing iterations: setting --pbwtIters to 1
+
+
+BEGINNING PHASING
+
+PHASING ITER 1 OF 1
+
+
+Phasing target samples
+................................................................................
+Time for phasing iter 1: 0.23367
+Writing vcf.gz output to target.phased.vcf.gz
+[W::vcf_parse] INFO 'AC' is not defined in the header, assuming Type=String
+[W::vcf_parse] INFO 'AN' is not defined in the header, assuming Type=String
+[W::vcf_parse] INFO 'DP' is not defined in the header, assuming Type=String
+[W::vcf_parse] INFO 'AFR_AF' is not defined in the header, assuming Type=String
+[W::vcf_parse] INFO 'EX_TARGET' is not defined in the header, assuming Type=String
+Time for writing output: 0.028584
+Total elapsed time for analysis = 9.43616 sec
+
+Mean phase confidence of each target individual:
+ID PHASE_CONFIDENCE
+HG00403 0.953163
+HG00404 0.949485
+HG00406 0.927005
+HG00407 0.950292
+HG00419 0.938822
+HG00421 0.93367
+HG00422 0.952224
+HG00428 0.944651
diff --git a/example/example_vcf.log b/example/example_vcf.log
new file mode 100644
index 0000000..23decc9
--- /dev/null
+++ b/example/example_vcf.log
@@ -0,0 +1,177 @@
+ +-----------------------------+
+ | |
+ | Eagle v2.3 |
+ | July 22, 2016 |
+ | Po-Ru Loh |
+ | |
+ +-----------------------------+
+
+Copyright (C) 2015-2016 Harvard University.
+Distributed under the GNU GPLv3 open source license.
+
+Command line options:
+
+../eagle \
+ --vcf=EUR_test.vcf.gz \
+ --geneticMapFile=../tables/genetic_map_hg19_withX.txt.gz \
+ --chrom=21 \
+ --outPrefix=phased \
+ --numThreads=4
+
+Setting number of threads to 4
+
+=== Reading genotype data ===
+
+Reading genotypes for N = 379 samples
+Read M = 1813 variants
+Filling in genetic map coordinates using reference file:
+ ../tables/genetic_map_hg19_withX.txt.gz
+Physical distance range: 9752235 base pairs
+Genetic distance range: 23.0881 cM
+Average # SNPs per cM: 79
+Auto-selecting --maxBlockLen: 0.25 cM
+Number of <=(64-SNP, 0.25cM) segments: 68
+Average # SNPs per segment: 26
+Estimating LD scores using 379 indivs
+Fraction of heterozygous genotypes: 0.246308
+Typical span of default 100-het history length: 5.17 cM
+Setting --histFactor=1.00
+
+BEGINNING STEP 1
+
+Time for step 1: 0.375713
+Time for step 1 MN^2: 0.0244864
+
+Making hard calls (time: 0.028863)
+
+
+BEGINNING STEP 2
+
+BATCH 1 OF 1
+Building hash tables
+.................................................................. (time: 0.0508139)
+
+Phasing samples 1-379
+Time for phasing batch: 0.437996
+
+Making hard calls (time: 0.029588)
+
+Time for step 2: 0.51841
+Time for step 2 MN^2: 0.0594165
+
+
+BEGINNING STEP 3 (PBWT ITERS)
+
+Auto-selecting number of PBWT iterations: setting --pbwtIters to 2
+
+
+BEGINNING PBWT ITER 1
+
+BATCH 1 OF 10
+
+Phasing samples 1-37
+Time for phasing batch: 1.77567
+
+BATCH 2 OF 10
+
+Phasing samples 38-75
+Time for phasing batch: 1.64262
+
+BATCH 3 OF 10
+
+Phasing samples 76-113
+Time for phasing batch: 1.71332
+
+BATCH 4 OF 10
+
+Phasing samples 114-151
+Time for phasing batch: 1.61786
+
+BATCH 5 OF 10
+
+Phasing samples 152-189
+Time for phasing batch: 1.6091
+
+BATCH 6 OF 10
+
+Phasing samples 190-227
+Time for phasing batch: 1.6461
+
+BATCH 7 OF 10
+
+Phasing samples 228-265
+Time for phasing batch: 1.651
+
+BATCH 8 OF 10
+
+Phasing samples 266-303
+Time for phasing batch: 1.58484
+
+BATCH 9 OF 10
+
+Phasing samples 304-341
+Time for phasing batch: 1.59481
+
+BATCH 10 OF 10
+
+Phasing samples 342-379
+Time for phasing batch: 1.63671
+
+Time for PBWT iter 1: 16.4721
+
+BEGINNING PBWT ITER 2
+
+BATCH 1 OF 10
+
+Phasing samples 1-37
+Time for phasing batch: 2.72875
+
+BATCH 2 OF 10
+
+Phasing samples 38-75
+Time for phasing batch: 2.59217
+
+BATCH 3 OF 10
+
+Phasing samples 76-113
+Time for phasing batch: 2.72183
+
+BATCH 4 OF 10
+
+Phasing samples 114-151
+Time for phasing batch: 2.57471
+
+BATCH 5 OF 10
+
+Phasing samples 152-189
+Time for phasing batch: 2.5283
+
+BATCH 6 OF 10
+
+Phasing samples 190-227
+Time for phasing batch: 2.64632
+
+BATCH 7 OF 10
+
+Phasing samples 228-265
+Time for phasing batch: 2.63781
+
+BATCH 8 OF 10
+
+Phasing samples 266-303
+Time for phasing batch: 2.52006
+
+BATCH 9 OF 10
+
+Phasing samples 304-341
+Time for phasing batch: 2.53771
+
+BATCH 10 OF 10
+
+Phasing samples 342-379
+Time for phasing batch: 2.51691
+
+Time for PBWT iter 2: 26.0046
+Writing vcf.gz output to phased.vcf.gz
+Time for writing output: 0.331217
+Total elapsed time for analysis = 52.9965 sec
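Note on the log above: the per-batch timings can be totalled per PBWT iteration and compared against the "Time for PBWT iter N" lines. The following is a rough sketch that assumes the log keeps the exact "BEGINNING PBWT ITER n" and "Time for phasing batch: x" wording shown here. The function name `batch_time_per_iter` is made up for illustration. Batches logged before the first PBWT iteration (step 2) are deliberately ignored.

```python
import re
from collections import defaultdict

ITER_RE = re.compile(r"BEGINNING PBWT ITER (\d+)")
BATCH_RE = re.compile(r"Time for phasing batch: ([0-9.]+)")

def batch_time_per_iter(log_text):
    """Sum 'Time for phasing batch' values under each 'BEGINNING PBWT ITER n'."""
    totals = defaultdict(float)
    current_iter = None  # stays None until the first PBWT iteration header
    for line in log_text.splitlines():
        header = ITER_RE.search(line)
        if header:
            current_iter = int(header.group(1))
            continue
        batch = BATCH_RE.search(line)
        if batch and current_iter is not None:
            totals[current_iter] += float(batch.group(1))
    return dict(totals)

# A short excerpt in the same shape as the log above.
sample = """\
Phasing samples 1-379
Time for phasing batch: 0.437996
BEGINNING PBWT ITER 1
BATCH 1 OF 10
Time for phasing batch: 1.77567
BATCH 2 OF 10
Time for phasing batch: 1.64262
"""
totals = batch_time_per_iter(sample)
print(totals)
```

The summed batch times can come out slightly below the reported per-iteration time, since the latter presumably also covers work outside the batches.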
diff --git a/example/phased.haps.gz b/example/phased.haps.gz
new file mode 100644
index 0000000..aa3d21e
Binary files /dev/null and b/example/phased.haps.gz differ
diff --git a/example/phased.sample b/example/phased.sample
new file mode 100644
index 0000000..96030f2
--- /dev/null
+++ b/example/phased.sample
@@ -0,0 +1,381 @@
+ID_1 ID_2 missing
+0 0 0
+1 HG00096 0
+2 HG00097 0
+3 HG00099 0
+4 HG00100 0
+5 HG00101 0
+6 HG00102 0
+7 HG00103 0
+8 HG00104 0
+9 HG00106 0
+10 HG00108 0
+11 HG00109 0
+12 HG00110 0
+13 HG00111 0
+14 HG00112 0
+15 HG00113 0
+16 HG00114 0
+17 HG00116 0
+18 HG00117 0
+19 HG00118 0
+20 HG00119 0
+21 HG00120 0
+22 HG00121 0
+23 HG00122 0
+24 HG00123 0
+25 HG00124 0
+26 HG00125 0
+27 HG00126 0
+28 HG00127 0
+29 HG00128 0
+30 HG00129 0
+31 HG00130 0
+32 HG00131 0
+33 HG00133 0
+34 HG00134 0
+35 HG00135 0
+36 HG00136 0
+37 HG00137 0
+38 HG00138 0
+39 HG00139 0
+40 HG00140 0
+41 HG00141 0
+42 HG00142 0
+43 HG00143 0
+44 HG00146 0
+45 HG00148 0
+46 HG00149 0
+47 HG00150 0
+48 HG00151 0
+49 HG00152 0
+50 HG00154 0
+51 HG00155 0
+52 HG00156 0
+53 HG00158 0
+54 HG00159 0
+55 HG00160 0
+56 HG00171 0
+57 HG00173 0
+58 HG00174 0
+59 HG00176 0
+60 HG00177 0
+61 HG00178 0
+62 HG00179 0
+63 HG00180 0
+64 HG00182 0
+65 HG00183 0
+66 HG00185 0
+67 HG00186 0
+68 HG00187 0
+69 HG00188 0
+70 HG00189 0
+71 HG00190 0
+72 HG00231 0
+73 HG00232 0
+74 HG00233 0
+75 HG00234 0
+76 HG00235 0
+77 HG00236 0
+78 HG00237 0
+79 HG00238 0
+80 HG00239 0
+81 HG00240 0
+82 HG00242 0
+83 HG00243 0
+84 HG00244 0
+85 HG00245 0
+86 HG00246 0
+87 HG00247 0
+88 HG00249 0
+89 HG00250 0
+90 HG00251 0
+91 HG00252 0
+92 HG00253 0
+93 HG00254 0
+94 HG00255 0
+95 HG00256 0
+96 HG00257 0
+97 HG00258 0
+98 HG00259 0
+99 HG00260 0
+100 HG00261 0
+101 HG00262 0
+102 HG00263 0
+103 HG00264 0
+104 HG00265 0
+105 HG00266 0
+106 HG00267 0
+107 HG00268 0
+108 HG00269 0
+109 HG00270 0
+110 HG00271 0
+111 HG00272 0
+112 HG00273 0
+113 HG00274 0
+114 HG00275 0
+115 HG00276 0
+116 HG00277 0
+117 HG00278 0
+118 HG00280 0
+119 HG00281 0
+120 HG00282 0
+121 HG00284 0
+122 HG00285 0
+123 HG00306 0
+124 HG00309 0
+125 HG00310 0
+126 HG00311 0
+127 HG00312 0
+128 HG00313 0
+129 HG00315 0
+130 HG00318 0
+131 HG00319 0
+132 HG00320 0
+133 HG00321 0
+134 HG00323 0
+135 HG00324 0
+136 HG00325 0
+137 HG00326 0
+138 HG00327 0
+139 HG00328 0
+140 HG00329 0
+141 HG00330 0
+142 HG00331 0
+143 HG00332 0
+144 HG00334 0
+145 HG00335 0
+146 HG00336 0
+147 HG00337 0
+148 HG00338 0
+149 HG00339 0
+150 HG00341 0
+151 HG00342 0
+152 HG00343 0
+153 HG00344 0
+154 HG00345 0
+155 HG00346 0
+156 HG00349 0
+157 HG00350 0
+158 HG00351 0
+159 HG00353 0
+160 HG00355 0
+161 HG00356 0
+162 HG00357 0
+163 HG00358 0
+164 HG00359 0
+165 HG00360 0
+166 HG00361 0
+167 HG00362 0
+168 HG00364 0
+169 HG00366 0
+170 HG00367 0
+171 HG00369 0
+172 HG00372 0
+173 HG00373 0
+174 HG00375 0
+175 HG00376 0
+176 HG00377 0
+177 HG00378 0
+178 HG00381 0
+179 HG00382 0
+180 HG00383 0
+181 HG00384 0
+182 HG01334 0
+183 HG01515 0
+184 HG01516 0
+185 HG01518 0
+186 HG01519 0
+187 HG01521 0
+188 HG01522 0
+189 HG01617 0
+190 HG01618 0
+191 HG01619 0
+192 HG01620 0
+193 HG01623 0
+194 HG01624 0
+195 HG01625 0
+196 HG01626 0
+197 NA06984 0
+198 NA06986 0
+199 NA06989 0
+200 NA06994 0
+201 NA07000 0
+202 NA07037 0
+203 NA07048 0
+204 NA07051 0
+205 NA07056 0
+206 NA07347 0
+207 NA07357 0
+208 NA10847 0
+209 NA10851 0
+210 NA11829 0
+211 NA11830 0
+212 NA11831 0
+213 NA11843 0
+214 NA11892 0
+215 NA11893 0
+216 NA11894 0
+217 NA11919 0
+218 NA11920 0
+219 NA11930 0
+220 NA11931 0
+221 NA11932 0
+222 NA11933 0
+223 NA11992 0
+224 NA11993 0
+225 NA11994 0
+226 NA11995 0
+227 NA12003 0
+228 NA12004 0
+229 NA12006 0
+230 NA12043 0
+231 NA12044 0
+232 NA12045 0
+233 NA12046 0
+234 NA12058 0
+235 NA12144 0
+236 NA12154 0
+237 NA12155 0
+238 NA12249 0
+239 NA12272 0
+240 NA12273 0
+241 NA12275 0
+242 NA12282 0
+243 NA12283 0
+244 NA12286 0
+245 NA12287 0
+246 NA12340 0
+247 NA12341 0
+248 NA12342 0
+249 NA12347 0
+250 NA12348 0
+251 NA12383 0
+252 NA12399 0
+253 NA12400 0
+254 NA12413 0
+255 NA12489 0
+256 NA12546 0
+257 NA12716 0
+258 NA12717 0
+259 NA12718 0
+260 NA12748 0
+261 NA12749 0
+262 NA12750 0
+263 NA12751 0
+264 NA12761 0
+265 NA12763 0
+266 NA12775 0
+267 NA12777 0
+268 NA12778 0
+269 NA12812 0
+270 NA12814 0
+271 NA12815 0
+272 NA12827 0
+273 NA12829 0
+274 NA12830 0
+275 NA12842 0
+276 NA12843 0
+277 NA12872 0
+278 NA12873 0
+279 NA12874 0
+280 NA12889 0
+281 NA12890 0
+282 NA20502 0
+283 NA20503 0
+284 NA20504 0
+285 NA20505 0
+286 NA20506 0
+287 NA20507 0
+288 NA20508 0
+289 NA20509 0
+290 NA20510 0
+291 NA20512 0
+292 NA20513 0
+293 NA20515 0
+294 NA20516 0
+295 NA20517 0
+296 NA20518 0
+297 NA20519 0
+298 NA20520 0
+299 NA20521 0
+300 NA20522 0
+301 NA20524 0
+302 NA20525 0
+303 NA20527 0
+304 NA20528 0
+305 NA20529 0
+306 NA20530 0
+307 NA20531 0
+308 NA20532 0
+309 NA20533 0
+310 NA20534 0
+311 NA20535 0
+312 NA20536 0
+313 NA20537 0
+314 NA20538 0
+315 NA20539 0
+316 NA20540 0
+317 NA20541 0
+318 NA20542 0
+319 NA20543 0
+320 NA20544 0
+321 NA20581 0
+322 NA20582 0
+323 NA20585 0
+324 NA20586 0
+325 NA20588 0
+326 NA20589 0
+327 NA20752 0
+328 NA20753 0
+329 NA20754 0
+330 NA20755 0
+331 NA20756 0
+332 NA20757 0
+333 NA20758 0
+334 NA20759 0
+335 NA20760 0
+336 NA20761 0
+337 NA20765 0
+338 NA20766 0
+339 NA20768 0
+340 NA20769 0
+341 NA20770 0
+342 NA20771 0
+343 NA20772 0
+344 NA20773 0
+345 NA20774 0
+346 NA20775 0
+347 NA20778 0
+348 NA20783 0
+349 NA20785 0
+350 NA20786 0
+351 NA20787 0
+352 NA20790 0
+353 NA20792 0
+354 NA20795 0
+355 NA20796 0
+356 NA20797 0
+357 NA20798 0
+358 NA20799 0
+359 NA20800 0
+360 NA20801 0
+361 NA20802 0
+362 NA20803 0
+363 NA20804 0
+364 NA20805 0
+365 NA20806 0
+366 NA20807 0
+367 NA20808 0
+368 NA20809 0
+369 NA20810 0
+370 NA20811 0
+371 NA20812 0
+372 NA20813 0
+373 NA20814 0
+374 NA20815 0
+375 NA20816 0
+376 NA20818 0
+377 NA20819 0
+378 NA20826 0
+379 NA20828 0
diff --git a/example/phased.vcf.gz b/example/phased.vcf.gz
new file mode 100644
index 0000000..33f7f98
Binary files /dev/null and b/example/phased.vcf.gz differ
diff --git a/example/ref.bcf b/example/ref.bcf
new file mode 100644
index 0000000..3afddd3
Binary files /dev/null and b/example/ref.bcf differ
diff --git a/example/ref.bcf.csi b/example/ref.bcf.csi
new file mode 100644
index 0000000..6a954b9
Binary files /dev/null and b/example/ref.bcf.csi differ
diff --git a/example/run_example.sh b/example/run_example.sh
new file mode 100644
index 0000000..c536e1f
--- /dev/null
+++ b/example/run_example.sh
@@ -0,0 +1,19 @@
+../eagle \
+ --bfile=EUR_test \
+ --geneticMapFile=USE_BIM \
+ --chrom=21 \
+ --outPrefix=phased \
+ --numThreads=4 \
+ 2>&1 | tee example.log
+
+### run eagle without any parameters to list options
+
+### typical options for phasing without a reference:
+# to import genetic map coordinates: --geneticMapFile=tables/genetic_map_hg##.txt.gz
+# to remove indivs or exclude SNPs: --remove, --exclude
+# to perform QC on missingness: --maxMissingPerIndiv, --maxMissingPerSnp
+# to select a region to phase: --bpStart, --bpEnd
+
+### old:
+# to use Eagle1 algorithm: --v1
+# to use Eagle1 fast mode: --v1fast
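Note on the script above: the comments list --bpStart and --bpEnd for restricting phasing to a region. The following is a minimal sketch of assembling such a command line from Python. It uses only flags that appear in the script, in the same --flag=value form. The helper name `build_eagle_cmd`, the output prefix "phased_region" and the region bounds are arbitrary illustrations, not recommended settings, and the command is only printed, not executed.

```python
import shlex

def build_eagle_cmd(bfile, chrom, out_prefix, bp_start=None, bp_end=None,
                    genetic_map="USE_BIM", num_threads=4,
                    eagle_path="../eagle"):
    """Build an argv list mirroring run_example.sh, optionally with a region."""
    args = [
        eagle_path,
        "--bfile=%s" % bfile,
        "--geneticMapFile=%s" % genetic_map,
        "--chrom=%s" % chrom,
        "--outPrefix=%s" % out_prefix,
        "--numThreads=%d" % num_threads,
    ]
    # Region flags are appended only when requested, so the default call
    # reproduces the options used in the script.
    if bp_start is not None:
        args.append("--bpStart=%d" % bp_start)
    if bp_end is not None:
        args.append("--bpEnd=%d" % bp_end)
    return args

# Region bounds below are placeholders chosen for illustration only.
cmd = build_eagle_cmd("EUR_test", 21, "phased_region",
                      bp_start=20000000, bp_end=25000000)
print(shlex.join(cmd))
```

Building the argument list rather than a single string keeps it directly usable with `subprocess.run(cmd)` should the caller decide to launch the binary.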
diff --git a/example/run_example_ref.sh b/example/run_example_ref.sh
new file mode 100644
index 0000000..c91e485
--- /dev/null
+++ b/example/run_example_ref.sh
@@ -0,0 +1,9 @@
+echo 'If pulling from github, change --geneticMapFile to ../tables/genetic_map_hg19_example.txt.gz'
+echo
+
+../eagle \
+ --vcfRef=ref.bcf \
+ --vcfTarget=target.vcf.gz \
+ --geneticMapFile=../tables/genetic_map_hg19_withX.txt.gz \
+ --outPrefix=target.phased \
+ 2>&1 | tee example_ref.log
diff --git a/example/run_example_vcf.sh b/example/run_example_vcf.sh
new file mode 100644
index 0000000..15d2b19
--- /dev/null
+++ b/example/run_example_vcf.sh
@@ -0,0 +1,17 @@
+../eagle \
+ --vcf=EUR_test.vcf.gz \
+ --geneticMapFile=../tables/genetic_map_hg19_withX.txt.gz \
+ --chrom=21 \
+ --outPrefix=phased \
+ --numThreads=4 \
+ 2>&1 | tee example_vcf.log
+
+### run eagle without any parameters to list options
+
+### typical options for phasing without a reference in VCF/BCF mode:
+# to import genetic map coordinates: --geneticMapFile=tables/genetic_map_hg##.txt.gz
+# to select a region to phase: --bpStart, --bpEnd
+
+### old:
+# to use Eagle1 algorithm: --v1
+# to use Eagle1 fast mode: --v1fast
diff --git a/example/target.phased.vcf.gz b/example/target.phased.vcf.gz
new file mode 100644
index 0000000..270221f
Binary files /dev/null and b/example/target.phased.vcf.gz differ
diff --git a/example/target.vcf.gz b/example/target.vcf.gz
new file mode 100644
index 0000000..52f66f6
Binary files /dev/null and b/example/target.vcf.gz differ
diff --git a/example/target.vcf.gz.tbi b/example/target.vcf.gz.tbi
new file mode 100644
index 0000000..42afe84
Binary files /dev/null and b/example/target.vcf.gz.tbi differ
diff --git a/src/COPYING b/src/COPYING
new file mode 100644
index 0000000..10926e8
--- /dev/null
+++ b/src/COPYING
@@ -0,0 +1,675 @@
+ GNU GENERAL PUBLIC LICENSE
+ Version 3, 29 June 2007
+
+ Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The GNU General Public License is a free, copyleft license for
+software and other kinds of works.
+
+ The licenses for most software and other practical works are designed
+to take away your freedom to share and change the works. By contrast,
+the GNU General Public License is intended to guarantee your freedom to
+share and change all versions of a program--to make sure it remains free
+software for all its users. We, the Free Software Foundation, use the
+GNU General Public License for most of our software; it applies also to
+any other work released this way by its authors. You can apply it to
+your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+them if you wish), that you receive source code or can get it if you
+want it, that you can change the software or use pieces of it in new
+free programs, and that you know you can do these things.
+
+ To protect your rights, we need to prevent others from denying you
+these rights or asking you to surrender the rights. Therefore, you have
+certain responsibilities if you distribute copies of the software, or if
+you modify it: responsibilities to respect the freedom of others.
+
+ For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must pass on to the recipients the same
+freedoms that you received. You must make sure that they, too, receive
+or can get the source code. And you must show them these terms so they
+know their rights.
+
+ Developers that use the GNU GPL protect your rights with two steps:
+(1) assert copyright on the software, and (2) offer you this License
+giving you legal permission to copy, distribute and/or modify it.
+
+ For the developers' and authors' protection, the GPL clearly explains
+that there is no warranty for this free software. For both users' and
+authors' sake, the GPL requires that modified versions be marked as
+changed, so that their problems will not be attributed erroneously to
+authors of previous versions.
+
+ Some devices are designed to deny users access to install or run
+modified versions of the software inside them, although the manufacturer
+can do so. This is fundamentally incompatible with the aim of
+protecting users' freedom to change the software. The systematic
+pattern of such abuse occurs in the area of products for individuals to
+use, which is precisely where it is most unacceptable. Therefore, we
+have designed this version of the GPL to prohibit the practice for those
+products. If such problems arise substantially in other domains, we
+stand ready to extend this provision to those domains in future versions
+of the GPL, as needed to protect the freedom of users.
+
+ Finally, every program is threatened constantly by software patents.
+States should not allow patents to restrict development and use of
+software on general-purpose computers, but in those that do, we wish to
+avoid the special danger that patents applied to a free program could
+make it effectively proprietary. To prevent this, the GPL assures that
+patents cannot be used to render the program non-free.
+
+ The precise terms and conditions for copying, distribution and
+modification follow.
+
+ TERMS AND CONDITIONS
+
+ 0. Definitions.
+
+ "This License" refers to version 3 of the GNU General Public License.
+
+ "Copyright" also means copyright-like laws that apply to other kinds of
+works, such as semiconductor masks.
+
+ "The Program" refers to any copyrightable work licensed under this
+License. Each licensee is addressed as "you". "Licensees" and
+"recipients" may be individuals or organizations.
+
+ To "modify" a work means to copy from or adapt all or part of the work
+in a fashion requiring copyright permission, other than the making of an
+exact copy. The resulting work is called a "modified version" of the
+earlier work or a work "based on" the earlier work.
+
+ A "covered work" means either the unmodified Program or a work based
+on the Program.
+
+ To "propagate" a work means to do anything with it that, without
+permission, would make you directly or secondarily liable for
+infringement under applicable copyright law, except executing it on a
+computer or modifying a private copy. Propagation includes copying,
+distribution (with or without modification), making available to the
+public, and in some countries other activities as well.
+
+ To "convey" a work means any kind of propagation that enables other
+parties to make or receive copies. Mere interaction with a user through
+a computer network, with no transfer of a copy, is not conveying.
+
+ An interactive user interface displays "Appropriate Legal Notices"
+to the extent that it includes a convenient and prominently visible
+feature that (1) displays an appropriate copyright notice, and (2)
+tells the user that there is no warranty for the work (except to the
+extent that warranties are provided), that licensees may convey the
+work under this License, and how to view a copy of this License. If
+the interface presents a list of user commands or options, such as a
+menu, a prominent item in the list meets this criterion.
+
+ 1. Source Code.
+
+ The "source code" for a work means the preferred form of the work
+for making modifications to it. "Object code" means any non-source
+form of a work.
+
+ A "Standard Interface" means an interface that either is an official
+standard defined by a recognized standards body, or, in the case of
+interfaces specified for a particular programming language, one that
+is widely used among developers working in that language.
+
+ The "System Libraries" of an executable work include anything, other
+than the work as a whole, that (a) is included in the normal form of
+packaging a Major Component, but which is not part of that Major
+Component, and (b) serves only to enable use of the work with that
+Major Component, or to implement a Standard Interface for which an
+implementation is available to the public in source code form. A
+"Major Component", in this context, means a major essential component
+(kernel, window system, and so on) of the specific operating system
+(if any) on which the executable work runs, or a compiler used to
+produce the work, or an object code interpreter used to run it.
+
+ The "Corresponding Source" for a work in object code form means all
+the source code needed to generate, install, and (for an executable
+work) run the object code and to modify the work, including scripts to
+control those activities. However, it does not include the work's
+System Libraries, or general-purpose tools or generally available free
+programs which are used unmodified in performing those activities but
+which are not part of the work. For example, Corresponding Source
+includes interface definition files associated with source files for
+the work, and the source code for shared libraries and dynamically
+linked subprograms that the work is specifically designed to require,
+such as by intimate data communication or control flow between those
+subprograms and other parts of the work.
+
+ The Corresponding Source need not include anything that users
+can regenerate automatically from other parts of the Corresponding
+Source.
+
+ The Corresponding Source for a work in source code form is that
+same work.
+
+ 2. Basic Permissions.
+
+ All rights granted under this License are granted for the term of
+copyright on the Program, and are irrevocable provided the stated
+conditions are met. This License explicitly affirms your unlimited
+permission to run the unmodified Program. The output from running a
+covered work is covered by this License only if the output, given its
+content, constitutes a covered work. This License acknowledges your
+rights of fair use or other equivalent, as provided by copyright law.
+
+ You may make, run and propagate covered works that you do not
+convey, without conditions so long as your license otherwise remains
+in force. You may convey covered works to others for the sole purpose
+of having them make modifications exclusively for you, or provide you
+with facilities for running those works, provided that you comply with
+the terms of this License in conveying all material for which you do
+not control copyright. Those thus making or running the covered works
+for you must do so exclusively on your behalf, under your direction
+and control, on terms that prohibit them from making any copies of
+your copyrighted material outside their relationship with you.
+
+ Conveying under any other circumstances is permitted solely under
+the conditions stated below. Sublicensing is not allowed; section 10
+makes it unnecessary.
+
+ 3. Protecting Users' Legal Rights From Anti-Circumvention Law.
+
+ No covered work shall be deemed part of an effective technological
+measure under any applicable law fulfilling obligations under article
+11 of the WIPO copyright treaty adopted on 20 December 1996, or
+similar laws prohibiting or restricting circumvention of such
+measures.
+
+ When you convey a covered work, you waive any legal power to forbid
+circumvention of technological measures to the extent such circumvention
+is effected by exercising rights under this License with respect to
+the covered work, and you disclaim any intention to limit operation or
+modification of the work as a means of enforcing, against the work's
+users, your or third parties' legal rights to forbid circumvention of
+technological measures.
+
+ 4. Conveying Verbatim Copies.
+
+ You may convey verbatim copies of the Program's source code as you
+receive it, in any medium, provided that you conspicuously and
+appropriately publish on each copy an appropriate copyright notice;
+keep intact all notices stating that this License and any
+non-permissive terms added in accord with section 7 apply to the code;
+keep intact all notices of the absence of any warranty; and give all
+recipients a copy of this License along with the Program.
+
+ You may charge any price or no price for each copy that you convey,
+and you may offer support or warranty protection for a fee.
+
+ 5. Conveying Modified Source Versions.
+
+ You may convey a work based on the Program, or the modifications to
+produce it from the Program, in the form of source code under the
+terms of section 4, provided that you also meet all of these conditions:
+
+ a) The work must carry prominent notices stating that you modified
+ it, and giving a relevant date.
+
+ b) The work must carry prominent notices stating that it is
+ released under this License and any conditions added under section
+ 7. This requirement modifies the requirement in section 4 to
+ "keep intact all notices".
+
+ c) You must license the entire work, as a whole, under this
+ License to anyone who comes into possession of a copy. This
+ License will therefore apply, along with any applicable section 7
+ additional terms, to the whole of the work, and all its parts,
+ regardless of how they are packaged. This License gives no
+ permission to license the work in any other way, but it does not
+ invalidate such permission if you have separately received it.
+
+ d) If the work has interactive user interfaces, each must display
+ Appropriate Legal Notices; however, if the Program has interactive
+ interfaces that do not display Appropriate Legal Notices, your
+ work need not make them do so.
+
+ A compilation of a covered work with other separate and independent
+works, which are not by their nature extensions of the covered work,
+and which are not combined with it such as to form a larger program,
+in or on a volume of a storage or distribution medium, is called an
+"aggregate" if the compilation and its resulting copyright are not
+used to limit the access or legal rights of the compilation's users
+beyond what the individual works permit. Inclusion of a covered work
+in an aggregate does not cause this License to apply to the other
+parts of the aggregate.
+
+ 6. Conveying Non-Source Forms.
+
+ You may convey a covered work in object code form under the terms
+of sections 4 and 5, provided that you also convey the
+machine-readable Corresponding Source under the terms of this License,
+in one of these ways:
+
+ a) Convey the object code in, or embodied in, a physical product
+ (including a physical distribution medium), accompanied by the
+ Corresponding Source fixed on a durable physical medium
+ customarily used for software interchange.
+
+ b) Convey the object code in, or embodied in, a physical product
+ (including a physical distribution medium), accompanied by a
+ written offer, valid for at least three years and valid for as
+ long as you offer spare parts or customer support for that product
+ model, to give anyone who possesses the object code either (1) a
+ copy of the Corresponding Source for all the software in the
+ product that is covered by this License, on a durable physical
+ medium customarily used for software interchange, for a price no
+ more than your reasonable cost of physically performing this
+ conveying of source, or (2) access to copy the
+ Corresponding Source from a network server at no charge.
+
+ c) Convey individual copies of the object code with a copy of the
+ written offer to provide the Corresponding Source. This
+ alternative is allowed only occasionally and noncommercially, and
+ only if you received the object code with such an offer, in accord
+ with subsection 6b.
+
+ d) Convey the object code by offering access from a designated
+ place (gratis or for a charge), and offer equivalent access to the
+ Corresponding Source in the same way through the same place at no
+ further charge. You need not require recipients to copy the
+ Corresponding Source along with the object code. If the place to
+ copy the object code is a network server, the Corresponding Source
+ may be on a different server (operated by you or a third party)
+ that supports equivalent copying facilities, provided you maintain
+ clear directions next to the object code saying where to find the
+ Corresponding Source. Regardless of what server hosts the
+ Corresponding Source, you remain obligated to ensure that it is
+ available for as long as needed to satisfy these requirements.
+
+ e) Convey the object code using peer-to-peer transmission, provided
+ you inform other peers where the object code and Corresponding
+ Source of the work are being offered to the general public at no
+ charge under subsection 6d.
+
+ A separable portion of the object code, whose source code is excluded
+from the Corresponding Source as a System Library, need not be
+included in conveying the object code work.
+
+ A "User Product" is either (1) a "consumer product", which means any
+tangible personal property which is normally used for personal, family,
+or household purposes, or (2) anything designed or sold for incorporation
+into a dwelling. In determining whether a product is a consumer product,
+doubtful cases shall be resolved in favor of coverage. For a particular
+product received by a particular user, "normally used" refers to a
+typical or common use of that class of product, regardless of the status
+of the particular user or of the way in which the particular user
+actually uses, or expects or is expected to use, the product. A product
+is a consumer product regardless of whether the product has substantial
+commercial, industrial or non-consumer uses, unless such uses represent
+the only significant mode of use of the product.
+
+ "Installation Information" for a User Product means any methods,
+procedures, authorization keys, or other information required to install
+and execute modified versions of a covered work in that User Product from
+a modified version of its Corresponding Source. The information must
+suffice to ensure that the continued functioning of the modified object
+code is in no case prevented or interfered with solely because
+modification has been made.
+
+ If you convey an object code work under this section in, or with, or
+specifically for use in, a User Product, and the conveying occurs as
+part of a transaction in which the right of possession and use of the
+User Product is transferred to the recipient in perpetuity or for a
+fixed term (regardless of how the transaction is characterized), the
+Corresponding Source conveyed under this section must be accompanied
+by the Installation Information. But this requirement does not apply
+if neither you nor any third party retains the ability to install
+modified object code on the User Product (for example, the work has
+been installed in ROM).
+
+ The requirement to provide Installation Information does not include a
+requirement to continue to provide support service, warranty, or updates
+for a work that has been modified or installed by the recipient, or for
+the User Product in which it has been modified or installed. Access to a
+network may be denied when the modification itself materially and
+adversely affects the operation of the network or violates the rules and
+protocols for communication across the network.
+
+ Corresponding Source conveyed, and Installation Information provided,
+in accord with this section must be in a format that is publicly
+documented (and with an implementation available to the public in
+source code form), and must require no special password or key for
+unpacking, reading or copying.
+
+ 7. Additional Terms.
+
+ "Additional permissions" are terms that supplement the terms of this
+License by making exceptions from one or more of its conditions.
+Additional permissions that are applicable to the entire Program shall
+be treated as though they were included in this License, to the extent
+that they are valid under applicable law. If additional permissions
+apply only to part of the Program, that part may be used separately
+under those permissions, but the entire Program remains governed by
+this License without regard to the additional permissions.
+
+ When you convey a copy of a covered work, you may at your option
+remove any additional permissions from that copy, or from any part of
+it. (Additional permissions may be written to require their own
+removal in certain cases when you modify the work.) You may place
+additional permissions on material, added by you to a covered work,
+for which you have or can give appropriate copyright permission.
+
+ Notwithstanding any other provision of this License, for material you
+add to a covered work, you may (if authorized by the copyright holders of
+that material) supplement the terms of this License with terms:
+
+ a) Disclaiming warranty or limiting liability differently from the
+ terms of sections 15 and 16 of this License; or
+
+ b) Requiring preservation of specified reasonable legal notices or
+ author attributions in that material or in the Appropriate Legal
+ Notices displayed by works containing it; or
+
+ c) Prohibiting misrepresentation of the origin of that material, or
+ requiring that modified versions of such material be marked in
+ reasonable ways as different from the original version; or
+
+ d) Limiting the use for publicity purposes of names of licensors or
+ authors of the material; or
+
+ e) Declining to grant rights under trademark law for use of some
+ trade names, trademarks, or service marks; or
+
+ f) Requiring indemnification of licensors and authors of that
+ material by anyone who conveys the material (or modified versions of
+ it) with contractual assumptions of liability to the recipient, for
+ any liability that these contractual assumptions directly impose on
+ those licensors and authors.
+
+ All other non-permissive additional terms are considered "further
+restrictions" within the meaning of section 10. If the Program as you
+received it, or any part of it, contains a notice stating that it is
+governed by this License along with a term that is a further
+restriction, you may remove that term. If a license document contains
+a further restriction but permits relicensing or conveying under this
+License, you may add to a covered work material governed by the terms
+of that license document, provided that the further restriction does
+not survive such relicensing or conveying.
+
+ If you add terms to a covered work in accord with this section, you
+must place, in the relevant source files, a statement of the
+additional terms that apply to those files, or a notice indicating
+where to find the applicable terms.
+
+ Additional terms, permissive or non-permissive, may be stated in the
+form of a separately written license, or stated as exceptions;
+the above requirements apply either way.
+
+ 8. Termination.
+
+ You may not propagate or modify a covered work except as expressly
+provided under this License. Any attempt otherwise to propagate or
+modify it is void, and will automatically terminate your rights under
+this License (including any patent licenses granted under the third
+paragraph of section 11).
+
+ However, if you cease all violation of this License, then your
+license from a particular copyright holder is reinstated (a)
+provisionally, unless and until the copyright holder explicitly and
+finally terminates your license, and (b) permanently, if the copyright
+holder fails to notify you of the violation by some reasonable means
+prior to 60 days after the cessation.
+
+ Moreover, your license from a particular copyright holder is
+reinstated permanently if the copyright holder notifies you of the
+violation by some reasonable means, this is the first time you have
+received notice of violation of this License (for any work) from that
+copyright holder, and you cure the violation prior to 30 days after
+your receipt of the notice.
+
+ Termination of your rights under this section does not terminate the
+licenses of parties who have received copies or rights from you under
+this License. If your rights have been terminated and not permanently
+reinstated, you do not qualify to receive new licenses for the same
+material under section 10.
+
+ 9. Acceptance Not Required for Having Copies.
+
+ You are not required to accept this License in order to receive or
+run a copy of the Program. Ancillary propagation of a covered work
+occurring solely as a consequence of using peer-to-peer transmission
+to receive a copy likewise does not require acceptance. However,
+nothing other than this License grants you permission to propagate or
+modify any covered work. These actions infringe copyright if you do
+not accept this License. Therefore, by modifying or propagating a
+covered work, you indicate your acceptance of this License to do so.
+
+ 10. Automatic Licensing of Downstream Recipients.
+
+ Each time you convey a covered work, the recipient automatically
+receives a license from the original licensors, to run, modify and
+propagate that work, subject to this License. You are not responsible
+for enforcing compliance by third parties with this License.
+
+ An "entity transaction" is a transaction transferring control of an
+organization, or substantially all assets of one, or subdividing an
+organization, or merging organizations. If propagation of a covered
+work results from an entity transaction, each party to that
+transaction who receives a copy of the work also receives whatever
+licenses to the work the party's predecessor in interest had or could
+give under the previous paragraph, plus a right to possession of the
+Corresponding Source of the work from the predecessor in interest, if
+the predecessor has it or can get it with reasonable efforts.
+
+ You may not impose any further restrictions on the exercise of the
+rights granted or affirmed under this License. For example, you may
+not impose a license fee, royalty, or other charge for exercise of
+rights granted under this License, and you may not initiate litigation
+(including a cross-claim or counterclaim in a lawsuit) alleging that
+any patent claim is infringed by making, using, selling, offering for
+sale, or importing the Program or any portion of it.
+
+ 11. Patents.
+
+ A "contributor" is a copyright holder who authorizes use under this
+License of the Program or a work on which the Program is based. The
+work thus licensed is called the contributor's "contributor version".
+
+ A contributor's "essential patent claims" are all patent claims
+owned or controlled by the contributor, whether already acquired or
+hereafter acquired, that would be infringed by some manner, permitted
+by this License, of making, using, or selling its contributor version,
+but do not include claims that would be infringed only as a
+consequence of further modification of the contributor version. For
+purposes of this definition, "control" includes the right to grant
+patent sublicenses in a manner consistent with the requirements of
+this License.
+
+ Each contributor grants you a non-exclusive, worldwide, royalty-free
+patent license under the contributor's essential patent claims, to
+make, use, sell, offer for sale, import and otherwise run, modify and
+propagate the contents of its contributor version.
+
+ In the following three paragraphs, a "patent license" is any express
+agreement or commitment, however denominated, not to enforce a patent
+(such as an express permission to practice a patent or covenant not to
+sue for patent infringement). To "grant" such a patent license to a
+party means to make such an agreement or commitment not to enforce a
+patent against the party.
+
+ If you convey a covered work, knowingly relying on a patent license,
+and the Corresponding Source of the work is not available for anyone
+to copy, free of charge and under the terms of this License, through a
+publicly available network server or other readily accessible means,
+then you must either (1) cause the Corresponding Source to be so
+available, or (2) arrange to deprive yourself of the benefit of the
+patent license for this particular work, or (3) arrange, in a manner
+consistent with the requirements of this License, to extend the patent
+license to downstream recipients. "Knowingly relying" means you have
+actual knowledge that, but for the patent license, your conveying the
+covered work in a country, or your recipient's use of the covered work
+in a country, would infringe one or more identifiable patents in that
+country that you have reason to believe are valid.
+
+ If, pursuant to or in connection with a single transaction or
+arrangement, you convey, or propagate by procuring conveyance of, a
+covered work, and grant a patent license to some of the parties
+receiving the covered work authorizing them to use, propagate, modify
+or convey a specific copy of the covered work, then the patent license
+you grant is automatically extended to all recipients of the covered
+work and works based on it.
+
+ A patent license is "discriminatory" if it does not include within
+the scope of its coverage, prohibits the exercise of, or is
+conditioned on the non-exercise of one or more of the rights that are
+specifically granted under this License. You may not convey a covered
+work if you are a party to an arrangement with a third party that is
+in the business of distributing software, under which you make payment
+to the third party based on the extent of your activity of conveying
+the work, and under which the third party grants, to any of the
+parties who would receive the covered work from you, a discriminatory
+patent license (a) in connection with copies of the covered work
+conveyed by you (or copies made from those copies), or (b) primarily
+for and in connection with specific products or compilations that
+contain the covered work, unless you entered into that arrangement,
+or that patent license was granted, prior to 28 March 2007.
+
+ Nothing in this License shall be construed as excluding or limiting
+any implied license or other defenses to infringement that may
+otherwise be available to you under applicable patent law.
+
+ 12. No Surrender of Others' Freedom.
+
+ If conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot convey a
+covered work so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you may
+not convey it at all. For example, if you agree to terms that obligate you
+to collect a royalty for further conveying from those to whom you convey
+the Program, the only way you could satisfy both those terms and this
+License would be to refrain entirely from conveying the Program.
+
+ 13. Use with the GNU Affero General Public License.
+
+ Notwithstanding any other provision of this License, you have
+permission to link or combine any covered work with a work licensed
+under version 3 of the GNU Affero General Public License into a single
+combined work, and to convey the resulting work. The terms of this
+License will continue to apply to the part which is the covered work,
+but the special requirements of the GNU Affero General Public License,
+section 13, concerning interaction through a network will apply to the
+combination as such.
+
+ 14. Revised Versions of this License.
+
+ The Free Software Foundation may publish revised and/or new versions of
+the GNU General Public License from time to time. Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+ Each version is given a distinguishing version number. If the
+Program specifies that a certain numbered version of the GNU General
+Public License "or any later version" applies to it, you have the
+option of following the terms and conditions either of that numbered
+version or of any later version published by the Free Software
+Foundation. If the Program does not specify a version number of the
+GNU General Public License, you may choose any version ever published
+by the Free Software Foundation.
+
+ If the Program specifies that a proxy can decide which future
+versions of the GNU General Public License can be used, that proxy's
+public statement of acceptance of a version permanently authorizes you
+to choose that version for the Program.
+
+ Later license versions may give you additional or different
+permissions. However, no additional obligations are imposed on any
+author or copyright holder as a result of your choosing to follow a
+later version.
+
+ 15. Disclaimer of Warranty.
+
+ THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
+APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
+HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
+OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
+THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
+IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
+ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
+
+ 16. Limitation of Liability.
+
+ IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
+THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
+GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
+USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
+DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
+PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
+EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
+SUCH DAMAGES.
+
+ 17. Interpretation of Sections 15 and 16.
+
+ If the disclaimer of warranty and limitation of liability provided
+above cannot be given local legal effect according to their terms,
+reviewing courts shall apply local law that most closely approximates
+an absolute waiver of all civil liability in connection with the
+Program, unless a warranty or assumption of liability accompanies a
+copy of the Program in return for a fee.
+
+ END OF TERMS AND CONDITIONS
+
+ How to Apply These Terms to Your New Programs
+
+ If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+ To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+state the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+ <one line to give the program's name and a brief idea of what it does.>
+ Copyright (C) <year> <name of author>
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+Also add information on how to contact you by electronic and paper mail.
+
+ If the program does terminal interaction, make it output a short
+notice like this when it starts in an interactive mode:
+
+ <program> Copyright (C) <year> <name of author>
+ This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+ This is free software, and you are welcome to redistribute it
+ under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License. Of course, your program's commands
+might be different; for a GUI interface, you would use an "about box".
+
+ You should also get your employer (if you work as a programmer) or school,
+if any, to sign a "copyright disclaimer" for the program, if necessary.
+For more information on this, and how to apply and follow the GNU GPL, see
+<http://www.gnu.org/licenses/>.
+
+ The GNU General Public License does not permit incorporating your program
+into proprietary programs. If your program is a subroutine library, you
+may consider it more useful to permit linking proprietary applications with
+the library. If this is what you want to do, use the GNU Lesser General
+Public License instead of this License. But first, please read
+<http://www.gnu.org/philosophy/why-not-lgpl.html>.
+
diff --git a/src/DipTreePBWT.cpp b/src/DipTreePBWT.cpp
new file mode 100644
index 0000000..dde7e3e
--- /dev/null
+++ b/src/DipTreePBWT.cpp
@@ -0,0 +1,611 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include <vector>
+#include <iostream>
+#include <map>
+#include <utility>
+#include <algorithm>
+#include <cmath>
+#include <cstdlib>
+#include <cstring>
+#include <cassert>
+
+#include <boost/random.hpp>
+#include <boost/random/lagged_fibonacci.hpp>
+#include <boost/random/uniform_01.hpp>
+
+#include "HapHedge.hpp"
+#include "NumericUtils.hpp"
+#include "Timer.hpp"
+#include "DipTreePBWT.hpp"
+
+namespace EAGLE {
+
+ using std::vector;
+ using std::cout;
+ using std::endl;
+
+ const int TO_UNKNOWN = -2, TO_NONE = -1; // TO_NONE used in HapPathSplit and HapPrefix
+
+
+ // struct HapPathSplit
+
+ HapPathSplit::HapPathSplit(void) {};
+ HapPathSplit::HapPathSplit(int _t) : t(_t), relProbLastStop(1), hapPrefixInd(0) {
+ hapPrefixTo[0] = hapPrefixTo[1] = TO_UNKNOWN;
+ };
+ HapPathSplit::HapPathSplit(int _t, float relProb, int ind)
+ : t(_t), relProbLastStop(relProb), hapPrefixInd(ind) {
+ hapPrefixTo[0] = hapPrefixTo[1] = TO_UNKNOWN;
+ };
+
+
+ // struct HapPath
+
+ HapPath::HapPath(void) {};
+
+
+ // struct HapPrefix
+
+ HapPrefix::HapPrefix(void) {};
+ HapPrefix::HapPrefix(const HapTreeState &_state) {
+ state = _state;
+ //to[0] = to[1] = TO_UNKNOWN;
+ };
+
+ // class HapWaves
+
+ HapWaves::HapWaves(const HapHedgeErr &_hapHedge, const vector <double> &_cMcoords,
+ double _cMexpect, int _histLength, int _beamWidth, float _logPerr,
+ int _tCur) :
+
+ rng(123), rand01(rng, boost::uniform_01<>()),
+ hapHedge(_hapHedge), cMcoords(_cMcoords), cMexpect(_cMexpect), histLength(_histLength),
+ beamWidth(_beamWidth), pErr(expf(_logPerr)), maxHapPaths(2*beamWidth),
+ maxHapPrefixes(maxHapPaths*histLength*2+1), tCur(_tCur) {
+
+ curMod = tCur % HAPWAVES_HIST; nextMod = (tCur+1) % HAPWAVES_HIST;
+
+ for (int p = 0; p < HAPWAVES_HIST; p++) {
+ hapPathSizes[p] = 0;
+ hapPaths[p] = new HapPath[maxHapPaths];
+ for (int i = 0; i < maxHapPaths; i++)
+ hapPaths[p][i].splitList = new HapPathSplit[histLength];
+ hapPrefixes[p] = new HapPrefix[maxHapPrefixes];
+ }
+
+ // add root of 0th HapTree as cur HapPath
+ hapPaths[curMod][0].cumLogP = 0;
+ hapPaths[curMod][0].splitListLength = 1;
+ hapPaths[curMod][0].splitList[0] = HapPathSplit(tCur);
+ hapPaths[curMod][0].to[0] = hapPaths[curMod][0].to[1] = TO_UNKNOWN;
+ hapPathSizes[curMod] = 1;
+
+ hapPrefixes[curMod][0] = HapPrefix(hapHedge.getHapTreeMulti(tCur).getRootState());
+ hapPrefixSizes[curMod] = 1;
+
+ }
+
+ HapWaves::~HapWaves(void) {
+ for (int p = 0; p < HAPWAVES_HIST; p++) {
+ for (int i = 0; i < maxHapPaths; i++)
+ delete[] hapPaths[p][i].splitList;
+ delete[] hapPaths[p];
+ delete[] hapPrefixes[p];
+ }
+ }
+
+ inline double sq(double x) { return x*x; }
+
+ // use cMcoords and cMexpect (cMexpect>0 => coalescent; cMexpect<0 => Li-Stephens)
+ float HapWaves::recombP(int tCur, int tSplit) const {
+ double p = 0;
+ if (cMexpect > 0) { // coalescent IBD length distribution with mean a = cMexpect
+ double a = cMexpect;
+ double term1 = 1 / sq(1 + (cMcoords[tCur]-cMcoords[tSplit])/a);
+ double term2 = tCur+1 == (int) cMcoords.size() ? 0 :
+ 1 / sq(1 + (cMcoords[tCur+1]-cMcoords[tSplit])/a);
+ p = term1 - term2;
+ }
+ else { // Li-Stephens IBD length distribution with mean a = -cMexpect
+ double a = -cMexpect;
+ double term1 = exp(-(cMcoords[tCur]-cMcoords[tSplit])/a);
+ double term2 = tCur+1 == (int) cMcoords.size() ? 0 :
+ exp(-(cMcoords[tCur+1]-cMcoords[tSplit])/a);
+ p = term1 - term2;
+ }
+ const double minRecombP = 0.000001, maxRecombP = 1.0;//pErr;
+ return std::max(std::min(p, maxRecombP), minRecombP);
+ }
+
+ // populate hapPrefixes[nextMod]
+ // populate toCumLogP[] in hapPaths[curMod] (but don't populate hapPaths[nextMod])
+ void HapWaves::computeAllExtensions(const vector <uchar> &nextPossibleBits) {
+ // add root of next (= new cur) HapTree as beginning of HapPrefix list
+ if (tCur+1 < (int) cMcoords.size()) {
+ hapPrefixes[nextMod][0] = HapPrefix(hapHedge.getHapTreeMulti(tCur+1).getRootState());
+ hapPrefixSizes[nextMod] = 1;
+ }
+
+ float mult = hapHedge.getHapTreeMulti(tCur).getInvNhaps();
+
+ // iterate over paths
+ for (int i = 0; i < hapPathSizes[curMod]; i++) {
+ float relProbStopNext[2] = {0, 0};
+ // iterate over splits
+ for (int j = 0; j < hapPaths[curMod][i].splitListLength; j++) {
+ HapPathSplit &split = hapPaths[curMod][i].splitList[j];
+ // iterate over next possible bits
+ for (int b = 0; b < 2; b++) {
+ if (!((nextPossibleBits[i]>>b)&1)) continue;
+ HapPrefix &hapPrefix = hapPrefixes[curMod][split.hapPrefixInd];
+ // if extension of hap prefix hasn't been attempted, attempt to perform extension
+ if (split.hapPrefixTo[b] == TO_UNKNOWN) {
+ split.hapPrefixTo[b] = TO_NONE; // default: can't extend (overwrite if path found)
+ hapPrefix.toHetOnlyProb[b] = 0;
+ // try to extend hap prefix:
+ // fill in split.hapPrefixTo[b], hapPrefixes[curMod][split.hapPrefixInd].to*[b]
+ const HapTreeMulti &hapTree = hapHedge.getHapTreeMulti(split.t);
+
+ HapTreeState state = hapPrefix.state;
+ if (hapTree.next(2*tCur, state, b)) { // can extend to match at het
+ hapPrefix.toHetOnlyProb[b] += mult * state.count;
+ if (hapTree.next(2*tCur+1, state, 0)) { // no err in inter-het region
+ // create and link new HapPrefix node in hapPrefixes[nextMod]; link
+ split.hapPrefixTo[b] = hapPrefixSizes[nextMod]++;
+ hapPrefixes[nextMod][split.hapPrefixTo[b]].state = state;
+ }
+ }
+ }
+ relProbStopNext[b] += split.relProbLastStop * hapPrefix.toHetOnlyProb[b]
+ * recombP(tCur, split.t);
+ }
+ }
+ for (int b = 0; b < 2; b++) {
+ if (!((nextPossibleBits[i]>>b)&1)) continue;
+ float relLogP = -1000;
+ if (relProbStopNext[b] != 0) relLogP = logf(relProbStopNext[b]);
+ hapPaths[curMod][i].toCumLogP[b] =
+ hapPaths[curMod][i].cumLogP + relLogP;// + recombLogPs[tCur];
+ }
+ }
+ }
+
+ float HapWaves::getToCumLogProb(int ind, int nextBit) const {
+ return hapPaths[curMod][ind].toCumLogP[nextBit];
+ }
+
+ // look up/create extension of hapPaths[curMod][ind] in hapPaths[nextMod]
+ // return index in hapPaths[nextMod]
+ int HapWaves::extendPath(int ind, int nextBit) {
+ HapPath &curHapPath = hapPaths[curMod][ind];
+ if (curHapPath.to[nextBit] == TO_UNKNOWN) {
+ int nextInd = hapPathSizes[nextMod]++;
+ assert(hapPathSizes[nextMod]<=maxHapPaths);
+ curHapPath.to[nextBit] = nextInd;
+ HapPath &nextHapPath = hapPaths[nextMod][nextInd];
+ nextHapPath.cumLogP = curHapPath.toCumLogP[nextBit];
+ float calibP = expf(curHapPath.cumLogP - nextHapPath.cumLogP);
+ int &nSplit = nextHapPath.splitListLength; nSplit = 0;
+ nextHapPath.to[0] = nextHapPath.to[1] = TO_UNKNOWN;
+ for (int j = (curHapPath.splitList[0].t + histLength == tCur+1 ? 1 : 0);
+ j < curHapPath.splitListLength; j++) {
+ const HapPathSplit &curSplit = curHapPath.splitList[j];
+ if (curSplit.hapPrefixTo[nextBit] != TO_NONE) {
+ nextHapPath.splitList[nSplit++] = HapPathSplit(curSplit.t,
+ curSplit.relProbLastStop * calibP,
+ curSplit.hapPrefixTo[nextBit]);
+ }
+ }
+ nextHapPath.splitList[nSplit++] = HapPathSplit(tCur+1); // restart
+ }
+ return curHapPath.to[nextBit];
+ }
+
+ void HapWaves::advance(void) {
+ tCur++; curMod = tCur % HAPWAVES_HIST; nextMod = (tCur+1) % HAPWAVES_HIST;
+ hapPathSizes[nextMod] = 0;
+ hapPrefixSizes[nextMod] = 0;
+ }
+
+ /*
+ * At any point along a haplotype path (i.e., a sequence of alleles at split sites),
+ * we have stored a "split list" of positions at which the last copied segment could have begun.
+ * We can compute the relative probabilities of these split positions (given the next allele),
+ * which allows us to sample the last copied segment.
+ *
+ * INPUT: (t, hapPathInd, tBit) designating a stored haplotype path extended to tBit at t
+ * - t = position; (t % HAPWAVES_HIST) is index in haplotype paths ending in hom region before t
+ * - hapPathInd = index
+ * - tBit = haplotype bit at t (to which to extend haplotype prefixes in split list)
+ *
+ * OUTPUT: (tStart, state) designating a haplotype segment randomly sampled from the split list
+ * - tStart = start position of copied segment
+ * - state = state corresponding to copied segment in HapTree at tStart
+ */
+ void HapWaves::sampleLastPrefix(int &tStart, HapTreeState &state, int t, int hapPathInd,
+ int tBit) {
+ assert(tCur+1 - t < HAPWAVES_HIST); // the relevant history shouldn't have been overwritten
+ int tMod = t % HAPWAVES_HIST;
+ const HapPath &hapPath = hapPaths[tMod][hapPathInd];
+
+ // compute (unscaled) probabilities of each possible split point in the list
+ float relProbStopNext = 0;
+ vector <float> cumRelProbStopNext(hapPath.splitListLength);
+ for (int j = 0; j < hapPath.splitListLength; j++) {
+ const HapPathSplit &split = hapPath.splitList[j];
+ const HapPrefix &hapPrefix = hapPrefixes[tMod][split.hapPrefixInd];
+ // this computation was previously done to determine the relative probabilities
+ // of extending the path to tBit=0 vs. tBit=1 at tree index t (= split site t-1)
+ relProbStopNext += split.relProbLastStop * hapPrefix.toHetOnlyProb[tBit]
+ * recombP(t, split.t);
+ cumRelProbStopNext[j] = relProbStopNext;
+ }
+
+ float relLogP = -1000;
+ if (relProbStopNext != 0) relLogP = logf(relProbStopNext);
+ assert(hapPaths[tMod][hapPathInd].toCumLogP[tBit] == hapPath.cumLogP + relLogP);
+
+ // randomly sample a split point
+ float r = rand01();
+ for (int j = 0; j < hapPath.splitListLength; j++)
+ if (cumRelProbStopNext[j] > r*relProbStopNext || j+1 == hapPath.splitListLength) {
+ const HapPathSplit &split = hapPath.splitList[j];
+ const HapPrefix &hapPrefix = hapPrefixes[tMod][split.hapPrefixInd];
+ tStart = split.t;
+ state = hapPrefix.state;
+ return;
+ }
+ }
+
+
+ // struct DipTreeNode
+
+ bool DipTreeNode::operator < (const DipTreeNode &dNode) const {
+ return logP+boostLogP > dNode.logP+dNode.boostLogP;
+ }
+
+
+ // class DipTree
+
+ void DipTree::traceNode(int t, int i) {
+ int from = nodes[t][i].from;
+ if (t>1) traceNode(t-1, from);
+ cout << "(" << (int) nodes[t][i].hapMat << "," << (int) nodes[t][i].hapPat << ") ";
+ }
+
+ std::pair <uint64, uint64> truncPair(uint64 histMat, uint64 histPat, uint64 histBits) {
+ uint64 mask = histBits>=64ULL ? -1ULL : (1ULL<<histBits)-1;
+ uint64 x = histMat&mask, y = histPat&mask;
+ return x<y ? std::make_pair(x, y) : std::make_pair(y, x);
+ }
+
+ void DipTree::advance(void) {
+
+ bool isOppConstrained = constraints[tCur]==OPP_CONSTRAINT; // constrained to be 0|1 or 1|0
+ bool isFullyConstrained = !isOppConstrained && constraints[tCur]!=NO_CONSTRAINT;
+
+ // populate next possible bits: nextPossibleBits[i] corresponds to hapPaths[curMod][i]
+ // for i = dNode.hapPathInds[0], dNode.hapPathInds[1]
+ vector <uchar> nextPossibleBits(2*beamWidth);
+ int checkWidth = std::min((int) nodes[tCur].size(), beamWidth);
+ vector <char> reqMats(checkWidth), reqPats(checkWidth);
+ const float logPthresh = 2*logPerr;//logf(0.000001f);
+ for (int i = 0; i < checkWidth; i++) {
+ const DipTreeNode &dNode = nodes[tCur][i];
+ if (dNode.logP+dNode.boostLogP < nodes[tCur][0].logP+nodes[tCur][0].boostLogP + logPthresh) {
+ checkWidth = i;
+ break;
+ }
+ assert(dNode.hapPathInds[0] < (int) nextPossibleBits.size());
+ assert(dNode.hapPathInds[1] < (int) nextPossibleBits.size());
+ if (isFullyConstrained) {
+ char &reqMat = reqMats[i], &reqPat = reqPats[i];
+ if ((constraints[tCur]>>1) == 0) // no-hom-err constraint
+ reqMat = reqPat = constraints[tCur]&1;
+ else { // rel phase constraint
+ int t = tCur, ind = i;
+ for (int d = 0; d < (constraints[tCur]>>1)-1; d++)
+ ind = nodes[t--][ind].from;
+ reqMat = nodes[t][ind].hapMat ^ (constraints[tCur]&1);
+ reqPat = nodes[t][ind].hapPat ^ (constraints[tCur]&1);
+ }
+ nextPossibleBits[dNode.hapPathInds[0]] |= 1<<reqMat;
+ nextPossibleBits[dNode.hapPathInds[1]] |= 1<<reqPat;
+ }
+ else {
+ nextPossibleBits[dNode.hapPathInds[0]] = 3;
+ nextPossibleBits[dNode.hapPathInds[1]] = 3;
+ }
+ }
+ // extend hap paths (part 1)
+ hapWaves.computeAllExtensions(nextPossibleBits);
+
+ // extend dip paths
+ vector <DipTreeNode> nextNodes;
+ for (int i = 0; i < checkWidth; i++) {
+ const DipTreeNode &dNode = nodes[tCur][i];
+ for (char hapMat = 0; hapMat < 2; hapMat++)
+ for (char hapPat = 0; hapPat < 2; hapPat++) {
+ if (!dNode.unequalAnc && hapMat > hapPat) continue;
+ if (isFullyConstrained && (hapMat != reqMats[i] || hapPat != reqPats[i])) continue;
+ if (isOppConstrained && hapMat==hapPat) continue;
+ DipTreeNode nextNode;
+ nextNode.from = i;
+ nextNode.unequalAnc = dNode.unequalAnc || (hapMat != hapPat);
+ nextNode.hapMat = hapMat;
+ nextNode.hapPat = hapPat;
+ nextNode.numErr = dNode.numErr + (genos[tCur]<=2 && hapMat+hapPat != genos[tCur]);
+ nextNode.logP = hapWaves.getToCumLogProb(dNode.hapPathInds[0], hapMat) +
+ hapWaves.getToCumLogProb(dNode.hapPathInds[1], hapPat) + nextNode.numErr * logPerr;
+ nextNode.boostLogP = dNode.boostLogP;
+ if (isFullyConstrained) {
+ nextNode.histMat = dNode.histMat;
+ nextNode.histPat = dNode.histPat;
+ }
+ else {
+ nextNode.histMat = (dNode.histMat<<1ULL) | hapMat;
+ nextNode.histPat = (dNode.histPat<<1ULL) | hapPat;
+ }
+ nextNodes.push_back(nextNode);
+ }
+ }
+
+ if (!isFullyConstrained) {
+ // compute number of bits of history to use (histLength minus # of fully constrained sites)
+ int histBits = 0;
+ for (int t = tCur; t > std::max(tCur-histLength, 0); t--)
+ if (constraints[t]==OPP_CONSTRAINT || constraints[t]==NO_CONSTRAINT)
+ histBits++;
+
+ // aggregate DipTree paths that agree exactly in past histLength
+ std::sort(nextNodes.begin(), nextNodes.end());
+ std::map < std::pair <uint64, uint64>, int > histToInd;
+ for (int i = 0; i < (int) nextNodes.size(); i++) {
+ const DipTreeNode &nextNode = nextNodes[i];
+ std::pair <uint64, uint64> histPair =
+ truncPair(nextNode.histMat, nextNode.histPat, histBits);
+ std::map < std::pair <uint64, uint64>, int >::iterator it = histToInd.find(histPair);
+ if (it == histToInd.end()) {
+ histToInd[histPair] = nodes[tCur+1].size();
+ nodes[tCur+1].push_back(nextNode);
+ }
+ else {
+ int j = it->second;
+ float sumLogPj = nodes[tCur+1][j].logP + nodes[tCur+1][j].boostLogP;
+ float sumLogPi = nextNode.logP + nextNode.boostLogP;
+ NumericUtils::logSumExp(sumLogPi, sumLogPj); // prob i += prob existing tCur+1 node j
+ nodes[tCur+1][j].boostLogP += sumLogPi - sumLogPj; // augment boost for existing node j
+ }
+ }
+ //cout << " " << nodes[tCur+1].size() << "/" << nextNodes.size() << std::flush;
+ }
+ else
+ nodes[tCur+1] = nextNodes;
+
+ // extend hap paths of top beamWidth DipTree nodes (part 2)
+ for (int i = 0; i < std::min((int) nodes[tCur+1].size(), beamWidth); i++) {
+ DipTreeNode &nextNode = nodes[tCur+1][i];
+ const DipTreeNode &dNode = nodes[tCur][nextNode.from];
+ nextNode.hapPathInds[0] = hapWaves.extendPath(dNode.hapPathInds[0], nextNode.hapMat);
+ nextNode.hapPathInds[1] = hapWaves.extendPath(dNode.hapPathInds[1], nextNode.hapPat);
+ }
+
+ hapWaves.advance();
+ tCur++;
+
+ float totLogP = nodes[tCur][0].logP + nodes[tCur][0].boostLogP;
+ for (int i = 1; i < (int) nodes[tCur].size(); i++)
+ NumericUtils::logSumExp(totLogP, nodes[tCur][i].logP + nodes[tCur][i].boostLogP);
+ for (int i = 0; i < (int) nodes[tCur].size(); i++) {
+ normProbs[tCur].push_back(expf(nodes[tCur][i].logP + nodes[tCur][i].boostLogP - totLogP));
+ //traceNode(tCur, i); cout << normProbs[tCur].back() << endl;
+ }
+ }
+
+ DipTree::DipTree(const HapHedgeErr &_hapHedge, const vector <uchar> &_genos,
+ const char *_constraints, const vector <double> &_cMcoords, double _cMexpect,
+ int _histLength, int _beamWidth, float _logPerr, int _tCur) :
+ rng(12345), rand01(rng, boost::uniform_01<>()),
+ hapHedge(_hapHedge),
+ hapWaves(_hapHedge, _cMcoords, _cMexpect, _histLength, _beamWidth, _logPerr, _tCur),
+ genos(_genos), constraints(_constraints), histLength(_histLength), beamWidth(_beamWidth),
+ logPerr(_logPerr), tCur(_tCur), T(_hapHedge.getNumTrees()), nodes(T+1), normProbs(T+1) {
+
+ DipTreeNode dNode;
+ dNode.from = -1; dNode.unequalAnc = 0; dNode.logP = 0; dNode.numErr = 0;
+ dNode.hapPathInds[0] = dNode.hapPathInds[1] = 0;
+ dNode.histMat = 0; dNode.histPat = 0; dNode.boostLogP = 0;
+ nodes[tCur].push_back(dNode); // root of DipTree
+ }
+
+ // compute probability of AA at hets tCallLoc1 and tCallLoc2
+ float DipTree::callProbAA(int tCallLoc1, int tCallLoc2, int callLength) {
+ assert(tCallLoc1>0 && tCallLoc2<T);
+ int tFront = std::min(T, tCallLoc2 + callLength);
+ while (tCur < tFront)
+ advance();
+ float probAA = 0, probAB = 0;
+ for (int i = 0; i < (int) nodes[tFront].size(); i++) {
+ int t = tFront, ind = i;
+ while (t != tCallLoc2+1)
+ ind = nodes[t--][ind].from;
+ char hapMat2 = nodes[t][ind].hapMat, hapPat2 = nodes[t][ind].hapPat; // alleles at tCallLoc2
+ while (t != tCallLoc1+1)
+ ind = nodes[t--][ind].from;
+ char hapMat1 = nodes[t][ind].hapMat, hapPat1 = nodes[t][ind].hapPat; // alleles at tCallLoc1
+ if (hapMat2 != hapPat2 && hapMat1 != hapPat1) {
+ if (hapMat1 == hapMat2)
+ probAA += normProbs[tFront][i];
+ else
+ probAB += normProbs[tFront][i];
+ }
+ else {
+ probAA += normProbs[tFront][i] / 2;
+ probAB += normProbs[tFront][i] / 2;
+ }
+ }
+ if (probAA + probAB == 0) return 0.5;
+ return probAA / (probAA + probAB);
+ }
+
+ // compute diploid dosage at tCallLoc
+ float DipTree::callDosage(int tCallLoc, int callLength) {
+ assert(tCallLoc>0 && tCallLoc<T);
+ int tFront = std::min(T, tCallLoc + callLength);
+ while (tCur < tFront)
+ advance();
+ float prob1 = 0, probTot = 0;
+ for (int i = 0; i < (int) nodes[tFront].size(); i++) {
+ int t = tFront, ind = i;
+ while (t != tCallLoc+1)
+ ind = nodes[t--][ind].from;
+ char hapMat = nodes[t][ind].hapMat, hapPat = nodes[t][ind].hapPat; // alleles at tCallLoc
+ prob1 += (hapMat+hapPat) * normProbs[tFront][i];
+ probTot += normProbs[tFront][i];
+ }
+ if (probTot == 0) return 1.0;
+ return prob1 / probTot;
+ }
+
+ // note: part of this backtrace is already performed in sampleRefs(); can optimize if need be
+ void computeHetMasks(RefHap &refHap, const vector < vector <DipTreeNode> > &nodes, int tCallLoc,
+ int t, int ind, int h, bool isFwd) {
+ refHap.tMaskFwd = refHap.tMaskRev = 0;
+ const int maxShift = 8*sizeof(refHap.tMaskFwd);
+ while (t > 0) { // rewind DipTree
+ int t1Bit = h==0 ? nodes[t][ind].hapMat : nodes[t][ind].hapPat; // allele at t-1
+ int dist = t-1 - tCallLoc;
+ if (dist <= 0) dist--;
+ if (dist < -maxShift) break;
+ else if (dist <= maxShift) {
+ int t1BitShift = t1Bit<<(abs(dist)-1);
+ if (isFwd == (dist > 0)) refHap.tMaskFwd |= t1BitShift;
+ else refHap.tMaskRev |= t1BitShift;
+ }
+ ind = nodes[t--][ind].from;
+ }
+ }
+
+ /*
+ * INPUT:
+ * - tCallLoc = tree index of left side of interval of interest: (tCallLoc, tCallLoc+1)
+ * - callLength = number of positions to look ahead
+ * - samples = number of random samples to take
+ * - bestHaps = actual indices of Kpbwt haplotypes currently encoded in HapHedgeErr
+ * - isFwd = flag indicating whether output het masks should be little- or big-endian
+ *
+ * OUTPUT:
+ * - vector of sampled reference haplotype pairs
+ */
+ vector <HapPair> DipTree::sampleRefs(int tCallLoc, int callLength, int samples,
+ const vector <uint> &bestHaps, bool isFwd) {
+ if (callLength > HAPWAVES_HIST-5) {
+ cerr << "ERROR in DipTree::sampleRefs(): callLength=" << callLength << ", HAPWAVES_HIST="
+ << HAPWAVES_HIST << endl;
+ cerr << " To use this callLength, increase HAPWAVES_HIST and recompile" << endl;
+ assert(callLength <= HAPWAVES_HIST-5);
+ }
+ assert(tCallLoc>=0 && tCallLoc<T-1); // (tCallLoc, tCallLoc+1) must be a valid interval
+ int tFront = std::min(T, tCallLoc + callLength); // look ahead approx. callLength positions
+ while (tCur < tFront)
+ advance();
+
+ float probTot = 0; // compute total probability of saved DipTree nodes at current pos (tFront)
+ for (int i = 0; i < (int) nodes[tFront].size(); i++)
+ probTot += normProbs[tFront][i];
+
+ vector <HapPair> ret(samples);
+
+ for (int s = 0; s < samples; s++) {
+ // randomly sample a DipTree node
+ float r = rand01();
+ float cumProb = 0;
+ for (int i = 0; i < (int) nodes[tFront].size(); i++) {
+ cumProb += normProbs[tFront][i];
+ if (cumProb > r*probTot || i+1 == (int) nodes[tFront].size()) {
+ // sample a reference haplotype for each parental path in the sampled DipTree node
+ for (int h = 0; h < 2; h++) {
+ int t = tFront; // current position in DipTree
+ int ind = i; // index into stored DipTree nodes at t
+ int tStart = tFront; // start of last copied haplotype segment (for now, set to tFront)
+
+ // set ret[s].haps[h].tMask{Fwd,Rev} to use when aligning parental paths to phase calls
+ computeHetMasks(ret[s].haps[h], nodes, tCallLoc, t, ind, h, isFwd);
+ /*
+ traceNode(t, i);
+ cout << "tFront = " << tFront << endl;
+ cout << "tCallLoc: " << tCallLoc << " tStart: " << tStart << " T: " << T << endl;
+ */
+ HapTreeState state; int tBit = 0;
+ ret[s].haps[h].isEnd = false;
+ // jump backward one copied segment at a time until we get one starting <= tCallLoc
+ while (tStart > tCallLoc) {
+ if (tStart == tCallLoc+1) // prev segment ends in (tCallLoc, tCallLoc+1)
+ ret[s].haps[h].isEnd = true;
+ while (t != tStart) // rewind DipTree from t to tStart (start of last copied segment)
+ ind = nodes[t--][ind].from;
+ tBit = h==0 ? nodes[t][ind].hapMat : nodes[t][ind].hapPat; // allele at tStart-1
+ ind = nodes[t--][ind].from; // move t back 1; now tBit is allele at t (= tStart-1)
+ // sample previous segment; output is written to (tStart, state)
+ hapWaves.sampleLastPrefix(tStart, state, t, nodes[t][ind].hapPathInds[h], tBit);
+ ret[s].haps[h].tLength = t-tStart;
+ }
+
+ const HapTreeMulti &hapTree = hapHedge.getHapTreeMulti(tStart);
+ bool tBitExtOK = hapTree.next(2*t, state, tBit); // extend state to bit=tBit @ t
+ if (!tBitExtOK) {
+ cerr << "Internal error in sampleRefs(): Could not extend haplotype" << endl;
+ cerr << " tStart = " << tStart << endl;
+ cerr << " t = " << t << endl;
+ cerr << " tCallLoc = " << tCallLoc << endl;
+ cerr << " tFront = " << tFront << endl;
+ cerr << " T = " << T << endl;
+ cerr << " tBit = " << tBit << endl;
+ cerr << " state.seq = " << state.seq << endl;
+ cerr << " state.node = " << state.node << endl;
+ cerr << " state.count = " << state.count << endl;
+ assert(tBitExtOK); // error out
+ }
+
+ // randomly sample an actual haplotype from this prefix, moving up to 10 hets ahead
+ for (int m = 2*t+1; m < 2*T && m < 2*t+20; m++) {
+ if (m % 2 == 1) // error bit: extend to 0-err hom region if possible
+ hapTree.nextAtFrac(m, state, 0);
+ else // het bit: randomly choose extension
+ hapTree.nextAtFrac(m, state, rand01());
+ }
+ /*
+ const HapBitsT &hapBitsT = hapHedge.getHapBitsT();
+ int refSeq = state.seq;
+ // refSeq's bit at tCallLoc: hapBitsT.getBit(refSeq, 2*tCallLoc)
+ // check tStart..t of refSeq matches geno
+ for (int m = 2*tStart+1; m <= 2*t; m += 2)
+ assert(hapBitsT.getBit(refSeq, m)==0);
+ */
+ ret[s].haps[h].refSeq = bestHaps[state.seq];
+ }
+ break;
+ }
+ }
+ }
+ return ret;
+ }
+
+}
diff --git a/src/DipTreePBWT.hpp b/src/DipTreePBWT.hpp
new file mode 100644
index 0000000..24201f0
--- /dev/null
+++ b/src/DipTreePBWT.hpp
@@ -0,0 +1,169 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#ifndef DIPTREEPBWT_HPP
+#define DIPTREEPBWT_HPP
+
+#include <vector>
+
+#include <boost/random.hpp>
+#include <boost/random/lagged_fibonacci.hpp>
+#include <boost/random/uniform_01.hpp>
+
+#include "HapHedge.hpp"
+
+namespace EAGLE {
+
+ struct HapPathSplit {
+ int t; // location of most recent start = tree index (note HapPath goes to tCur >= t)
+ float relProbLastStop; // cumP for path ending just before t, relative to
+ // cumP = exp(cumLogP) for full HapPath ending just before tCur
+ int hapPrefixInd; // index (in HapWaves::hapPrefixes[curMod][.])
+ // of the HapPrefix starting from t that ends the HapPath
+ int hapPrefixTo[2]; // indices (in HapWaves::hapPrefixes[nextMod][.])
+ // of the extended HapPrefixes starting from t that have 0 (resp. 1)
+ // at split site tCur (i.e., bit 2*tCur) and no err (i.e., 0) at 2*tCur+1
+ HapPathSplit(void);
+ HapPathSplit(int _t); // split corresponding to new start at tCur=_t (i.e., root state)
+ HapPathSplit(int _t, float relProb, int ind);
+ };
+
+ struct HapPath {
+ float cumLogP; // log(cumP) for this path ending just before tCur
+ int splitListLength;
+ HapPathSplit *splitList; // [max size = histLength] list of prev split points, probs, prefixes
+ int to[2]; // index of extension to bit = 0 (resp. 1) at split site tCur (i.e., HapPath tCur+1)
+ float toCumLogP[2]; // log(cumP) for extended path
+ HapPath(void);
+ };
+
+ struct HapPrefix {
+ HapTreeState state; // state in tree getHapTreeMulti(split.t)
+ //int to[2];
+ float toHetOnlyProb[2]; // probability of extension to bit = 0 (resp. 1) at split site tCur
+ HapPrefix(void);
+ HapPrefix(const HapTreeState &_state);
+ };
+
+ struct RefHap {
+ uint refSeq;
+ short tLength;
+ bool isEnd;
+ short tMaskFwd, tMaskRev;
+ };
+ struct HapPair {
+ RefHap haps[2];
+ };
+
+// history length to save for sampling ref haps (for in-sample imputation):
+// needs to be a few splits longer than callLength passed to sampleRefs()
+#define HAPWAVES_HIST 25
+
+ class HapWaves {
+
+ boost::lagged_fibonacci607 rng;
+ boost::variate_generator<boost::lagged_fibonacci607&, boost::uniform_01<> > rand01;
+
+ const HapHedgeErr &hapHedge;
+ const std::vector <double> &cMcoords;
+ const double cMexpect;
+ const int histLength, beamWidth;
+ const float pErr;
+ const int maxHapPaths, maxHapPrefixes;
+ int tCur, curMod, nextMod;
+ int hapPathSizes[HAPWAVES_HIST];
+ HapPath *hapPaths[HAPWAVES_HIST]; // [max size = 2*beamWidth each]
+ int hapPrefixSizes[HAPWAVES_HIST];
+ HapPrefix *hapPrefixes[HAPWAVES_HIST]; // [max size = 2*beamWidth * histLength * 2 each]
+
+ public:
+
+ HapWaves(const HapHedgeErr &_hapHedge, const std::vector <double> &_cMcoords, double cMexpect,
+ int _histLength, int _beamWidth, float _logPerr, int _tCur);
+ ~HapWaves(void);
+
+ float recombP(int tCur, int tSplit) const;
+
+ // populate hapPrefixes[nextMod]
+ // populate toCumLogP[] in hapPaths[curMod] (but don't populate hapPaths[nextMod])
+ void computeAllExtensions(const std::vector <uchar> &nextPossibleBits);
+
+ float getToCumLogProb(int ind, int nextBit) const;
+
+ // look up/create extension of hapPaths[curMod][ind] in hapPaths[nextMod]
+ // return index in hapPaths[nextMod]
+ int extendPath(int ind, int nextBit);
+
+ void advance(void);
+
+ void sampleLastPrefix(int &tStart, HapTreeState &state, int t, int hapPathInd, int tBit);
+ };
+
+ struct DipTreeNode {
+ int from;
+ char unequalAnc, hapMat, hapPat;
+ uint64 histMat, histPat;
+ float boostLogP;
+ float logP;
+ int numErr;
+ int hapPathInds[2];
+ bool operator < (const DipTreeNode &dNode) const;
+ };
+
+ // constraint encoding:
+ const char OPP_CONSTRAINT = -2; // require no het err, i.e., 0|1 or 1|0
+ const char NO_CONSTRAINT = -1;
+ // relative phase constraints are encoded as (dist=num_splits_to_ref_het<<1)|(rel_phase)
+ // no hom err constraints (i.e., 0|0 at 0, 1|1 at 2) are encoded as 0 or 1 (i.e., dist=0 above)
+
+ class DipTree {
+
+ boost::lagged_fibonacci607 rng;
+ boost::variate_generator<boost::lagged_fibonacci607&, boost::uniform_01<> > rand01;
+
+ const HapHedgeErr &hapHedge;
+ HapWaves hapWaves;
+ const std::vector <uchar> &genos;
+ const char *constraints;
+ const int histLength, beamWidth;
+ const float logPerr;
+ int tCur; const int T;
+ std::vector < std::vector <DipTreeNode> > nodes;
+ std::vector < std::vector <float> > normProbs;
+
+ void traceNode(int t, int i);
+ void advance(void);
+
+ public:
+
+ DipTree(const HapHedgeErr &_hapHedge, const std::vector <uchar> &_genos,
+ const char *_constraints, const std::vector <double> &_cMcoords, double cMexpect,
+ int _histLength, int _beamWidth, float _logPerr, int _tCur);
+
+ // compute probability of AA at hets tCallLoc1 and tCallLoc2
+ float callProbAA(int tCallLoc1, int tCallLoc2, int callLength);
+ // compute diploid dosage at tCallLoc
+ float callDosage(int tCallLoc, int callLength);
+
+ std::vector <HapPair> sampleRefs(int tCallLoc, int callLength, int samples,
+ const std::vector <uint> &bestHaps, bool isFwd);
+ };
+
+}
+
+#endif
diff --git a/src/Eagle.cpp b/src/Eagle.cpp
new file mode 100644
index 0000000..45bb95c
--- /dev/null
+++ b/src/Eagle.cpp
@@ -0,0 +1,3576 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include <vector>
+#include <string>
+#include <iostream>
+#include <iomanip>
+#include <fstream>
+#include <sstream>
+#include <map>
+#include <set>
+#include <unordered_map>
+#include <unordered_set>
+#include <queue>
+#include <utility>
+#include <numeric>
+#include <algorithm>
+#include <cstdio>
+#include <cstring>
+#include <cmath>
+
+#include "omp.h"
+
+#include <htslib/vcf.h>
+
+#include "Types.hpp"
+#include "FileUtils.hpp"
+#include "MemoryUtils.hpp"
+#include "NumericUtils.hpp"
+#include "StringUtils.hpp"
+#include "Timer.hpp"
+#include "HapHedge.hpp"
+#include "Version.hpp"
+#include "Eagle.hpp"
+
+//#define DETAILS
+
+namespace EAGLE {
+
+ using std::vector;
+ using std::string;
+ using std::pair;
+ using std::make_pair;
+ using std::cout;
+ using std::cerr;
+ using std::endl;
+ using std::max;
+ using std::min;
+
+ const double MEMO_UNSET = -1000;
+ const char noTrioInfo = '`';
+ const string trio1 = "\033[1;36mo\033[0m";
+ const string trio2 = "\033[1;31mx\033[0m";
+ const char IBDx2char = '_';
+ const char ROHchar = '=';
+ const char wrongChar = '@';
+ const char conflictChar = '#';
+
+ inline uint popcount64(uint64 i) {
+ i = i - ((i >> 1) & 0x5555555555555555);
+ i = (i & 0x3333333333333333) + ((i >> 2) & 0x3333333333333333);
+ i = (i + (i >> 4)) & 0xF0F0F0F0F0F0F0F;
+ return (i * 0x101010101010101) >> 56;
+ }
+
+ void Eagle::init() {
+ totTicks = 0; extTicks = 0; diphapTicks = 0; lshTicks = 0; lshCheckTicks = 0;
+ dpTicks = 0; dpStaticTicks = 0; dpSwitchTicks = 0; dpUpdateTicks = 0; dpSortTicks = 0;
+ dpUpdateCalls = 0; blipFixTicks = 0; blipPopTicks = 0; blipVoteTicks = 0; blipLshTicks = 0;
+
+ maskSnps64j = ALIGNED_MALLOC_UCHARS(Mseg64*64);
+ cMs64j = ALIGNED_MALLOC_DOUBLES(Mseg64*64+1);
+ double cMlast = 0;
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) {
+ for (uint64 j = 0; j < seg64cMvecs[m64].size(); j++) {
+ maskSnps64j[m64*64+j] = 1;
+ cMs64j[m64*64+j] = cMlast = seg64cMvecs[m64][j];
+ }
+ for (uint64 j = seg64cMvecs[m64].size(); j < 64; j++) {
+ maskSnps64j[m64*64+j] = 0;
+ cMs64j[m64*64+j] = cMlast;
+ }
+ }
+ cMs64j[Mseg64*64] = cMlast;
+
+ haploBits = ALIGNED_MALLOC_UINT64S(Mseg64*2*N);
+ haploBitsT = ALIGNED_MALLOC_UINT64S(2*N*Mseg64);
+ segConfs = ALIGNED_MALLOC_UCHARS(2*N*Mseg64);
+
+ maskIndivs = vector <uchar> (N, 1);
+
+ for (uint wrongBitsA = 0; wrongBitsA < (1U<<switchScoreLutBits); wrongBitsA++)
+ for (uint wrongBitsB = 0; wrongBitsB < (1U<<switchScoreLutBits); wrongBitsB++)
+ for (uint hetBits = 0; hetBits < (1U<<switchScoreLutBits); hetBits++) {
+ uint wrongHomBitsA = wrongBitsA & ~hetBits;
+ uint wrongHetBitsA = wrongBitsA & hetBits;
+ uint wrongHomBitsB = wrongBitsB & ~hetBits;
+ uint wrongHetBitsB = wrongBitsB & hetBits;
+ uint lutInd = (wrongBitsA<<(switchScoreLutBits+switchScoreLutBits))
+ | (wrongBitsB<<(switchScoreLutBits))
+ | hetBits;
+ char &minDiff = switchScoreLut[lutInd][0]; minDiff = 0;
+ char &cumDiff = switchScoreLut[lutInd][1]; cumDiff = 0;
+ for (uint k = 0; k < switchScoreLutBits; k++) {
+ cumDiff += ((wrongHomBitsA>>k)&1)*homErrCost + ((wrongHetBitsA>>k)&1)*hetErrCost
+ - ((wrongHomBitsB>>k)&1)*homErrCost - ((wrongHetBitsB>>k)&1)*hetErrCost;
+ if (cumDiff < minDiff) minDiff = cumDiff;
+ }
+ }
+ }
+
+ Eagle::Eagle(uint64 _N, uint64 _Mseg64, const uint64_masks *_genoBits,
+ vector < vector <double> > _seg64cMvecs, const AlleleFreqs *_seg64logPs,
+ vector <double> _invLD64j, const vector <IndivInfoX> &_indivs,
+ const vector <SnpInfoX> &_snps, const string &maskFile,
+ const vector <bool> &_isFlipped64j, double _pErr, int runStep2) :
+ N(_N), Nref(0), Mseg64(_Mseg64), genoBits(_genoBits), seg64cMvecs(_seg64cMvecs),
+ seg64logPs(_seg64logPs), invLD64j(_invLD64j), indivs(_indivs), snps(_snps),
+ isFlipped64j(_isFlipped64j), logPerr(log10(_pErr)) {
+
+ init();
+
+ if (runStep2) {
+ phaseConfs = ALIGNED_MALLOC_UCHARS(2*N*Mseg64*64);
+ phaseConfs2 = ALIGNED_MALLOC_UCHARS(2*N*Mseg64*64);
+ tmpHaploBitsT = NULL;
+ }
+ else {
+ phaseConfs = phaseConfs2 = NULL;
+ tmpHaploBitsT = ALIGNED_MALLOC_UINT64S(2*N*Mseg64);
+ memset(tmpHaploBitsT, 0, 2*N*Mseg64*sizeof(tmpHaploBitsT[0]));
+ }
+
+ if (!maskFile.empty()) {
+ int masked = 0;
+ vector < pair <string, string> > maskFidIids = FileUtils::readFidIids(maskFile);
+ std::set < pair < string, string> > maskSet(maskFidIids.begin(), maskFidIids.end());
+ for (uint64 n = 0; n < N; n++)
+ if (maskSet.count(make_pair(indivs[n].famID, indivs[n].indivID))) {
+ maskIndivs[n] = 0;
+ masked++;
+ }
+ cout << "Number of indivs masked: " << masked << endl;
+ }
+ }
+
+ // constructor for ref-mode
+ Eagle::Eagle(uint64 _Nref, uint64 _Ntarget, uint64 _Mseg64, const uint64_masks *_genoBits,
+ vector < vector <double> > _seg64cMvecs, double _pErr) :
+ N(_Nref+_Ntarget), Nref(_Nref), Mseg64(_Mseg64), genoBits(_genoBits),
+ seg64cMvecs(_seg64cMvecs), logPerr(log10(_pErr)) {
+
+ init();
+ isFlipped64j = vector <bool> (Mseg64*64); // no flipping in ref mode
+
+ phaseConfs = phaseConfs2 = NULL;
+ tmpHaploBitsT = ALIGNED_MALLOC_UINT64S(2*(N-Nref)*Mseg64);
+
+ memset(segConfs, 0, 2*N*Mseg64*sizeof(segConfs[0]));
+ for (uint64 nRef = 0; nRef < Nref; nRef++)
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) { // copy ref haploBits stored in genoBits
+ haploBits[m64*2*N + 2*nRef] = genoBits[m64*N + nRef].is0;
+ haploBits[m64*2*N + 2*nRef+1] = genoBits[m64*N + nRef].is2;
+ for (uint64 nHap = 2*nRef; nHap <= 2*nRef+1; nHap++)
+ haploBitsT[nHap*Mseg64 + m64] = haploBits[m64*2*N + nHap];
+ }
+ }
+
+ void Eagle::reallocLRPtoPBWT(void) { // non-ref mode transition: LRP iters 1-2 -> PBWT iters 3+
+ assert(phaseConfs != NULL);
+ ALIGNED_FREE(phaseConfs2); phaseConfs2 = NULL;
+ ALIGNED_FREE(phaseConfs); phaseConfs = NULL;
+
+ assert(tmpHaploBitsT == NULL);
+ tmpHaploBitsT = ALIGNED_MALLOC_UINT64S(2*N*Mseg64);
+ }
+
+ Eagle::~Eagle() {
+ ALIGNED_FREE(segConfs);
+ ALIGNED_FREE(haploBitsT);
+ ALIGNED_FREE(haploBits);
+ if (phaseConfs != NULL) {
+ ALIGNED_FREE(phaseConfs2);
+ ALIGNED_FREE(phaseConfs);
+ }
+ if (tmpHaploBitsT != NULL) {
+ ALIGNED_FREE(tmpHaploBitsT); // allocated in ref-mode, when step 2 is skipped, or by reallocLRPtoPBWT()
+ }
+ ALIGNED_FREE(cMs64j);
+ ALIGNED_FREE(maskSnps64j);
+ }
+
+ inline uint getNonMissingGeno(const uint64_masks &bits, uint64 j) {
+ if (bits.is0 & (1ULL<<j)) return 0;
+ if (bits.is2 & (1ULL<<j)) return 2;
+ return 1; // assumed to be non-missing
+ }
+
+ inline uint bgetGeno0123(const uint64_masks &bits, uint64 j) {
+ if (bits.is0 & (1ULL<<j)) return 0;
+ if (bits.is2 & (1ULL<<j)) return 2;
+ if (bits.is9 & (1ULL<<j)) return 3;
+ return 1;
+ }
+
+ uint Eagle::getGeno0123(uint64 m64j, uint64 n) const {
+ return bgetGeno0123(genoBits[m64j/64*N + n], m64j&63);
+ }
+
+ void Eagle::retractMatch(uint n0, Match &match, double memoLogBF[][4]) const {
+ for (int dir = 0; dir < 2; dir++) {
+ double cumLogBF = 0;
+ while (cumLogBF < log10(4)) {
+ uint m64j;
+ if (dir == 0) m64j = match.m64jStart++;
+ else m64j = match.m64jEnd--;
+ cumLogBF += memoLogBF[m64j][getGeno0123(m64j, match.n)];
+ }
+ }
+ match.m64jStart--;
+ match.m64jEnd++;
+ }
+
+ Match Eagle::computeDuoLogBF(double memoLogBF[][4], double workLogBF[], uint64 n0, uint64 n1, uint64 m64cur) const {
+ //double snpsChecked = 1;
+ Match match(n1, m64cur*64, m64cur*64, 0);
+ workLogBF[m64cur*64] = 0;
+ for (int dir = 0; dir < 2; dir++) {
+ uint64 inc, m64start, m64end, jStart, jEnd;
+ if (dir == 0) {
+ inc = 1; m64start = m64cur; m64end = Mseg64; jStart = 0; jEnd = 64;
+ }
+ else {
+ inc = -1ULL; m64start = m64cur-1; m64end = -1ULL; jStart = 63; jEnd = -1ULL;
+ }
+ double maxLogBF = 0, curLogBF = 0;
+ for (uint64 m64 = m64start; m64 != m64end; m64 += inc) {
+ const uint64_masks &bits0 = genoBits[m64*N + n0], &bits1 = genoBits[m64*N + n1];
+ uint64 wrongBits = (bits0.is0 & bits1.is2) | (bits0.is2 & bits1.is0);
+ if (wrongBits & (wrongBits-1)) // 2+ wrong => fail
+ break;
+ uint64 missMask = bits0.is9 | bits1.is9;
+ for (uint64 j = jStart; j != jEnd; j += inc) {
+ uint geno1 = bgetGeno0123(bits1, j);
+ double logBFj = 0;
+ if (!(missMask & (1ULL<<j))) {
+ double &memoLogBFj = memoLogBF[m64*64+j][geno1];
+ if (memoLogBFj == MEMO_UNSET) {
+ uint geno0 = getNonMissingGeno(bits0, j);
+ double logP_geno1_null = seg64logPs[m64*64+j].cond[geno1][3];
+ double logP_geno1_duo = seg64logPs[m64*64+j].cond[geno1][geno0];
+ memoLogBFj = min(max((logP_geno1_duo - logP_geno1_null) * invLD64j[m64*64+j],
+ logPerr), -logPerr);
+ }
+ logBFj = memoLogBFj;
+ }
+ workLogBF[m64*64+j] = logBFj;
+ curLogBF += logBFj;
+ //if (logBFj != 0) snpsChecked += invLD64j[m64*64+j];
+ if (curLogBF > maxLogBF) {
+ maxLogBF = curLogBF;
+ if (inc == 1) match.m64jEnd = m64*64+j;
+ else match.m64jStart = m64*64+j;
+ }
+ }
+ }
+ match.logBF += maxLogBF;
+ }
+ double minLogBF = 0, curLogBF = 0;
+ for (uint64 m64j = match.m64jStart; m64j <= match.m64jEnd; m64j++) {
+ curLogBF += workLogBF[m64j];
+ if (curLogBF < minLogBF) {
+ minLogBF = curLogBF;
+ match.m64jStart = m64j+1;
+ while (!maskSnps64j[match.m64jStart]) match.m64jStart++;
+ }
+ }
+ match.logBF -= minLogBF;
+ //match.logBF -= log10(snpsChecked);
+ match.cMlenInit = cMs64j[match.m64jEnd] - cMs64j[match.m64jStart];
+ return match;
+ }
+
+ void Eagle::trim(Match &match, const Match &ref, uint64 n0, int orientation, uint64 trimStart,
+ int inc, double workLogBF[]) const {
+
+ uint64 n1 = match.n, n2 = ref.n;
+
+ // find IBDx2; store IBDx2 status in workLogBF (0 or 1) to compute probabilities accordingly
+ double IBDx2logBF = 0; uint64 IBDx2start = trimStart;
+ for (uint64 m64j = trimStart; m64j+1!=match.m64jStart && m64j!=match.m64jEnd+1; m64j += inc)
+ workLogBF[m64j] = 0; // initialize workLogBF to 0 (not IBDx2) in *MATCH* (n1)
+ // check for IBDx2 in *REF* (n2)
+ uint64 m64jLast; // last SNP to check: go beyond end of match to ensure detection of ref IBDx2
+ if (inc == 1)
+ m64jLast = min(match.m64jEnd + 50ULL, Mseg64*64-1);
+ else
+ m64jLast = max((int) match.m64jStart - 50, 0);
+ for (uint64 m64j = trimStart; m64j!=m64jLast+inc; m64j += inc) {
+ uint g0 = getGeno0123(m64j, n0);
+ uint g2 = getGeno0123(m64j, n2);
+ bool mismatch = false;
+ if (g0 != 3 && g2 != 3) {
+ if (g0 == g2)
+ IBDx2logBF += seg64logPs[m64j].cond[g2][g0] * invLD64j[m64j];
+ else
+ mismatch = true;
+ }
+ if (mismatch || m64j==m64jLast) { // end of IBDx2 segment
+ if (IBDx2logBF < -1) { // 10:1 IBDx2
+#ifdef VERBOSE
+ printf("IBDx2 detected in %d: %.1f-%.1f (%d SNPs)\n", (int) n2, cMs64j[IBDx2start], cMs64j[m64j], (int) (m64j-IBDx2start));
+#endif
+ for (uint64 m64j2 = IBDx2start; m64j2 != m64j; m64j2 += inc)
+ workLogBF[m64j2] = 1;
+ }
+ IBDx2logBF = 0; // reset
+ IBDx2start = m64j+inc;
+ }
+ }
+
+ double maxLogBF = 0, curLogBF = 0; uint64 m64jBest = trimStart;
+ for (uint64 m64j = trimStart; m64j+1!=match.m64jStart && m64j!=match.m64jEnd+1; m64j += inc) {
+ double logBF = 0;
+ uint g0 = getGeno0123(m64j, n0);
+ uint g1 = getGeno0123(m64j, n1);
+ uint g2 = getGeno0123(m64j, n2);
+ if (g0 != 3 && g1 != 3) {
+ uint g0eff = g0;
+ if (g0 == 1 && ref.m64jStart <= m64j && m64j <= ref.m64jEnd && g2 != 3) {
+ // n0 het and n2 (ref) match info available
+ if (g2 != 1) { // n2 hom
+ if (orientation == 1)
+ g0eff = g2; // treat g0 as n2 hom
+ else
+ g0eff = 2-g2; // treat g0 as opp n2 hom
+ }
+ else if (workLogBF[m64j] == 0) { // n0 and n2 both hets and not IBDx2
+ if (orientation == 1)
+ g0eff = 4; // same orientation as het-het => p(hap=1) = 1-p
+ else
+ g0eff = 5; // opp orientation to het-het => p(hap=1) = p
+ }
+ }
+ double logP_geno1_null = seg64logPs[m64j].cond[g1][3];
+ double logP_geno1_duo = seg64logPs[m64j].cond[g1][g0eff];
+ logBF = min(max((logP_geno1_duo - logP_geno1_null) * invLD64j[m64j],
+ logPerr), -logPerr);
+ }
+ curLogBF += logBF;
+ if (curLogBF > maxLogBF) {
+ maxLogBF = curLogBF;
+ m64jBest = m64j;
+ }
+ workLogBF[m64j] = curLogBF;
+ }
+
+ uint64 m64jTrim = m64jBest;
+ if (inc == 1) match.m64jEnd = m64jTrim;
+ else match.m64jStart = m64jTrim;
+
+ // conservative trimming: backtrack to 10x higher prob
+ uint m64jTrimCons = trimStart;
+ for (uint64 m64j = trimStart; m64j != m64jBest; m64j += inc)
+ if (workLogBF[m64j] < maxLogBF - log10(10))
+ m64jTrimCons = m64j;
+ if (inc == 1) match.m64jEndCons = std::min(match.m64jEndCons, m64jTrimCons);
+ else match.m64jStartCons = std::max(match.m64jStartCons, m64jTrimCons);
+ }
+
+ vector <int> searchSigns(const vector <Match> &matches, const vector < vector <uint> > &sameEdges, const vector < vector <uint> > &oppEdges, const vector <bool> &kept) {
+ // process left to right so that when sign choice is arbitrary, adjacent matches have same sign
+ uint V = matches.size();
+ vector <int> signs(V);
+ vector < pair <uint, uint> > order(V);
+ for (uint v = 0; v < V; v++)
+ order[v] = make_pair(matches[v].m64jStart, v);
+ sort(order.begin(), order.end());
+ uint lastEnd = 0; int lastSign = 1; // sign of farthest-right match seen so far
+ std::queue <uint> q;
+ for (uint i = 0; i < V; i++) {
+ uint v = order[i].second;
+ if (!kept[v]) continue; // not used
+ if (signs[v]) continue; // already visited
+ signs[v] = lastSign;
+ q.push(v);
+ while (!q.empty()) {
+ uint u = q.front(); q.pop();
+ for (uint e = 0; e < sameEdges[u].size(); e++) { // 'e' avoids shadowing the outer loop index
+ uint w = sameEdges[u][e];
+ if (!kept[w]) continue;
+ if (signs[w]) {
+ if (signs[w] != signs[u])
+ return vector <int> ();
+ }
+ else {
+ signs[w] = signs[u];
+ q.push(w);
+ if (matches[w].m64jEnd > lastEnd) {
+ lastEnd = matches[w].m64jEnd;
+ lastSign = signs[w];
+ }
+ }
+ }
+ for (uint e = 0; e < oppEdges[u].size(); e++) { // 'e' avoids shadowing the outer loop index
+ uint w = oppEdges[u][e];
+ if (!kept[w]) continue;
+ if (signs[w]) {
+ if (signs[w] == signs[u])
+ return vector <int> ();
+ }
+ else {
+ signs[w] = -signs[u];
+ q.push(w);
+ if (matches[w].m64jEnd > lastEnd) {
+ lastEnd = matches[w].m64jEnd;
+ lastSign = signs[w];
+ }
+ }
+ }
+ }
+ }
+ return signs;
+ }
+
+ void updateVote(int &votesCur, int votesThresh, int vote) {
+ if (abs(votesCur) >= votesThresh) return;
+ votesCur += vote;
+ }
+
+ void Eagle::computePhaseConfs(uint64 n0, const vector <Match> &matches,
+ const vector <int> &signs, bool cons) {
+
+ vector < vector <int> > votes(2, vector <int> (Mseg64*64));
+ vector <int> votesThresh(Mseg64*64);
+ const int votesMax = 1000000;
+
+ for (uint64 m64j = 0; m64j < Mseg64*64; m64j++) {
+ if (maskSnps64j[m64j]) {
+ votesThresh[m64j] = // 2 / log10((1-p)/p) = number of votes needed to get >100:1 odds
+ (int) (2 / fabs(seg64logPs[m64j].cond[0][4] - seg64logPs[m64j].cond[0][5])) + 1;
+ if (!(votesThresh[m64j] < votesMax)) votesThresh[m64j] = votesMax;
+ uint g0 = getGeno0123(m64j, n0);
+ if (g0 == 0) { votes[0][m64j] = votes[1][m64j] = -votesMax; }
+ else if (g0 == 2) { votes[0][m64j] = votes[1][m64j] = votesMax; }
+ else if (g0 == 3) { votes[0][m64j] = votes[1][m64j] = -1; } // missing: default P(1) = p
+ }
+ }
+
+ vector <uchar> isIBDx2(Mseg64*64);
+
+ for (uint i = 0; i < matches.size(); i++) {
+ if (!signs[i]) continue;
+ uint64 start, end;
+ if (cons) {
+ start = std::max(matches[i].m64jStartCons, matches[i].m64jStart);
+ end = std::min(matches[i].m64jEndCons, matches[i].m64jEnd);
+ }
+ else {
+ start = matches[i].m64jStart;
+ end = matches[i].m64jEnd;
+ }
+#ifdef VERBOSE
+ printf("match %d (%.1f-%.1f)\n", (int) i, cMs64j[start], cMs64j[end]);
+#endif
+
+ // find IBDx2 regions
+ vector < pair <uint64, uint64> > IBDx2regions;
+ uint64 m64jFirst = max((int) start - 50, 0); // go beyond ends to detect overhanging IBDx2
+ uint64 m64jLast = min(end + 50ULL, Mseg64*64-1);
+ double IBDx2logBF = 0; uint64 IBDx2start = m64jFirst;
+ for (uint64 m64j = m64jFirst; m64j <= m64jLast; m64j++) {
+ uint g0 = getGeno0123(m64j, n0);
+ uint g1 = getGeno0123(m64j, matches[i].n);
+ bool mismatch = false;
+ if (g0 != 3 && g1 != 3) {
+ if (g0 == g1)
+ IBDx2logBF += seg64logPs[m64j].cond[g1][g0] * invLD64j[m64j];
+ else
+ mismatch = true;
+ }
+ if (mismatch || m64j==m64jLast) { // end of IBDx2 segment
+ if (IBDx2logBF < -1) { // 10:1 IBDx2
+ IBDx2regions.push_back(make_pair(IBDx2start, m64j));
+#ifdef VERBOSE
+ printf("IBDx2 detected in %d: %.1f-%.1f (%d SNPs)\n", (int) matches[i].n, cMs64j[IBDx2start], cMs64j[m64j], (int) (m64j-IBDx2start));
+#endif
+ }
+ IBDx2logBF = 0; // reset
+ IBDx2start = m64j+1;
+ }
+ }
+
+ for (uint r = 0; r < IBDx2regions.size(); r++) // set IBDx2 region flags
+ memset(&isIBDx2[IBDx2regions[r].first], 1, IBDx2regions[r].second-IBDx2regions[r].first);
+
+ // accumulate votes
+ for (uint64 m64j = start; m64j <= end; m64j++) {
+ if (!maskSnps64j[m64j]) continue;
+ uint g0 = getGeno0123(m64j, n0);
+ uint g1 = getGeno0123(m64j, matches[i].n);
+
+ int vote = 0;
+ if (g1 == 0 || g1 == 2) // n1 hom: IBDx2 status irrelevant; phase determined (votesMax)
+ vote = (g1-1)*votesMax*2; // super strong vote (overrides any previous small votes)
+ else if (g1 == 1 && !isIBDx2[m64j]) // n1 het and not IBDx2; weak phase info
+ vote = 1;
+
+ if (vote) {
+ int q = (signs[i] == 1);
+ updateVote(votes[q][m64j], votesThresh[m64j], vote);
+ if (g0 == 1) // n0 het: pass info to opp chromosome
+ updateVote(votes[!q][m64j], votesThresh[m64j], -vote);
+ }
+ }
+
+ for (uint r = 0; r < IBDx2regions.size(); r++) // unset IBDx2 region flags
+ memset(&isIBDx2[IBDx2regions[r].first], 0, IBDx2regions[r].second-IBDx2regions[r].first);
+ }
+
+ // fast rng: last 16 bits of Marsaglia's MWC
+ uint w = 521288629;
+ if (phaseConfs != NULL) { // need to make hard calls here
+ for (uint i = 0; i < (n0 & 0xff); i++)
+ w=18000*(w&65535)+(w>>16);
+ }
+
+ for (uint64 m64j = 0; m64j < Mseg64*64; m64j++) {
+ if (maskSnps64j[m64j]) {
+ for (uint64 q = 0; q <= 1; q++) {
+ double phaseConf;
+ if (votes[q][m64j] >= votesMax)
+ phaseConf = 1;
+ else if (votes[q][m64j] <= -votesMax)
+ phaseConf = 0;
+ else {
+ double OR = pow(10.0, fabs(seg64logPs[m64j].cond[0][4] - seg64logPs[m64j].cond[0][5])
+ * votes[q][m64j]);
+ phaseConf = OR / (1 + OR);
+ }
+ if (phaseConfs != NULL)
+ phaseConfs[(2*n0+q)*Mseg64*64 + m64j] = (uchar) (phaseConf * 255);
+ else {
+ uchar uPhaseConf = (uchar) (phaseConf * 255);
+ if (uPhaseConf == (uchar) 255 || ((w=18000*(w&65535)+(w>>16))&255) < uPhaseConf)
+ tmpHaploBitsT[(2*n0+q)*Mseg64 + (m64j/64)] |= 1ULL<<(m64j&63);
+ }
+ }
+ }
+ else {
+ if (phaseConfs != NULL)
+ phaseConfs[2*n0*Mseg64*64 + m64j] = phaseConfs[(2*n0+1)*Mseg64*64 + m64j] = 0;
+ }
+ }
+ }
+
+ vector <int> Eagle::trioRelPhase(uint64 n0, uint64 nF1, uint64 nF2) const {
+
+ bool isParent = false;
+ if (((int) nF1) < 0) { // nF1 is the child; n0 is a parent
+ isParent = true;
+ nF1 = -nF1;
+ }
+
+ vector <int> trioPhaseVec;
+ for (uint64 m64j = 0; m64j < Mseg64*64; m64j++) {
+ if (!maskSnps64j[m64j]) continue;
+ uint64 m64cur = m64j/64; uint64 j = m64j&63;
+ const uint64_masks &bits0 = genoBits[m64cur*N + n0];
+ if ((bits0.is0|bits0.is2|bits0.is9)&(1ULL<<j)) continue; // not het
+ const uint64_masks &bitsF1 = genoBits[m64cur*N + nF1];
+ const uint64_masks &bitsF2 = genoBits[m64cur*N + nF2];
+ int trioPhase = 0;
+ if (!isParent) {
+ if ((bitsF1.is0|bitsF2.is2)&(1ULL<<j)) trioPhase++;
+ if ((bitsF1.is2|bitsF2.is0)&(1ULL<<j)) trioPhase--;
+ }
+ else { // n0 is a parent; nF1 is the child
+ int g0 = getGeno0123(m64j, nF1); // child
+ int g2 = getGeno0123(m64j, nF2); // other parent
+ if (g0+g2 != 2) { // not Mendel error or triple het
+ if (g0 == 0) trioPhase = -1;
+ if (g0 == 2) trioPhase = 1;
+ if (g0 == 1) { // child is a het
+ if (g2 == 0) trioPhase = 1;
+ if (g2 == 2) trioPhase = -1;
+ }
+ }
+ }
+ trioPhaseVec.push_back(trioPhase); // 0 => unknown; +/-1 => pat/mat
+ if (trioPhase == 0) continue;
+ }
+ vector <int> trioRelPhaseVec(trioPhaseVec.empty() ? 0 : trioPhaseVec.size()-1); // guard against unsigned underflow when no hets
+ for (uint i = 1; i < trioPhaseVec.size(); i++)
+ trioRelPhaseVec[i-1] = (trioPhaseVec[i-1]==0 || trioPhaseVec[i]==0) ? -1 :
+ (trioPhaseVec[i-1]==trioPhaseVec[i] ? 0 : 1); // -1 => unknown; 0 => same; 1 => opp
+ return trioRelPhaseVec;
+ }
+
+ void Eagle::checkPhase(uint64 n0, uint64 nF1, uint64 nF2, double thresh) const {
+ cout << "checking at thresh=" << thresh << ": ";
+
+ double lastPhased = cMs64j[0];
+ for (uint64 m64j = 0; m64j < Mseg64*64; m64j++) {
+ if (!maskSnps64j[m64j]) continue;
+ uint64 m64cur = m64j/64; uint64 j = m64j&63;
+ const uint64_masks &bits0 = genoBits[m64cur*N + n0];
+ if ((bits0.is0|bits0.is2)&(1ULL<<j)) continue; // hom (note: missing genotypes are not skipped here)
+ const uint64_masks &bitsF1 = genoBits[m64cur*N + nF1];
+ const uint64_masks &bitsF2 = genoBits[m64cur*N + nF2];
+ int trioPhase = 0;
+ if ((bitsF1.is0|bitsF2.is2)&(1ULL<<j)) trioPhase++;
+ if ((bitsF1.is2|bitsF2.is0)&(1ULL<<j)) trioPhase--;
+ if (!trioPhase) continue;
+ double phaseConf = phaseConfs[2*n0*Mseg64*64 + m64j] / 255.0;
+ if (std::min(phaseConf, 1-phaseConf) <= thresh) {
+ double cM = cMs64j[m64j];
+ for (int tick = (int) (10*lastPhased) + 1; tick < 10*cM; tick++) {
+ if (tick % 10 == 0) cout << StringUtils::itos(tick/10); //(char) ('0' + (tick/10)%10);
+ else cout << '-';
+ }
+ if ((phaseConf < 0.5) == (trioPhase == 1))
+ cout << trio1;
+ else
+ cout << trio2;
+ lastPhased = cM;
+ }
+ else
+ cout << '?';
+ }
+ cout << endl;
+ }
+
+ vector <bool> Eagle::checkPhaseConfsPhase(uint64 n0, uint64 nF1, uint64 nF2) const {
+ vector <bool> ret;
+ int lastPhased64j = -1;
+ for (uint64 m64j = 0; m64j < Mseg64*64; m64j++) {
+ if (!maskSnps64j[m64j]) continue;
+ uint64 m64cur = m64j/64; uint64 j = m64j&63;
+ const uint64_masks &bits0 = genoBits[m64cur*N + n0];
+ if ((bits0.is0|bits0.is2|bits0.is9)&(1ULL<<j)) continue; // not het
+ const uint64_masks &bitsF1 = genoBits[m64cur*N + nF1];
+ const uint64_masks &bitsF2 = genoBits[m64cur*N + nF2];
+ int trioPhase = 0;
+ if ((bitsF1.is0|bitsF2.is2)&(1ULL<<j)) trioPhase++;
+ if ((bitsF1.is2|bitsF2.is0)&(1ULL<<j)) trioPhase--;
+ if (!trioPhase) continue;
+ int hapBit = (int) phaseConfs[2*n0*Mseg64*64 + m64j] >= 128;
+ bool phase = hapBit == (trioPhase == 1);
+ if (lastPhased64j != -1 && ret.back() != phase)
+ printf(" %.2f", (cMs64j[lastPhased64j] + cMs64j[m64j]) / 2);
+ lastPhased64j = m64j;
+ ret.push_back(phase);
+ }
+ cout << endl;
+ return ret;
+ }
+
+ void Eagle::checkHapPhase(uint64 n0, uint64 nF1, uint64 nF2, const uint64 curHaploBitsT[],
+ uint64 m64, uint64 side, vector < vector <int> > votes) const {
+ if ((int) nF1 == -1) return;
+ for (uint64 m64j = (m64-side)*64; m64j < (m64+side+1)*64; m64j++) {
+ if (!maskSnps64j[m64j]) continue;
+ uint64 m64cur = m64j/64; uint64 j = m64j&63;
+ const uint64_masks &bits0 = genoBits[m64cur*N + n0];
+ if ((bits0.is0|bits0.is2|bits0.is9)&(1ULL<<j)) continue; // not het
+ const uint64_masks &bitsF1 = genoBits[m64cur*N + nF1];
+ const uint64_masks &bitsF2 = genoBits[m64cur*N + nF2];
+ int trioPhase = 0;
+ if ((bitsF1.is0|bitsF2.is2)&(1ULL<<j)) trioPhase++;
+ if ((bitsF1.is2|bitsF2.is0)&(1ULL<<j)) trioPhase--;
+ if (!trioPhase) continue;
+ //if (((haploBits[m64cur*2*N + n1hap]>>j)&1) == (trioPhase == 1))
+ //if (((haploBitsT[n1hap*Mseg64 + m64cur]>>j)&1) == (trioPhase == 1))
+ if (((curHaploBitsT[m64cur]>>j)&1) == (trioPhase == 1))
+ cout << trio1;
+ else
+ cout << trio2;
+ if (!votes.empty())
+ cout << "[" << votes[j][(curHaploBitsT[m64cur]>>j)&1] << "|" << votes[j][!((curHaploBitsT[m64cur]>>j)&1)] << ";" << votes[j][((curHaploBitsT[m64cur]>>j)&1)+2] << "|" << votes[j][!((curHaploBitsT[m64cur]>>j)&1)+2] << "]";
+ }
+ cout << endl;
+ }
+
+ vector <bool> Eagle::checkHapPhase1(uint64 n0, uint64 nF1, uint64 nF2, uint64 n1hap,
+ uint64 m64start, uint64 m64end, int sign) const {
+ vector <bool> ret;
+ if ((int) nF1 == -1) return ret;
+ cout << "n1hap = " << n1hap << "; m64 = [" << m64start << "," << m64end << "): ";
+ for (uint64 m64j = m64start*64; m64j < m64end*64; m64j++) {
+ if (m64j != m64start*64 && (m64j&63)==0) cout << "|";
+ if (!maskSnps64j[m64j]) continue;
+ uint64 m64cur = m64j/64; uint64 j = m64j&63;
+ const uint64_masks &bits0 = genoBits[m64cur*N + n0];
+ if ((bits0.is0|bits0.is2|bits0.is9)&(1ULL<<j)) continue; // not het
+ const uint64_masks &bitsF1 = genoBits[m64cur*N + nF1];
+ const uint64_masks &bitsF2 = genoBits[m64cur*N + nF2];
+ int trioPhase = 0;
+ if ((bitsF1.is0|bitsF2.is2)&(1ULL<<j)) trioPhase++;
+ if ((bitsF1.is2|bitsF2.is0)&(1ULL<<j)) trioPhase--;
+ if (!trioPhase) continue;
+ if (((haploBits[m64cur*2*N + n1hap]>>j)&1) == (trioPhase == sign)) {
+ cout << trio1;
+ ret.push_back(0);
+ }
+ else {
+ cout << trio2;
+ ret.push_back(1);
+ }
+ }
+ cout << endl;
+ return ret;
+ }
+
+ vector <bool> Eagle::checkHapPhase1j(uint64 n0, uint64 nF1, uint64 nF2, uint64 n1hap,
+ uint64 m64jStart, uint64 m64jEnd, int sign) const {
+ vector <bool> ret;
+ if ((int) nF1 == -1) return ret;
+ //cout << "n1hap = " << n1hap << "; m64 = [" << m64start << "," << m64end << "): ";
+ for (uint64 m64j = m64jStart; m64j < m64jEnd; m64j++) {
+ if (m64j != m64jStart && (m64j&63)==0) cout << m64j/64;//"|";
+ if (!maskSnps64j[m64j]) continue;
+ uint64 m64cur = m64j/64; uint64 j = m64j&63;
+ const uint64_masks &bits0 = genoBits[m64cur*N + n0];
+ if ((bits0.is0|bits0.is2|bits0.is9)&(1ULL<<j)) continue; // not het
+ const uint64_masks &bitsF1 = genoBits[m64cur*N + nF1];
+ const uint64_masks &bitsF2 = genoBits[m64cur*N + nF2];
+ int trioPhase = 0;
+ if ((bitsF1.is0|bitsF2.is2)&(1ULL<<j)) trioPhase++;
+ if ((bitsF1.is2|bitsF2.is0)&(1ULL<<j)) trioPhase--;
+ if (!trioPhase) continue;
+ if (((haploBits[m64cur*2*N + n1hap]>>j)&1) == (trioPhase == sign)) {
+ cout << trio1;
+ ret.push_back(0);
+ }
+ else {
+ cout << trio2;
+ ret.push_back(1);
+ }
+ }
+ //cout << endl;
+ return ret;
+ }
+
+ vector <bool> Eagle::checkHapPhase1jCall(uint64 n0, uint64 nF1, uint64 nF2, uint64 callBitsT[],
+ uint64 m64jStart, uint64 m64jEnd, bool print, int sign) const {
+ vector <bool> ret;
+ if ((int) nF1 == -1) return ret;
+ for (uint64 m64j = m64jStart; m64j < m64jEnd; m64j++) {
+ if (m64j != m64jStart && (m64j&63)==0)
+ if (print) cout << m64j/64;//"|";
+ if (!maskSnps64j[m64j]) continue;
+ uint64 m64cur = m64j/64; uint64 j = m64j&63;
+ const uint64_masks &bits0 = genoBits[m64cur*N + n0];
+ if ((bits0.is0|bits0.is2|bits0.is9)&(1ULL<<j)) continue; // not het
+ const uint64_masks &bitsF1 = genoBits[m64cur*N + nF1];
+ const uint64_masks &bitsF2 = genoBits[m64cur*N + nF2];
+ int trioPhase = 0;
+ if ((bitsF1.is0|bitsF2.is2)&(1ULL<<j)) trioPhase++;
+ if ((bitsF1.is2|bitsF2.is0)&(1ULL<<j)) trioPhase--;
+ if (!trioPhase) continue;
+ if (((callBitsT[m64cur]>>j)&1) == (trioPhase == sign)) {
+ if (print) cout << trio1;
+ ret.push_back(0);
+ }
+ else {
+ if (print) cout << trio2;
+ ret.push_back(1);
+ }
+ }
+ if (print) cout << endl;
+ return ret;
+ }
+
+ int Eagle::checkHapPhase2(uint64 n0, uint64 nF1, uint64 nF2, uint64 n1hap,
+ uint64 n2hapA, uint64 n2hapB, uint64 m64, int sign) const {
+ vector <bool> ret;
+ if ((int) nF1 == -1) return 0; /*ret*/
+
+ uint64 n1is1 = haploBitsT[n1hap*Mseg64 + m64];
+ uint64 n2is1A = haploBitsT[n2hapA*Mseg64 + m64];
+ uint64 n2is1B = haploBitsT[n2hapB*Mseg64 + m64];
+ const uint64_masks &bits0 = genoBits[m64*N + n0];
+ uint64 wrongHomBitsA = (bits0.is0 & (n1is1 | n2is1A)) | (bits0.is2 & ~(n1is1 & n2is1A));
+ uint64 wrongHetBitsA = (~(bits0.is0|bits0.is2|bits0.is9) & ~(n1is1 ^ n2is1A));
+ uint64 wrongHomBitsB = (bits0.is0 & (n1is1 | n2is1B)) | (bits0.is2 & ~(n1is1 & n2is1B));
+ uint64 wrongHetBitsB = (~(bits0.is0|bits0.is2|bits0.is9) & ~(n1is1 ^ n2is1B));
+ uint score = popcount64(wrongHomBitsB)*homErrCost
+ + popcount64(wrongHetBitsB)*hetErrCost;
+ uint minScore = score; uint64 kSwitch = 0;
+ for (uint64 k = 0; k < 64; k++) {
+ score += ((wrongHomBitsA>>k)&1)*homErrCost + ((wrongHetBitsA>>k)&1)*hetErrCost
+ - ((wrongHomBitsB>>k)&1)*homErrCost - ((wrongHetBitsB>>k)&1)*hetErrCost;
+ if (score < minScore) {
+ minScore = score;
+ kSwitch = k+1;
+ }
+ }
+
+ cout << "m64 = " << m64 << ": (" << n1hap << "," << n2hapA;
+ if (n2hapA != n2hapB) cout << "-" << n2hapB;
+ cout << ") score = " << minScore << " ";
+ cout << " conf = " << (int) segConfs[n1hap*Mseg64+m64] << "," << (int) segConfs[n2hapA*Mseg64+m64];
+ if (n2hapA != n2hapB) cout << "-" << (int) segConfs[n2hapB*Mseg64+m64];
+ cout << " ";
+
+ for (uint64 j = 0; j < 64; j++) {
+ uint64 m64j = m64*64+j;
+ if (!maskSnps64j[m64j]) continue;
+ const uint64_masks &bits0 = genoBits[m64*N + n0];
+ if ((bits0.is0|bits0.is2|bits0.is9)&(1ULL<<j)) continue; // not het
+ const uint64_masks &bitsF1 = genoBits[m64*N + nF1];
+ const uint64_masks &bitsF2 = genoBits[m64*N + nF2];
+ int trioPhase = 0;
+ if ((bitsF1.is0|bitsF2.is2)&(1ULL<<j)) trioPhase++;
+ if ((bitsF1.is2|bitsF2.is0)&(1ULL<<j)) trioPhase--;
+ if (!trioPhase) continue;
+ bool phase;
+ if (((haploBits[m64*2*N + n1hap]>>j)&1) == (trioPhase == sign))
+ phase = 0;
+ else
+ phase = 1;
+ bool hetErr =
+ ((haploBits[m64*2*N + n1hap]>>j)&1) ==
+ ((haploBits[m64*2*N + (j<kSwitch?n2hapA:n2hapB)]>>j)&1);
+ uchar conf1 = 0, conf2 = 0;
+ if (hetErr) {
+ conf1 = phaseConfs[n1hap*Mseg64*64 + m64j];
+ conf2 = phaseConfs[(j<kSwitch?n2hapA:n2hapB)*Mseg64*64 + m64j];
+ if (min((int) conf2, 255-conf2) < min((int) conf1, 255-conf1))
+ phase = !phase;
+ }
+ cout << (phase==0?trio1:trio2);
+ if (hetErr) cout << "?" << "[" << (int) conf1 << "|" << (int) conf2 << "]";
+ ret.push_back(phase);
+ }
+ //cout << endl;
+ return minScore/*ret*/;
+ }
+
+ vector <bool> Eagle::checkHaploBits(uint64 n0, uint64 nF1, uint64 nF2, uint64 hapBits,
+ uint64 m64, int pad) const {
+ vector <bool> ret;
+ if ((int) nF1 == -1) return ret;
+
+ bool isParent = false;
+ if (((int) nF1) < 0) { // nF1 is the child; n0 is a parent
+ isParent = true;
+ nF1 = -nF1;
+ }
+
+ int printed = 0;
+ for (uint64 j = 0; j < 64; j++) {
+ uint64 m64j = m64*64+j;
+ if (!maskSnps64j[m64j]) continue;
+ const uint64_masks &bits0 = genoBits[m64*N + n0];
+ if ((bits0.is0|bits0.is2|bits0.is9)&(1ULL<<j)) continue; // not het
+ const uint64_masks &bitsF1 = genoBits[m64*N + nF1];
+ const uint64_masks &bitsF2 = genoBits[m64*N + nF2];
+ int trioPhase = 0;
+ if (!isParent) {
+ if ((bitsF1.is0|bitsF2.is2)&(1ULL<<j)) trioPhase++;
+ if ((bitsF1.is2|bitsF2.is0)&(1ULL<<j)) trioPhase--;
+ }
+ else { // n0 is a parent; nF1 is the child
+ int g0 = getGeno0123(m64j, nF1); // child
+ int g2 = getGeno0123(m64j, nF2); // other parent
+ if (g0+g2 != 2) { // not Mendel error or triple het
+ if (g0 == 0) trioPhase = -1;
+ if (g0 == 2) trioPhase = 1;
+ if (g0 == 1) { // child is a het
+ if (g2 == 0) trioPhase = 1;
+ if (g2 == 2) trioPhase = -1;
+ }
+ }
+ }
+ if (!trioPhase) continue;
+ if (pad >= 0) {
+ if (((hapBits>>j)&1) == (trioPhase == 1))
+ cout << trio1;
+ else
+ cout << trio2;
+ printed++;
+ }
+ ret.push_back(((hapBits>>j)&1) == (trioPhase == 1));
+ }
+ while (printed < pad) { cout << " "; printed++; }
+ return ret;
+ }
+
+ pair <uint64, uint64> Eagle::phaseSegHMM(uint64 n0, uint64 n1hap, uint64 n2hapA, uint64 n2hapB,
+ uint64 m64, uint64 &hetErrMask) const {
+ uint64 n1is1 = haploBitsT[n1hap*Mseg64 + m64];
+ uint64 n2is1A = haploBitsT[n2hapA*Mseg64 + m64];
+ uint64 n2is1B = haploBitsT[n2hapB*Mseg64 + m64];
+ const uint64_masks &bits0 = genoBits[m64*N + n0];
+ uint64 wrongHomBitsA = (bits0.is0 & (n1is1 | n2is1A)) | (bits0.is2 & ~(n1is1 & n2is1A));
+ uint64 wrongHetBitsA = (~(bits0.is0|bits0.is2|bits0.is9) & ~(n1is1 ^ n2is1A));
+ uint64 wrongHomBitsB = (bits0.is0 & (n1is1 | n2is1B)) | (bits0.is2 & ~(n1is1 & n2is1B));
+ uint64 wrongHetBitsB = (~(bits0.is0|bits0.is2|bits0.is9) & ~(n1is1 ^ n2is1B));
+ uint score = popcount64(wrongHomBitsB)*homErrCost
+ + popcount64(wrongHetBitsB)*hetErrCost;
+ uint minScore = score;
+ double cMdiffMax = m64==0 ? 0.0 : cMs64j[m64*64] - cMs64j[m64*64-1];
+ uint64 kSwitch = 0;
+ uint64 kSeg = seg64cMvecs[m64].size();
+ for (uint64 k = 0; k < kSeg; k++) {
+ score += ((wrongHomBitsA>>k)&1)*homErrCost + ((wrongHetBitsA>>k)&1)*hetErrCost
+ - ((wrongHomBitsB>>k)&1)*homErrCost - ((wrongHetBitsB>>k)&1)*hetErrCost;
+ double cMdiff = (k+1==kSeg ? cMs64j[(m64+1)*64] : cMs64j[m64*64+k+1]) - cMs64j[m64*64+k];
+ if (score < minScore || (score == minScore && cMdiff > cMdiffMax)) {
+ minScore = score;
+ cMdiffMax = cMdiff;
+ kSwitch = k+1;
+ }
+ }
+
+ uint64 phaseBits1 = 0, phaseBits2 = 0; hetErrMask = 0;
+ for (uint64 j = 0; j < 64; j++) {
+ uint64 m64j = m64*64+j;
+ if (!maskSnps64j[m64j]) continue;
+ const uint64_masks &bits0 = genoBits[m64*N + n0];
+ if (bits0.is0&(1ULL<<j)) // dip = 0: hap1 = hap2 = 0
+ ;
+ else if (bits0.is2&(1ULL<<j)) { // dip = 2: hap1 = hap2 = 1
+ phaseBits1 |= 1ULL<<j;
+ phaseBits2 |= 1ULL<<j;
+ }
+ else {
+ uint64 phase1 = (haploBits[m64*2*N + n1hap]>>j)&1;
+ uint64 phase2 = ((haploBits[m64*2*N + (j<kSwitch?n2hapA:n2hapB)]>>j)&1);
+ if (bits0.is9&(1ULL<<j)) { // missing
+ phaseBits1 |= phase1<<j;
+ phaseBits2 |= phase2<<j;
+ }
+ else { // het
+ bool phase = phase1;
+ bool hetErr = phase1 == phase2;
+ if (hetErr) {
+ hetErrMask |= 1ULL<<j;
+ if (Nref == 0) {
+ uchar conf1 = phaseConfs[n1hap*Mseg64*64 + m64j];
+ uchar conf2 = phaseConfs[(j<kSwitch?n2hapA:n2hapB)*Mseg64*64 + m64j];
+ if (min((int) conf2, 255-conf2) < min((int) conf1, 255-conf1))
+ phase = !phase;
+ }
+ }
+ phaseBits1 |= ((uint64) phase)<<j;
+ phaseBits2 |= ((uint64) !phase)<<j;
+ }
+ }
+ }
+ return make_pair(phaseBits1, phaseBits2);
+ }
+
+ vector <bool> Eagle::checkSegPhase(uint64 n0, uint64 nF1, uint64 nF2, uint64 n1hap, uint64 n2hap,
+ int sign, uint64 m64) const {
+ vector <bool> ret;
+ for (uint64 j = 0; j < 64; j++) {
+ uint64 m64j = m64*64+j;
+ if (!maskSnps64j[m64j]) continue;
+ const uint64_masks &bits0 = genoBits[m64*N + n0];
+ if ((bits0.is0|bits0.is2|bits0.is9)&(1ULL<<j)) continue; // not het
+ const uint64_masks &bitsF1 = genoBits[m64*N + nF1];
+ const uint64_masks &bitsF2 = genoBits[m64*N + nF2];
+ int trioPhase = 0;
+ if ((bitsF1.is0|bitsF2.is2)&(1ULL<<j)) trioPhase++;
+ if ((bitsF1.is2|bitsF2.is0)&(1ULL<<j)) trioPhase--;
+ if (!trioPhase) continue;
+ /*
+ int hapBit1 = (haploBits[m64*2*N + n1hap]>>j)&1;
+ if (sign == -1) hapBit1 = 1-hapBit1;
+ int hapBit2 = (haploBits[m64*2*N + n2hap]>>j)&1;
+ if (sign == 1) hapBit2 = 1-hapBit2;
+ uchar conf1 = phaseConfs[n1hap*Mseg64*64 + m64j];
+ uchar conf2 = phaseConfs[n2hap*Mseg64*64 + m64j];
+ //if (hapBit1 != hapBit2) cout << "?[" << (uint) conf1 << "|" << (uint) conf2 << "]";
+
+ int hapBitFinal = hapBit1;
+ if (!(conf1 == 0 || conf1 == 255) && (conf2 == 0 || conf2 == 255)) hapBitFinal = hapBit2;
+ */
+ int hapBitFinal = (int) phaseConfs2[2*n0*Mseg64*64 + m64j] >= 128;
+ if (hapBitFinal == (trioPhase == 1)) {
+ cout << trio1;
+ ret.push_back(0);
+ }
+ else {
+ cout << trio2;
+ ret.push_back(1);
+ }
+
+ }
+ cout << endl;
+ return ret;
+ }
+
+ void Eagle::computeSegPhaseConfs(uint64 n0, uint64 n1hap, uint64 n2hap, int sign, uint64 m64,
+ int err) {
+ const double maxOffsetFrac = 0.25, offsetFrac = ((n0&63)-31.5)/31.5 * maxOffsetFrac;
+ uint64 m64jStart, m64jEnd;
+ if (offsetFrac >= 0) {
+ m64jStart = m64==0 ? 0 : m64*64 + (uint) (offsetFrac * seg64cMvecs[m64].size());
+ m64jEnd = (m64+1)*64 + (m64+1==Mseg64 ? 0 : (uint) (offsetFrac * seg64cMvecs[m64+1].size()));
+ }
+ else {
+ m64jStart = m64==0 ? 0 : (m64-1)*64 + (uint) ((1+offsetFrac) * seg64cMvecs[m64-1].size());
+ m64jEnd = m64*64 + (m64+1==Mseg64 ? 64 : (uint) ((1+offsetFrac) * seg64cMvecs[m64].size()));
+ }
+ /*
+ cout << "m64 = " << m64 << ": " << m64jStart/64 << "." << (m64jStart&63) << " - "
+ << m64jEnd/64 << "." << (m64jEnd&63) << endl;
+ */
+ int cropErr = max(err-1, 0);
+ uchar hetConfs[2]; hetConfs[0] = (uchar) cropErr; hetConfs[1] = (uchar) (255-cropErr);
+ for (uint64 m64j = m64jStart; m64j < m64jEnd; m64j++) {
+ if (maskSnps64j[m64j]) {
+ uint64 m64cur = m64j/64; uint64 j = m64j&63;
+ uint g0 = getGeno0123(m64j, n0);
+ if (g0 == 0)
+ phaseConfs2[2*n0*Mseg64*64 + m64j] = phaseConfs2[(2*n0+1)*Mseg64*64 + m64j] = 0;
+ else if (g0 == 2)
+ phaseConfs2[2*n0*Mseg64*64 + m64j] = phaseConfs2[(2*n0+1)*Mseg64*64 + m64j] = 255;
+ else {
+ int hapBit1 = (haploBits[m64cur*2*N + n1hap]>>j)&1;
+ int hapBit2 = (haploBits[m64cur*2*N + n2hap]>>j)&1;
+ if (g0 == 1) { // het (not missing)
+ uchar conf1 = phaseConfs[n1hap*Mseg64*64 + m64j];
+ uchar conf2 = phaseConfs[n2hap*Mseg64*64 + m64j];
+ if (!(conf1 == 0 || conf1 == 255) && (conf2 == 0 || conf2 == 255))
+ hapBit1 = 1-hapBit2; // only use n2hap if n1hap conf <100% and n2hap conf = 100%
+ else
+ hapBit2 = 1-hapBit1; // default: go with n1hap
+ }
+ if (sign == -1) std::swap(hapBit1, hapBit2);
+ phaseConfs2[2*n0*Mseg64*64 + m64j] = hetConfs[hapBit1];
+ phaseConfs2[(2*n0+1)*Mseg64*64 + m64j] = hetConfs[hapBit2];
+ }
+ }
+ else
+ phaseConfs2[2*n0*Mseg64*64 + m64j] = phaseConfs2[(2*n0+1)*Mseg64*64 + m64j] = 0;
+ }
+ }
+
+ string Eagle::computePhaseString(uint64 n0, uint64 nF1, uint64 nF2,
+ const vector <Match> &matches, const vector <int> &signs,
+ uint64 start, double cMend, bool cons)
+ const {
+
+ if ((int) nF1 == -1 || (int) nF2 == -1) return "";
+ vector <uint64> starts(matches.size()), ends(matches.size());
+ for (uint i = 0; i < matches.size(); i++) {
+ if (cons) {
+ starts[i] = std::max(matches[i].m64jStartCons, matches[i].m64jStart);
+ ends[i] = std::min(matches[i].m64jEndCons, matches[i].m64jEnd);
+ }
+ else {
+ starts[i] = matches[i].m64jStart;
+ ends[i] = matches[i].m64jEnd;
+ }
+ }
+ string phase;
+ double lastPhased = cMs64j[start], lastHet = lastPhased;
+ int hetCount = 0, snpCount = 0;
+ for (uint64 m64j = start; m64j < Mseg64*64 && cMs64j[m64j] < cMend; m64j++) {
+ if (!maskSnps64j[m64j]) continue;
+ uint64 m64cur = m64j/64; uint64 j = m64j&63;
+ const uint64_masks &bits0 = genoBits[m64cur*N + n0];
+ const uint64_masks &bitsF1 = genoBits[m64cur*N + nF1];
+ const uint64_masks &bitsF2 = genoBits[m64cur*N + nF2];
+
+ int trioPhase = 1; // default
+ if ((bitsF1.is0|bitsF2.is2)&(1ULL<<j)) trioPhase = 1;
+ if ((bitsF1.is2|bitsF2.is0)&(1ULL<<j)) trioPhase = -1;
+ int votes1 = 0, votes2 = 0;
+ snpCount++;
+ if ((bits0.is0|bits0.is2|bits0.is9)&(1ULL<<j)) {
+ bool wrong = false;
+ for (uint i = 0; i < matches.size(); i++)
+ if (signs[i] && starts[i] <= m64j && m64j <= ends[i]) {
+ const uint64_masks &bits1 = genoBits[m64cur*N + matches[i].n];
+ if (((bits0.is0&bits1.is2)|(bits0.is2&bits1.is0))&(1ULL<<j)) wrong = true;
+ }
+ if (wrong)
+ phase += wrongChar;
+ continue; // discard non-hets
+ }
+ for (uint i = 0; i < matches.size(); i++)
+ if (signs[i] && starts[i] <= m64j && m64j <= ends[i]) {
+ const uint64_masks &bits1 = genoBits[m64cur*N + matches[i].n];
+ int phase1 = 0;
+ if (bits1.is0&(1ULL<<j)) phase1 = 1 * signs[i] * trioPhase;
+ if (bits1.is2&(1ULL<<j)) phase1 = -1 * signs[i] * trioPhase;
+ if (phase1 == 1) votes1++;
+ else if (phase1 == -1) votes2++;
+ }
+ trioPhase = 0; // change default
+ if ((bitsF1.is0|bitsF2.is2)&(1ULL<<j)) trioPhase = 1;
+ if ((bitsF1.is2|bitsF2.is0)&(1ULL<<j)) trioPhase = -1;
+
+ if (votes1+votes2) {
+ double cM = cMs64j[m64j];
+ for (int tick = (int) (10*lastPhased) + 1; tick < 10*cM; tick++) {
+ if (tick % 10 == 0) phase += StringUtils::itos(tick/10); //(char) ('0' + (tick/10)%10);
+ else if (lastHet <= lastPhased) phase += ROHchar;
+ else phase += IBDx2char;
+ }
+ if (cM-lastPhased > 0.5) {
+ char buf[64]; snprintf(buf, sizeof(buf), "[%.1fcM:%d/%d]", cM-lastPhased, hetCount, snpCount);
+ phase += string(buf);
+ }
+ if (votes1 && votes2) // both orientations received votes => conflict
+ phase += conflictChar;
+ else if (trioPhase) {
+ if (votes1)
+ phase += trio1;
+ else
+ phase += trio2;
+ }
+ else
+ phase += noTrioInfo;
+
+ hetCount = 0; snpCount = 0;
+ lastPhased = cM;
+ }
+ else
+ phase += '?';
+
+ lastHet = cMs64j[m64j];
+ hetCount++;
+ }
+
+ if (cMend >= cMs64j[Mseg64*64-1]) {
+ double cM = cMs64j[Mseg64*64-1];
+ for (int tick = (int) (10*lastPhased) + 1; tick < 10*cM; tick++) {
+ if (tick % 10 == 0) phase += (char) ('0' + (tick/10)%10);
+ else if (lastHet <= lastPhased) phase += ROHchar;
+ else phase += IBDx2char;
+ }
+ if (cM-lastPhased > 0.5) {
+ char buf[64]; snprintf(buf, sizeof(buf), "[%.1fcM:%d/%d]", cM-lastPhased, hetCount, snpCount);
+ phase += string(buf);
+ }
+ }
+ return phase;
+ }
+
+ void Eagle::printMatch(uint64 n0, uint64 nF1, uint64 nF2, const Match &duoMatch,
+ double memoLogBF[][4]) const {
+ if ((int) nF1 == -1 || (int) nF2 == -1) return;
+ uint64 n1 = duoMatch.n;
+ uint64 x = duoMatch.m64jStart, y = duoMatch.m64jEnd;
+ double logBF = duoMatch.logBF;
+ int same = 0, opp = 0; string phase;
+ double lastPhased = cMs64j[x], maxPhasedGap = 0, lastHet = cMs64j[x], maxHetGap = 0;
+ int hetCount = 0, snpCount = 0, numErr = 0;
+ for (uint m64j = duoMatch.m64jStart; m64j <= duoMatch.m64jEnd; m64j++) {
+ if (!maskSnps64j[m64j]) continue;
+ uint64 m64cur = m64j/64; uint64 j = m64j&63;
+ const uint64_masks &bits0 = genoBits[m64cur*N + n0];
+ const uint64_masks &bits1 = genoBits[m64cur*N + n1];
+ const uint64_masks &bitsF1 = genoBits[m64cur*N + nF1];
+ const uint64_masks &bitsF2 = genoBits[m64cur*N + nF2];
+ uint64 wrongBits = (bits0.is0 & bits1.is2) | (bits0.is2 & bits1.is0);
+ if (wrongBits&(1ULL<<j)) {
+ phase += wrongChar;
+ numErr++;
+ }
+
+ uint64 trioPhased = ~(bits0.is0|bits0.is2|bits0.is9) &
+ ((bitsF1.is0^bitsF2.is0) | (bitsF1.is2^bitsF2.is2));
+ uint64 phased1 = ~(bits0.is0|bits0.is2|bits0.is9) & (bits1.is0|bits1.is2);
+
+ if (phased1&(1ULL<<j)) {
+ double cM = cMs64j[m64j];
+ for (int tick = (int) (10*lastPhased) + 1; tick < 10*cM; tick++) {
+ if (tick % 10 == 0) phase += (char) ('0' + (tick/10)%10);
+ else if (lastHet <= lastPhased) phase += ROHchar;
+ else phase += IBDx2char;
+ }
+ if (cM-lastPhased > 0.5) {
+ char buf[64]; snprintf(buf, sizeof(buf), "[%.1fcM:%d/%d]", cM-lastPhased, hetCount, snpCount);
+ phase += string(buf);
+ }
+ if (trioPhased&(1ULL<<j)) {
+ if ((bits1.is0&(1ULL<<j)) == ((bitsF1.is0|bitsF2.is2)&(1ULL<<j))) {
+ same++;
+ phase += trio1;
+ }
+ else {
+ opp++;
+ phase += trio2;
+ }
+ }
+ else
+ phase += noTrioInfo;
+
+ hetCount = 0; snpCount = 0;
+ if (cM - lastPhased > maxPhasedGap)
+ maxPhasedGap = cM - lastPhased;
+ lastPhased = cM;
+ }
+ else
+ snpCount++;
+
+ if ((~(bits0.is0|bits0.is2|bits0.is9))&(1ULL<<j)) {
+ double cM = cMs64j[m64j];
+ if (cM - lastHet > maxHetGap)
+ maxHetGap = cM - lastHet;
+ lastHet = cM;
+ hetCount++;
+ }
+ }
+ double cM = cMs64j[y];
+ if (cM - lastPhased > maxPhasedGap)
+ maxPhasedGap = cM - lastPhased;
+ lastPhased = cM;
+ if (cM - lastHet > maxHetGap)
+ maxHetGap = cM - lastHet;
+ lastHet = cM;
+
+ const uint WINDOW = 20; double minLogBFwindow = 0, minLoc = 0;
+ for (uint64 wStart = x; wStart+WINDOW <= y; wStart++) {
+ if (!maskSnps64j[wStart]) continue;
+ double logBFwindow = 0;
+ for (uint64 m64j = wStart; m64j < wStart+WINDOW; m64j++) {
+ const uint64_masks &bits1 = genoBits[m64j/64*N + n1];
+ uint geno1 = bgetGeno0123(bits1, m64j&63);
+ if (memoLogBF[m64j][geno1] != MEMO_UNSET) logBFwindow += memoLogBF[m64j][geno1];
+ }
+ if (logBFwindow < minLogBFwindow) {
+ minLogBFwindow = logBFwindow;
+ minLoc = cMs64j[wStart];
+ }
+ }
+
+ printf("n0=%-5d n1=%-5d (%s) BF= %.1f cM= %.1f (%.1f-%.1f):", (int) n0, (int) n1,
+ Nref==0 ? indivs[n1].indivID.c_str() : "", logBF, cMs64j[y]-cMs64j[x], cMs64j[x],
+ cMs64j[y]);
+ cout << " (" << numErr << " errs) ";
+ cout << same << "|" << opp;
+ cout << " " << phase;
+ cout << " max gap: " << maxPhasedGap;
+ cout << " max ROH: " << maxHetGap;
+ printf(" min window logBF: %.1f minLoc: %.1fcM", minLogBFwindow, minLoc);
+ cout << endl;
+ }
+
+ void Eagle::checkTrioErrorRate(uint64 n0, uint64 nF1, uint64 nF2) const {
+
+ if ((int) nF1 == -1 || (int) nF2 == -1) return; // both parents are indexed below
+ int numOppHom1 = 0, numOppHom2 = 0, numWrongHet = 0, snpCount = 0;
+ for (uint m64j = 0; m64j < Mseg64*64; m64j++) {
+ if (!maskSnps64j[m64j]) continue;
+ snpCount++;
+ uint64 m64cur = m64j/64; uint64 j = m64j&63;
+ const uint64_masks &bits0 = genoBits[m64cur*N + n0];
+ const uint64_masks &bitsF1 = genoBits[m64cur*N + nF1];
+ const uint64_masks &bitsF2 = genoBits[m64cur*N + nF2];
+ uint64 oppHom1 = (bits0.is0 & bitsF1.is2) | (bits0.is2 & bitsF1.is0);
+ uint64 oppHom2 = (bits0.is0 & bitsF2.is2) | (bits0.is2 & bitsF2.is0);
+ uint64 wrongHet = ~(bits0.is0|bits0.is2|bits0.is9) &
+ ((bitsF1.is0&bitsF2.is0) | (bitsF1.is2&bitsF2.is2));
+ if (oppHom1&(1ULL<<j)) numOppHom1++;
+ if (oppHom2&(1ULL<<j)) numOppHom2++;
+ if (wrongHet&(1ULL<<j)) numWrongHet++;
+ }
+ cout << "oppHom1: " << numOppHom1 << " oppHom2: " << numOppHom2 << " wrongHet: " << numWrongHet
+ << " / " << snpCount << endl;
+ }
+
+ void Eagle::findLongHalfIBD(uint64 n0, vector <uint> topInds[2], vector <uint> topIndsLens[2],
+ uint K) const {
+ /*
+ for (uint e = 0; e < 2; e++) {
+ topInds[e] = vector <uint> (Mseg64 * K);
+ topIndsLens[e] = vector <uint> (Mseg64);
+ }
+ uint *runStarts[2][2]; // [even/odd][max err]; lengths are N
+ for (uint p = 0; p < 2; p++)
+ for (uint e = 0; e < 2; e++)
+ runStarts[p][e] = ALIGNED_MALLOC_UINTS(N);
+ uint *runStartFreqs[2][2]; // [even/odd][max err]; lengths are Mseg64+1 (b/c m64+1 can go over)
+ for (uint p = 0; p < 2; p++)
+ for (uint e = 0; e < 2; e++)
+ runStartFreqs[p][e] = ALIGNED_MALLOC_UINTS(Mseg64+1);
+
+ // initialize "prev" p=1
+ for (uint e = 0; e < 2; e++) {
+ const uint p = 1;
+ memset(runStarts[p][e], 0, N*sizeof(runStarts[p][e][0])); // all runs start at 0
+ runStartFreqs[p][e][0] = N;
+ memset(&runStartFreqs[p][e][1], 0, Mseg64*sizeof(runStarts[p][e][1]));
+ }
+
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) {
+ // initialize cur [p=(m64&1)] arrays
+ const uint p = m64&1;
+ for (uint e = 0; e < 2; e++) {
+ memcpy(runStarts[p][e], runStarts[!p][e], N*sizeof(runStarts[p][e][0]));
+ memset(runStartFreqs[p][e], 0, (Mseg64+1)*sizeof(runStartFreqs[p][e][0]));
+ }
+
+ uint64_masks bits0 = genoBits[m64*N + n0];
+ for (uint64 n1 = 0; n1 < N; n1++) {
+ const uint64_masks &bits1 = genoBits[m64*N + n1];
+ uint64 wrongBits = (bits0.is0 & bits1.is2) | (bits0.is2 & bits1.is0);
+ // update cur runStarts based on bit matches
+ if (wrongBits) {
+ if (wrongBits & (wrongBits-1)) // 2+ wrong => fail; move both starts forward
+ runStarts[p][0][n1] = runStarts[p][1][n1] = m64+1;
+ else { // 1 wrong => move 1-err start to 0-err start; fail 0-err
+ runStarts[p][1][n1] = runStarts[p][0][n1];
+ runStarts[p][0][n1] = m64+1;
+ }
+ }
+ // compute cur runStartFreqs based on (updated) cur runStarts
+ for (uint e = 0; e < 2; e++)
+ runStartFreqs[p][e][runStarts[p][e][n1]]++;
+ }
+
+ for (uint e = 0; e < 2; e++)
+ for (uint s64 = 0; s64 <= m64; s64++)
+ if ((runStartFreqs[p][e][s64] >= K && m64+1 == Mseg64) || // reached end
+ (runStartFreqs[p][e][s64] < K && runStartFreqs[!p][e][s64] >= K)) {
+ // all n1 s.t. runStarts[p][e][n1] == s64 are in top K starting from s64
+ // some n1 s.t. runStarts[!p][e][n1] == s64 are in top K starting from s64
+ uint allPos = 0, somePos = runStartFreqs[p][e][s64];
+ uint *topIndsChunk = &topInds[e][s64*K];
+ for (uint64 n1 = 0; n1 < N; n1++) {
+ if (allPos < K && runStarts[p][e][n1] == s64)
+ topIndsChunk[allPos++] = n1;
+ else if (somePos < K && runStarts[!p][e][n1] == s64)
+ topIndsChunk[somePos++] = n1;
+ }
+ topIndsLens[e][s64] = min(somePos, K);
+ }
+ }
+
+ for (uint p = 0; p < 2; p++)
+ for (uint e = 0; e < 2; e++)
+ ALIGNED_FREE(runStartFreqs[p][e]);
+ for (uint p = 0; p < 2; p++)
+ for (uint e = 0; e < 2; e++)
+ ALIGNED_FREE(runStarts[p][e]);
+ */
+ for (uint e = 0; e < 2; e++) {
+ topInds[e] = vector <uint> (Mseg64 * K);
+ topIndsLens[e] = vector <uint> (Mseg64);
+ }
+ uint *runStarts[2]; // [even/odd]; lengths are N
+ for (uint p = 0; p < 2; p++)
+ runStarts[p] = ALIGNED_MALLOC_UINTS(N);
+ uint *runStartFreqs[2]; // [even/odd]; lengths are Mseg64+1 (b/c m64+1 can go over)
+ for (uint p = 0; p < 2; p++)
+ runStartFreqs[p] = ALIGNED_MALLOC_UINTS(Mseg64+1);
+
+ // initialize "prev" p=1
+ {
+ const uint p = 1;
+ memset(runStarts[p], 0, N*sizeof(runStarts[p][0])); // all runs start at 0
+ runStartFreqs[p][0] = N;
+ memset(&runStartFreqs[p][1], 0, Mseg64*sizeof(runStartFreqs[p][1]));
+ }
+
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) {
+ // initialize cur [p=(m64&1)] arrays
+ const uint p = m64&1;
+ {
+ memcpy(runStarts[p], runStarts[!p], N*sizeof(runStarts[p][0]));
+ memset(runStartFreqs[p], 0, (Mseg64+1)*sizeof(runStartFreqs[p][0]));
+ }
+
+ uint64_masks bits0 = genoBits[m64*N + n0];
+ for (uint64 n1 = 0; n1 < N; n1++) {
+ const uint64_masks &bits1 = genoBits[m64*N + n1];
+ uint64 wrongBits = (bits0.is0 & bits1.is2) | (bits0.is2 & bits1.is0);
+ // update cur runStarts based on bit matches
+ if (wrongBits) {
+ runStarts[p][n1] = m64+1;
+ }
+ // compute cur runStartFreqs based on (updated) cur runStarts
+ runStartFreqs[p][runStarts[p][n1]]++;
+ }
+
+ for (uint s64 = 0; s64 <= m64; s64++)
+ if ((runStartFreqs[p][s64] >= K && m64+1 == Mseg64) || // reached end
+ (runStartFreqs[p][s64] < K && runStartFreqs[!p][s64] >= K)) {
+ // all n1 s.t. runStarts[p][n1] == s64 are in top K starting from s64
+ // some n1 s.t. runStarts[!p][n1] == s64 are in top K starting from s64
+ uint allPos = 0, somePos = runStartFreqs[p][s64];
+ uint *topIndsChunk = &topInds[1][s64*K];
+ for (uint64 n1 = 0; n1 < N; n1++) {
+ if (allPos < K && runStarts[p][n1] == s64)
+ topIndsChunk[allPos++] = n1;
+ else if (somePos < K && runStarts[!p][n1] == s64)
+ topIndsChunk[somePos++] = n1;
+ }
+ topIndsLens[1][s64] = min(somePos, K);
+ }
+ }
+
+ for (uint p = 0; p < 2; p++)
+ ALIGNED_FREE(runStartFreqs[p]);
+ for (uint p = 0; p < 2; p++)
+ ALIGNED_FREE(runStarts[p]);
+ }
+
+ void Eagle::findLongDipHap(uint64 n0, vector <uint> topInds[2], vector <uint> topIndsLens[2],
+ uint K, uint errStart=1) const {
+
+ uint64 Nhaps = 2*(Nref==0 ? N : Nref);
+ for (uint e = 0; e < 2; e++) {
+ topInds[e] = vector <uint> (Mseg64 * K);
+ topIndsLens[e] = vector <uint> (Mseg64);
+ }
+ uint *runStarts[2][2]; // [even/odd][max err]; lengths are Nhaps
+ for (uint p = 0; p < 2; p++)
+ for (uint e = 0; e < 2; e++)
+ runStarts[p][e] = ALIGNED_MALLOC_UINTS(Nhaps);
+ uint *runStartFreqs[2][2]; // [even/odd][max err]; lengths are Mseg64+1 (b/c m64+1 can go over)
+ for (uint p = 0; p < 2; p++)
+ for (uint e = 0; e < 2; e++)
+ runStartFreqs[p][e] = ALIGNED_MALLOC_UINTS(Mseg64+1);
+
+ // initialize "prev" p=1
+ for (uint e = 0; e < 2; e++) {
+ const uint p = 1;
+ memset(runStarts[p][e], 0, Nhaps*sizeof(runStarts[p][e][0])); // all runs start at 0
+ runStartFreqs[p][e][0] = Nhaps;
+ memset(&runStartFreqs[p][e][1], 0, Mseg64*sizeof(runStartFreqs[p][e][1]));
+ }
+
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) {
+ // initialize cur [p=(m64&1)] arrays
+ const uint p = m64&1;
+ for (uint e = 0; e < 2; e++) {
+ memcpy(runStarts[p][e], runStarts[!p][e], Nhaps*sizeof(runStarts[p][e][0]));
+ memset(runStartFreqs[p][e], 0, (Mseg64+1)*sizeof(runStartFreqs[p][e][0]));
+ }
+
+ uint64_masks bits0 = genoBits[m64*N + n0];
+ //uint *runStarts_p_0 = runStarts[p][0], *runStarts_p_1 = runStarts[p][1];
+ //uint *runStartFreqs_p_0 = runStartFreqs[p][0], *runStartFreqs_p_1 = runStartFreqs[p][1];
+ for (uint64 n1 = 0; n1 < Nhaps; n1++) {
+ uint64 is1 = haploBits[m64*2*N + n1];
+ uint64 wrongBits = (bits0.is0 & is1) | (bits0.is2 & ~is1);
+
+ // update cur runStarts based on bit matches
+ if (wrongBits) {
+ if (wrongBits & (wrongBits-1)) // 2+ wrong => fail; move both starts forward
+ runStarts[p][0][n1] = runStarts[p][1][n1] = m64+1;
+ else { // 1 wrong => move 1-err start to 0-err start; fail 0-err
+ runStarts[p][1][n1] = runStarts[p][0][n1];
+ runStarts[p][0][n1] = m64+1;
+ }
+ }
+ // compute cur runStartFreqs based on (updated) cur runStarts
+ for (uint e = errStart; e < 2; e++)
+ runStartFreqs[p][e][runStarts[p][e][n1]]++;
+ /*
+ if (wrongBits) {
+ if (wrongBits & (wrongBits-1)) // 2+ wrong => fail; move both starts forward
+ runStarts_p_0[n1] = runStarts_p_1[n1] = m64+1;
+ else { // 1 wrong => move 1-err start to 0-err start; fail 0-err
+ runStarts_p_1[n1] = runStarts_p_0[n1];
+ runStarts_p_0[n1] = m64+1;
+ }
+ }
+ runStartFreqs_p_0[runStarts_p_0[n1]]++;
+ runStartFreqs_p_1[runStarts_p_1[n1]]++;
+ */
+ }
+
+ for (uint e = errStart; e < 2; e++)
+ for (uint s64 = 0; s64 <= m64; s64++)
+ if ((runStartFreqs[p][e][s64] >= K && m64+1 == Mseg64) || // reached end
+ (runStartFreqs[p][e][s64] < K && runStartFreqs[!p][e][s64] >= K)) {
+ // all n1 s.t. runStarts[p][e][n1] == s64 are in top K starting from s64
+ // some n1 s.t. runStarts[!p][e][n1] == s64 are in top K starting from s64
+ uint allPos = 0, somePos = runStartFreqs[p][e][s64];
+ uint *topIndsChunk = &topInds[e][s64*K];
+ for (uint64 n1 = 0; n1 < Nhaps; n1++) {
+ if (allPos < K && runStarts[p][e][n1] == s64)
+ topIndsChunk[allPos++] = n1;
+ else if (somePos < K && runStarts[!p][e][n1] == s64)
+ topIndsChunk[somePos++] = n1;
+ }
+ topIndsLens[e][s64] = min(somePos, K);
+ }
+ }
+
+ for (uint p = 0; p < 2; p++)
+ for (uint e = 0; e < 2; e++)
+ ALIGNED_FREE(runStartFreqs[p][e]);
+ for (uint p = 0; p < 2; p++)
+ for (uint e = 0; e < 2; e++)
+ ALIGNED_FREE(runStarts[p][e]);
+ }
+
+ void Eagle::randomlyPhaseTmpHaploBitsT(uint64 n0) {
+ // fast rng: last 16 bits of Marsaglia's MWC
+ uint w = 521288629;
+ for (uint i = 0; i < (n0 & 0xff); i++)
+ w=18000*(w&65535)+(w>>16);
+
+ uint64 n1 = (n0+1) % N;
+ for (uint64 m64j = 0; m64j < Mseg64*64; m64j++)
+ if (maskSnps64j[m64j]) {
+ uint g = getGeno0123(m64j, n0);
+ if (g == 3) g = getGeno0123(m64j, n1); // if missing, try filling with geno of next sample
+ if (g == 3) g = 1; // if still missing, set to het
+
+ if (g == 0) // nothing to do; tmpHaploBitsT is already cleared in init()
+ ;
+ else if (g == 2) { // set bit in both parental haplotypes
+ for (uint64 q = 0; q <= 1ULL; q++)
+ tmpHaploBitsT[(2*n0+q)*Mseg64 + (m64j/64)] |= 1ULL<<(m64j&63);
+ }
+ else { // set bit in random parental haplotype
+ uint64 q = (w=18000*(w&65535)+(w>>16))&1;
+ tmpHaploBitsT[(2*n0+q)*Mseg64 + (m64j/64)] |= 1ULL<<(m64j&63);
+ }
+ }
+ }
+
+ pair <double, vector <double> > Eagle::findLongDipMatches(uint64 n0, uint64 nF1, uint64 nF2) {
+
+ if (!maskIndivs[n0]) return make_pair(0.0, vector <double> ());
+
+ vector <uint> topInds[2]; // [max err]; lengths are Mseg64 * K
+ vector <uint> topIndsLens[2]; // [max err]; lengths are Mseg64
+
+ const uint K = 10;
+ Timer timer;
+ findLongHalfIBD(n0, topInds, topIndsLens, K);
+ double halfIBDtime = timer.update_time();
+
+ const double duoMatchThresh = log10(N*10), longMatchMin = 4.0;
+
+
+ /***** COMPUTE BAYES FACTORS FOR LONG HALF-IBD REGIONS TO SELECT LONG MATCHES *****/
+
+ vector <Match> cumTopMatches;
+ vector <Match> longMatches;
+ double *workLogBF = ALIGNED_MALLOC_DOUBLES(Mseg64*64);
+ double (*memoLogBF)[4] = (double (*)[4]) ALIGNED_MALLOC(Mseg64*64*4*sizeof(double));
+ // initialize memo lookup table of logBFs (n1 given n0)
+ for (uint m64j = 0; m64j < Mseg64*64; m64j++)
+ for (uint g = 0; g < 4; g++)
+ memoLogBF[m64j][g] = MEMO_UNSET;
+
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) {
+ std::set <uint> n1s; n1s.insert(n0);
+ // delete previous matches that have ended: sort by greaterEnd and erase range at end
+ sort(cumTopMatches.begin(), cumTopMatches.end(), Match::greaterEnd);
+ for (uint k = 0; k < cumTopMatches.size(); k++) {
+ if (m64 >= cumTopMatches[k].m64jEnd/64) {
+ cumTopMatches.resize(k);
+ break;
+ }
+ else
+ n1s.insert(cumTopMatches[k].n);
+ }
+
+ // incorporate new matches
+ for (uint e = 0; e < 2; e++) {
+ for (uint k = 0; k < topIndsLens[e][m64]; k++) {
+ uint n1 = topInds[e][m64*K + k];
+ if (maskIndivs[n1] && !n1s.count(n1)) {
+ n1s.insert(n1);
+ Match duoMatch = computeDuoLogBF(memoLogBF, workLogBF, n0, n1, m64);
+ assert(maskSnps64j[duoMatch.m64jStart]);
+ assert(maskSnps64j[duoMatch.m64jEnd]);
+
+ if (duoMatch.logBF > duoMatchThresh && duoMatch.cMlenInit >= longMatchMin) {
+ longMatches.push_back(duoMatch);
+#ifdef VERBOSE
+ printMatch(n0, nF1, nF2, duoMatch, memoLogBF);
+#endif
+ //retractMatch(n0, duoMatch, memoLogBF);
+ cumTopMatches.push_back(duoMatch);
+ }
+ }
+ }
+ }
+ }
+#ifdef VERBOSE
+ cout << "num longMatches: " << longMatches.size() << endl;
+#endif
+
+ /***** TRIM MATCHES UNTIL CONSISTENT; DETERMINE SAME/OPP ORIENTATION OF PAIRS *****/
+
+ const double longMatchMinTrim = 3.0, minSameOppDiff = 0.5, minMatchLenDiff = 0.5;
+
+ sort(longMatches.begin(), longMatches.end(), Match::greaterLen);
+ vector < vector <uint> > sameEdges(longMatches.size()), oppEdges(longMatches.size());
+ for (uint i = 0; i < longMatches.size(); i++)
+ for (uint j = 0; j < i; j++) { // trim against previous (longer) matches
+ double iLen = cMs64j[longMatches[i].m64jEnd] - cMs64j[longMatches[i].m64jStart];
+ double jLen = cMs64j[longMatches[j].m64jEnd] - cMs64j[longMatches[j].m64jStart];
+ // check if too short (after previous trimming)
+ if (iLen < longMatchMinTrim || jLen < longMatchMinTrim) continue;
+
+ Match &d1 = longMatches[iLen<jLen?i:j]; // shorter
+ Match &d2 = longMatches[iLen<jLen?j:i]; // longer
+
+ if (d1.n == nF1 || d1.n == nF2 || d2.n == nF1 || d2.n == nF2) continue;
+ if (d1.m64jEnd < d2.m64jStart || d2.m64jEnd < d1.m64jStart)
+ continue; // no overlap
+#ifdef VERBOSE
+ printf(" cM= %.1f (%.1f-%.1f), cM= %.1f (%.1f-%.1f):",
+ cMs64j[d1.m64jEnd]-cMs64j[d1.m64jStart], cMs64j[d1.m64jStart], cMs64j[d1.m64jEnd],
+ cMs64j[d2.m64jEnd]-cMs64j[d2.m64jStart], cMs64j[d2.m64jStart], cMs64j[d2.m64jEnd]);
+#endif
+ uint64 maxStart = max(d1.m64jStart, d2.m64jStart);
+ uint64 minEnd = min(d1.m64jEnd, d2.m64jEnd);
+ uint64 lastSame = maxStart, lastOpp = lastSame;
+ double longestSame = 0, longestOpp = 0;
+ uint64 sameStart = 0, sameEnd = 0, oppStart = 0, oppEnd = 0;
+ for (uint64 m64j = maxStart; m64j <= minEnd; m64j++) {
+ bool same = false, opp = false;
+ if (m64j == minEnd) {
+ same = true; opp = true;
+ }
+ else if (getGeno0123(m64j, n0) == 1) {
+ uint g1 = getGeno0123(m64j, d1.n);
+ uint g2 = getGeno0123(m64j, d2.n);
+ if ((g1==0||g1==2) && (g2==0||g2==2)) {
+ same = g1==g2;
+ opp = g1!=g2;
+ }
+ }
+ if (same) {
+ double oppLen = cMs64j[m64j] - cMs64j[lastSame];
+ if (longestOpp < oppLen) {
+ longestOpp = oppLen;
+ oppStart = lastSame; oppEnd = m64j;
+ }
+ lastSame = m64j;
+ }
+ if (opp) {
+ double sameLen = cMs64j[m64j] - cMs64j[lastOpp];
+ if (longestSame < sameLen) {
+ longestSame = sameLen;
+ sameStart = lastOpp; sameEnd = m64j;
+ }
+ lastOpp = m64j;
+ }
+ }
+ double fullOverlap = cMs64j[minEnd] - cMs64j[maxStart];
+#ifdef VERBOSE
+ printf(" %.1f|%.1f (%.1f-%.1f)|(%.1f-%.1f)\n", longestSame, longestOpp,
+ cMs64j[sameStart], cMs64j[sameEnd], cMs64j[oppStart], cMs64j[oppEnd]);
+#endif
+
+ uint64 trimStart = 0; int orientation = 0;
+ if (longestSame > longestOpp + minSameOppDiff
+ || (longestSame == fullOverlap && longestSame > longestOpp)) {
+ sameEdges[i].push_back(j);
+ sameEdges[j].push_back(i);
+ orientation = 1;
+ trimStart = (sameStart+sameEnd)/2;
+ }
+ else if (longestOpp > longestSame + minSameOppDiff
+ || (longestOpp == fullOverlap && longestOpp > longestSame)) {
+ oppEdges[i].push_back(j);
+ oppEdges[j].push_back(i);
+ orientation = -1;
+ trimStart = (oppStart+oppEnd)/2;
+ }
+
+ if (orientation != 0) { // clearly same or opp
+ // left end
+ if (cMs64j[d1.m64jStart] < cMs64j[d2.m64jStart] - minMatchLenDiff) // d1 extends further
+ trim(d2, d1, n0, orientation, trimStart, -1, workLogBF); // => trim d2
+ else
+ trim(d1, d2, n0, orientation, trimStart, -1, workLogBF);
+ // right end
+ if (cMs64j[d1.m64jEnd] > cMs64j[d2.m64jEnd] + minMatchLenDiff) // d1 extends further
+ trim(d2, d1, n0, orientation, trimStart, 1, workLogBF); // => trim d2
+ else
+ trim(d1, d2, n0, orientation, trimStart, 1, workLogBF);
+#ifdef VERBOSE
+ printf(" --> %.1f (%.1f-%.1f) --> %.1f (%.1f-%.1f)\n",
+ cMs64j[d1.m64jEnd]-cMs64j[d1.m64jStart], cMs64j[d1.m64jStart], cMs64j[d1.m64jEnd],
+ cMs64j[d2.m64jEnd]-cMs64j[d2.m64jStart], cMs64j[d2.m64jStart], cMs64j[d2.m64jEnd]);
+#endif
+ }
+ else { // can't determine same vs. opp => chop shorter so that matches no longer overlap
+ if (d1.m64jStart < d2.m64jStart) d1.m64jEnd = d2.m64jStart-1;
+ else d1.m64jStart = min(d2.m64jEnd+1ULL, Mseg64*64-1);
+#ifdef VERBOSE
+ printf(" trimmed first to cM= %.1f (%.1f-%.1f)\n",
+ cMs64j[d1.m64jEnd]-cMs64j[d1.m64jStart], cMs64j[d1.m64jStart], cMs64j[d1.m64jEnd]);
+#endif
+ }
+ }
+
+
+ /***** REDUCE TO SET OF TRIMMED MATCHES WITH CONSISTENT SIGNS *****/
+
+ vector <bool> kept(longMatches.size()), checked(longMatches.size());
+ for (uint t = 0; t < longMatches.size(); t++) {
+ // find longest remaining trimmed match
+ uint i = 0; double longest = 0;
+ for (uint iTest = 0; iTest < longMatches.size(); iTest++)
+ if (!checked[iTest]) {
+ const Match &d1 = longMatches[iTest];
+ if (cMs64j[d1.m64jEnd]-cMs64j[d1.m64jStart] > longest) {
+ longest = cMs64j[d1.m64jEnd]-cMs64j[d1.m64jStart];
+ i = iTest;
+ }
+ }
+ checked[i] = true;
+ if (longest < longMatchMinTrim) break;
+ if (longMatches[i].n == nF1 || longMatches[i].n == nF2) continue;
+
+ kept[i] = true;
+#ifdef VERBOSE
+ const Match &d1 = longMatches[i];
+ printf("cM= %.1f (%.1f-%.1f)\n",
+ cMs64j[d1.m64jEnd]-cMs64j[d1.m64jStart], cMs64j[d1.m64jStart], cMs64j[d1.m64jEnd]);
+#endif
+ vector <int> signs = searchSigns(longMatches, sameEdges, oppEdges, kept);
+ if (signs.empty()) { // inconsistent signs
+ kept[i] = false;
+#ifdef VERBOSE
+ cout << " WARNING: sign inconsistency: eliminating" << endl;
+#endif
+ }
+ }
+
+ // compute final signs
+ vector <int> signs = searchSigns(longMatches, sameEdges, oppEdges, kept);
+
+#ifdef VERBOSE
+ for (uint i = 0; i < longMatches.size(); i++)
+ if (kept[i]) {
+ const Match &d1 = longMatches[i];
+ printf("cM= %.1f (%.1f-%.1f): ",
+ cMs64j[d1.m64jEnd]-cMs64j[d1.m64jStart], cMs64j[d1.m64jStart], cMs64j[d1.m64jEnd]);
+ vector <Match> curMatch(1, d1); vector <int> curSign(1, signs[i]);
+ cout << computePhaseString(n0, nF1, nF2, curMatch, curSign, d1.m64jStart,
+ cMs64j[d1.m64jEnd]+1e-9, false) << endl;
+ cout << computePhaseString(n0, nF1, nF2, curMatch, curSign, d1.m64jStart,
+ cMs64j[d1.m64jEnd]+1e-9, true) << endl;
+ }
+ cout << endl << endl
+ << "phase: " << computePhaseString(n0, nF1, nF2, longMatches, signs, 0, 1e100, false) << endl;
+ cout << endl << endl
+ << "phase: " << computePhaseString(n0, nF1, nF2, longMatches, signs, 0, 1e100, true) << endl;
+#endif
+ computePhaseConfs(n0, longMatches, signs, true);
+ if ((int) nF1 != -1 && (int) nF2 != -1) {
+ checkPhase(n0, nF1, nF2, 0.1);
+ checkPhase(n0, nF1, nF2, 0.5);
+ }
+
+ ALIGNED_FREE(workLogBF);
+ ALIGNED_FREE(memoLogBF);
+
+ // record longest match length per seg64 for output
+ vector <double> cMs(Mseg64);
+ bool cons = true;
+ for (uint i = 0; i < longMatches.size(); i++) {
+ if (!signs[i]) continue;
+ uint64 start, end;
+ if (cons) {
+ start = std::max(longMatches[i].m64jStartCons, longMatches[i].m64jStart);
+ end = std::min(longMatches[i].m64jEndCons, longMatches[i].m64jEnd);
+ }
+ else {
+ start = longMatches[i].m64jStart;
+ end = longMatches[i].m64jEnd;
+ }
+ for (uint64 m64 = (start+63)/64; m64 < end/64; m64++)
+ cMs[m64] = std::max(cMs[m64], cMs64j[end]-cMs64j[start]);
+ }
+
+ return make_pair(halfIBDtime, cMs);
+ }
+
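+ // number of sites in 64-SNP block m64 where haplotype n1hap contradicts a homozygous genotype of n0
+ int Eagle::numDipHapWrongBits(uint64 m64, uint64 n0, uint64 n1hap) const {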
+ uint64 is1 = haploBits[m64*2*N + n1hap];
+ const uint64_masks &bits0 = genoBits[m64*N + n0];
+ uint64 wrongBits = (bits0.is0 & is1) | (bits0.is2 & ~is1);
+ return popcount64(wrongBits);
+ }
+
+ // position just past the highest mismatching site in block m64 (1-64); 0 if the block has no mismatches
+ int Eagle::firstDipHapGoodBit(uint64 m64, uint64 n0, uint64 n1hap) const {
+ uint64 is1 = haploBits[m64*2*N + n1hap];
+ const uint64_masks &bits0 = genoBits[m64*N + n0];
+ uint64 wrongBits = (bits0.is0 & is1) | (bits0.is2 & ~is1);
+ return wrongBits ? 64 - __builtin_clzll(wrongBits) : 0; // one past the most significant wrong bit
+ }
+
+ // index (0-63) of the lowest mismatching site in block m64; 64 if none; 0 if m64 is past the last block
+ int Eagle::firstDipHapWrongBit(uint64 m64, uint64 n0, uint64 n1hap) const {
+ if (m64 >= Mseg64) return 0;
+ uint64 is1 = haploBits[m64*2*N + n1hap];
+ const uint64_masks &bits0 = genoBits[m64*N + n0];
+ uint64 wrongBits = (bits0.is0 & is1) | (bits0.is2 & ~is1);
+ return popcount64((wrongBits & ~(wrongBits-1))-1); // isolate least significant wrong bit, count bits below it
+ }
+
+ struct DipHapSeg {
+ uint n, start, end;
+ DipHapSeg(uint _n, uint _start, uint _end) : n(_n), start(_start), end(_end) {}
+ // order by decreasing segment length, breaking ties by increasing haplotype index
+ bool operator < (const DipHapSeg &seg2) const {
+ return end-start > seg2.end-seg2.start
+ || (end-start == seg2.end-seg2.start && n < seg2.n);
+ }
+ };
+ struct DipHapSegFarther {
+ // order by decreasing segment end, breaking ties by increasing haplotype index
+ bool operator() (const DipHapSeg &seg1, const DipHapSeg &seg2) const {
+ return seg1.end > seg2.end || (seg1.end == seg2.end && seg1.n < seg2.n);
+ }
+ };
+
+ // number of positions at which consecutive entries of phaseVec differ (switches)
+ int Eagle::countSE(const vector <bool> &phaseVec) {
+ int ans = 0;
+ for (uint h = 1; h < phaseVec.size(); h++)
+ ans += (phaseVec[h] != phaseVec[h-1]);
+ return ans;
+ }
+
+ // switches remaining after a majority vote (>= 4 of 7) over consecutive blocks of 7 entries
+ int Eagle::countMajorSE(const vector <bool> &phaseVec) {
+ vector <bool> phaseVec7;
+ for (uint h7 = 0; h7+7 < phaseVec.size(); h7 += 7) {
+ int votes = 0;
+ for (uint h = h7; h < h7+7; h++)
+ votes += phaseVec[h];
+ phaseVec7.push_back(votes >= 4);
+ }
+ return countSE(phaseVec7);
+ }
+
+ double Eagle::findLongHapMatches(uint64 n0, uint64 nF1, uint64 nF2, int iter) {
+
+ if (!maskIndivs[n0]) return 0;
+
+ if (Mseg64 < 3U) {
+ cerr << "Too few SNP segments for analysis" << endl;
+ exit(1);
+ }
+
+ uint seed = n0; // for rand_r()
+
+ vector <uint> topInds[2]; // [max err]; lengths are Mseg64 * K
+ vector <uint> topIndsLens[2]; // [max err]; lengths are Mseg64
+
+ const uint K = 20;
+ Timer timer;
+ findLongDipHap(n0, topInds, topIndsLens, K);
+ double hapTime = timer.update_time();
+ /*
+ // VISUALIZE RESULTS
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) {
+ cout << "m64 = " << m64 << endl;
+ int m64xMin = std::max(0, (int) m64-5);
+ int m64xMax = std::min(m64+11, Mseg64);
+ for (int m64x = m64xMin; m64x < m64xMax; m64x++) {
+ int numSnps = 0;
+ for (int j = 0; j < 64; j++)
+ numSnps += maskSnps64j[m64x*64+j];
+ printf("%2d ", numSnps);
+ }
+ cout << endl;
+ for (uint e = 0; e < 2; e++) {
+ cout << "e = " << e << endl;
+ for (uint k = 0; k < topIndsLens[e][m64]; k++) {
+ uint64 n1hap = topInds[e][m64*K + k];
+ for (int m64x = m64xMin; m64x < m64xMax; m64x++) {
+ printf("%2d ", numDipHapWrongBits(m64x, n0, n1hap));
+ }
+ cout << " (" << n1hap << ")" << endl;
+ }
+ }
+ }
+ */
+ const int maxWrongBits = 3;
+ std::set <DipHapSeg> longDipHapSegs[Mseg64];
+ const uint maxSegs = (iter == 2 ? 10 : 20); // TODO: increase accuracy with 20?
+ vector <uint> lastEnd64(2*N);
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) {
+ //if ((int) nF1 != -1) cout << "m64 = " << m64 << endl;
+ for (uint e = 0; e < 2; e++) {
+ //if ((int) nF1 != -1) cout << "e = " << e << endl;
+ for (uint k = 0; k < topIndsLens[e][m64]; k++) {
+ uint64 n1hap = topInds[e][m64*K + k];
+ if (!maskIndivs[n1hap/2]) continue;
+ if (n1hap/2 == n0) continue;
+ if (lastEnd64[n1hap] > m64) continue;
+
+ // find start
+ uint segStart = m64;
+ while ((int) segStart >= 0 && numDipHapWrongBits(segStart, n0, n1hap) <= maxWrongBits)
+ segStart--;
+ segStart++;
+ // find end
+ uint segEnd = m64;
+ while (segEnd < Mseg64 && numDipHapWrongBits(segEnd, n0, n1hap) <= maxWrongBits)
+ segEnd++;
+
+ //checkHapPhase1(n0, nF1, nF2, n1hap, segStart, segEnd);
+
+ lastEnd64[n1hap] = segEnd;
+ for (uint64 m64x = segStart; m64x < segEnd; m64x++) {
+ longDipHapSegs[m64x].insert(DipHapSeg(n1hap, segStart, segEnd));
+ if (longDipHapSegs[m64x].size() > maxSegs)
+ longDipHapSegs[m64x].erase(--longDipHapSegs[m64x].end());
+ }
+ }
+ }
+ }
+ /*
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) {
+ cout << "m64 = " << m64 << endl;
+ for (std::set <DipHapSeg>::iterator it = longDipHapSegs[m64].begin();
+ it != longDipHapSegs[m64].end(); it++) {
+ printf("%2d: %2d-%2d (%d)\n", it->end-it->start, it->start, it->end, it->n);
+ }
+ }
+ */
+ uint64 curHaploBitsT[Mseg64];
+ const uint64 side = 1;
+ //const uint numHapHaps = 5;
+ vector <uint> errBests(Mseg64);
+ vector < pair <uint, uint > > n12hapBests(Mseg64);
+ for (uint64 m64 = 0+side; m64+side < Mseg64; m64++) {
+ /*
+ if ((int) nF1 != -1) {
+ cout << "m64 = " << m64 << endl;
+ for (uint i = 0; i < numHapHaps; i++) printf(" ");
+ }
+ checkHapPhase(n0, nF1, nF2, haploBitsT + 2*n0*Mseg64, m64, side);
+ */
+ vector < pair <uint, uint> > bestHitPairs; uint minWrongBits = 99;
+ for (std::set <DipHapSeg>::iterator it = longDipHapSegs[m64].begin();
+ it != longDipHapSegs[m64].end(); it++) { // for each dip-hap long match
+ uint64 n1hap = it->n;
+ // ---------- SLOW ----------
+ /*
+ std::set < pair <uint, uint> > wrongBitsHaps; uint worstInSet = 1<<30;
+ for (uint64 n2hap = 0; n2hap < 2*N; n2hap++) {
+ if (!maskIndivs[n2hap/2]) continue;
+ if (n2hap/2 == n0) continue;
+ if (n2hap/2 == n1hap/2) continue; // opp haps of same indiv
+ uint numWrongBits = 0;
+ for (uint64 m64x = m64-side; m64x <= m64+side; m64x++) {
+ uint64 n1is1 = haploBits[m64x*2*N + n1hap];
+ uint64 n2is1 = haploBits[m64x*2*N + n2hap];
+ const uint64_masks &bits0 = genoBits[m64x*N + n0];
+ uint64 wrongBits = (bits0.is0 & (n1is1 | n2is1)) | (bits0.is2 & ~(n1is1 & n2is1))
+ | (~(bits0.is0|bits0.is2|bits0.is9) & ~(n1is1 ^ n2is1));
+ numWrongBits += popcount64(wrongBits);
+ }
+ if (numWrongBits < worstInSet) {
+ wrongBitsHaps.insert(make_pair(numWrongBits, n2hap));
+ if (wrongBitsHaps.size() > numHapHaps) {
+ wrongBitsHaps.erase(--wrongBitsHaps.end());
+ worstInSet = (--wrongBitsHaps.end())->first;
+ }
+ }
+ }
+ for (std::set < pair <uint, uint> >::iterator itH = wrongBitsHaps.begin();
+ itH != wrongBitsHaps.end(); itH++)
+ printf("%2d ", itH->first);
+ //printf("%2d (%d)\n", itH->first, itH->second);
+ */
+ // ---------- FAST ----------
+ if ((int) nF1 != -1)
+ printf("| ");
+ for (uint64 m64x = m64-side; m64x <= m64+side; m64x++) {
+ curHaploBitsT[m64x] = 0;
+ for (uint64 j = 0; j < 64; j++) {
+ uint64 m64j = m64x*64+j, bit = 0;
+ if (maskSnps64j[m64j]) {
+ uint g0 = getGeno0123(m64j, n0);
+ if (g0 == 0 || g0 == 3) bit = 0;
+ else if (g0 == 2) bit = 1;
+ else bit = 1-((haploBitsT[n1hap*Mseg64 + m64x]>>j)&1);
+ }
+ curHaploBitsT[m64x] |= bit<<j;
+ }
+ }
+ std::ostringstream oss;
+ for (uint h = 0; h < hashLookups[m64].size(); h++) { // for each hashing
+ uint numHits;
+ const uint *lenHapInds =
+ hashLookups[m64][h].query(computeHash(curHaploBitsT, hashBits[m64][h]));
+ if (lenHapInds == NULL)
+ numHits = 0;
+ else
+ numHits = lenHapInds[0];
+ //uint best = 99;
+ for (uint k = 1; k <= numHits; k++) {
+ uint64 n2hap = lenHapInds[k];
+ if (!maskIndivs[n2hap/2]) continue;
+ if (n2hap/2 == n0) continue;
+ if (n2hap/2 == n1hap/2) continue; // opp haps of same indiv
+ uint numWrongBits = 0;
+ for (uint64 m64x = m64-side; m64x <= m64+side; m64x++) {
+ uint64 n1is1 = haploBitsT[n1hap*Mseg64 + m64x];
+ uint64 n2is1 = haploBitsT[n2hap*Mseg64 + m64x];
+ const uint64_masks &bits0 = genoBits[m64x*N + n0];
+ uint64 wrongBits = (bits0.is0 & (n1is1 | n2is1)) | (bits0.is2 & ~(n1is1 & n2is1))
+ | (~(bits0.is0|bits0.is2|bits0.is9) & ~(n1is1 ^ n2is1));
+ uint popcnt = popcount64(wrongBits);
+ numWrongBits += popcnt;
+ }
+ if (numWrongBits < minWrongBits) {
+ minWrongBits = numWrongBits;
+ bestHitPairs.clear();
+ bestHitPairs.push_back(make_pair((uint) n1hap, (uint) n2hap));
+ }
+ else if (numWrongBits == minWrongBits)
+ bestHitPairs.push_back(make_pair((uint) n1hap, (uint) n2hap));
+ /*
+ if (numWrongBits < best)
+ best = numWrongBits;
+ */
+ }
+ /*
+ if ((int) nF1 != -1) {
+ printf("%2d ", best);
+ char buf[10]; sprintf(buf, "%2d ", numHits);
+ oss << string(buf);
+ }
+ */
+ }
+ if (/*(int) nF1 != -1*/false) {
+ cout << ": " << oss.str();
+ checkHapPhase(n0, nF1, nF2, haploBitsT + /*wrongBitsHaps.begin()->second*/n1hap*Mseg64, m64, side); // n1hap is better!
+ }
+ }
+ if (bestHitPairs.empty()) { // keep current phasing
+ errBests[m64] = 99;
+ n12hapBests[m64] = make_pair((uint) n0*2, (uint) n0*2+1);
+ }
+ else {
+ errBests[m64] = minWrongBits;
+ n12hapBests[m64] = bestHitPairs[rand_r(&seed) % bestHitPairs.size()];
+ }
+ }
+
+ if ((int) nF1 != -1)
+ cout << endl << "2nd-iter phase:" << endl;
+ int sign = 1; vector <int> signs(Mseg64);
+ uint64 prevInd = side;
+ vector <bool> phaseVec;
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) {
+ uint64 curInd = min(max(m64, side), Mseg64-1-side);
+ /*
+ int minWrongBits = 99;
+ for (int diff = - (int) side; diff <= (int) side; diff++) {
+ uint64 m64d = m64+diff;
+ if (m64d >= Mseg64) continue;
+ uint64 n1is1 = haploBitsT[n12hapBests[m64d].first*Mseg64 + m64];
+ uint64 n2is1 = haploBitsT[n12hapBests[m64d].second*Mseg64 + m64];
+ const uint64_masks &bits0 = genoBits[m64*N + n0];
+ uint64 wrongBits = (bits0.is0 & (n1is1 | n2is1)) | (bits0.is2 & ~(n1is1 & n2is1))
+ | (~(bits0.is0|bits0.is2|bits0.is9) & ~(n1is1 ^ n2is1));
+ int numWrongBits = popcount64(wrongBits);
+ if (numWrongBits < minWrongBits) {
+ minWrongBits = numWrongBits;
+ curInd = m64d;
+ }
+ }
+ */ // best of 3 neighbors doesn't help?
+ vector < pair <int, int> > offsetMults;
+ if (m64 > side && m64 < Mseg64-side) { // update sign
+ uint64 n1prev = n12hapBests[prevInd].first, n2prev = n12hapBests[prevInd].second;
+ uint64 n1cur = n12hapBests[curInd].first, n2cur = n12hapBests[curInd].second;
+ for (uint64 m64j = (m64-side)*64; m64j < (m64+side)*64; m64j++) {
+ uint h1prev = (haploBitsT[n1prev*Mseg64 + (m64j/64)]>>(m64j&63))&1;
+ uint h2prev = (haploBitsT[n2prev*Mseg64 + (m64j/64)]>>(m64j&63))&1;
+ uint h1cur = (haploBitsT[n1cur*Mseg64 + (m64j/64)]>>(m64j&63))&1;
+ uint h2cur = (haploBitsT[n2cur*Mseg64 + (m64j/64)]>>(m64j&63))&1;
+ if (h1prev + h2prev == 1 && h1cur + h2cur == 1) {
+ int offset = abs((int) (m64j - m64*64));
+ offsetMults.push_back(make_pair(offset, h1prev == h1cur ? 1 : -1));
+ }
+ }
+ }
+ if (!offsetMults.empty()) {
+ sort(offsetMults.begin(), offsetMults.end());
+ int totVotes = min((int) offsetMults.size(), 5), sameVotes = 0;
+ for (int k = 0; k < totVotes; k++)
+ if (offsetMults[k].second == 1)
+ sameVotes++;
+ if (sameVotes < (totVotes+1)/2) sign *= -1; // swap sign
+ }
+ signs[m64] = sign;
+ uint64 n1hap = n12hapBests[curInd].first, n2hap = n12hapBests[curInd].second;
+ computeSegPhaseConfs(n0, n1hap, n2hap, sign, m64, errBests[curInd]);
+ prevInd = curInd;
+ }
+
+ if ((int) nF1 != -1) {
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) {
+ uint64 curInd = min(max(m64, side), Mseg64-1-side);
+ char usedInd = ' ';
+ if (curInd > m64) usedInd = 'v';
+ else if (curInd < m64) usedInd = '^';
+ uint64 n1hap = n12hapBests[curInd].first, n2hap = n12hapBests[curInd].second;
+ int sign = signs[m64];
+ printf("m64 = %2d err: %2d %c (%6d,%6d) ",
+ (int) m64, (int) errBests[curInd], usedInd, (int) (sign==1 ? n1hap : n2hap),
+ (int) (sign==1 ? n2hap : n1hap));
+ //cout << closestOffset << endl;
+ vector <bool> phaseSeg = checkSegPhase(n0, nF1, nF2, n1hap, n2hap, sign, m64);
+ phaseVec.insert(phaseVec.end(), phaseSeg.begin(), phaseSeg.end());
+ }
+
+ printf("# major SE: %2d # tot SE: %2d / %d\n", countMajorSE(phaseVec), countSE(phaseVec),
+ (int) phaseVec.size()-1);
+ fflush(stdout);
+ }
+ /*
+ for (uint64 n1hap = 2*n0; n1hap <= 2*n0+1; n1hap++) {
+ cout << "n0 = " << n0 << "; n1hap = " << n1hap << ": ";
+ for (uint64 m64j = 0; m64j < Mseg64*64; m64j++)
+ cout << (int) phaseConfs2[n1hap*Mseg64*64 + m64j] << " ";
+ cout << endl;
+ }
+ */
+ return hapTime;
+ }
+
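+ // pack a (n1hap, n2hap) pair into one 64-bit key (first in the high 32 bits) for the DP hash tables
+ inline uint64 pairToULL(pair <uint, uint> p) { return ((uint64) p.first<<32ULL)|p.second; }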
+ inline pair <uint, uint> ullToPair(uint64 ull) {
+ return make_pair((uint) (ull>>32ULL), (uint) ull);
+ }
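
The two helpers above turn a pair of 32-bit haplotype indices into a single 64-bit hash key and back. A minimal standalone sketch of the same round trip follows; `packPair`, `unpackPair` and `roundTrips` are hypothetical names used only for illustration and are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical stand-ins for pairToULL / ullToPair.
// The first index goes into the high 32 bits, the second into the low 32 bits.
inline std::uint64_t packPair(std::pair<std::uint32_t, std::uint32_t> p) {
  return (static_cast<std::uint64_t>(p.first) << 32) | p.second;
}

// Inverse of packPair: split the key back into its two 32-bit halves.
inline std::pair<std::uint32_t, std::uint32_t> unpackPair(std::uint64_t key) {
  return std::make_pair(static_cast<std::uint32_t>(key >> 32),
                        static_cast<std::uint32_t>(key));
}

// True when packing and then unpacking returns the original pair.
inline bool roundTrips(std::uint32_t a, std::uint32_t b) {
  return unpackPair(packPair({a, b})) == std::make_pair(a, b);
}
```

Because the mapping is a bijection on 32-bit pairs, the packed value can serve directly as an `std::unordered_map` key without a custom hash for `std::pair`.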
+
+ bool Eagle::updateHelper(std::unordered_map <uint64, DPState> &dpTab, uint &dpBestScore,
+ pair <uint, uint> cur, pair <uint, uint> next, uint score) const {
+#ifdef RDTSC_TIMING
+ uint64 tscStart = Timer::rdtsc();
+#endif
+ /* ---- SLOW ----
+ if (dpTab.find(pairToULL(next)) == dpTab.end() || dpTab[pairToULL(next)].score > score)
+ dpTab[pairToULL(next)] = DPState(score, cur);
+ */
+ if (score > dpBestScore + 2*switchCost) return false;
+ DPState &nextState = dpTab[pairToULL(next)];
+ if (nextState.score == 0 || nextState.score > score) {
+ nextState.score = score;
+ nextState.from = cur;
+ }
+ if (score < dpBestScore) dpBestScore = score;
+#ifdef RDTSC_TIMING
+ dpUpdateTicks += Timer::rdtsc() - tscStart;
+ dpUpdateCalls++;
+#endif
+ return true;
+ }
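
The update step above can be read as a beam-limited relaxation: a candidate is dropped when it scores too far above the best score at that position, and otherwise it overwrites the stored state only if it improves on it. The sketch below reproduces that logic in isolation. `State`, `relax` and the `slack` parameter (standing in for `2*switchCost`) are hypothetical. As in the patch, a stored score of 0 is treated as "unset", so the sketch assumes every real score is positive.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for DPState; score 0 doubles as "not yet set".
struct State {
  unsigned score = 0;
  std::uint64_t from = 0;
};

// Relax the transition cur -> next. Returns false (and stores nothing) when
// the score is more than `slack` above the best score seen at this position.
inline bool relax(std::unordered_map<std::uint64_t, State> &table,
                  unsigned &bestScore, std::uint64_t cur, std::uint64_t next,
                  unsigned score, unsigned slack) {
  if (score > bestScore + slack) return false;  // outside the beam
  State &s = table[next];  // default-constructed on first access
  if (s.score == 0 || s.score > score) {
    s.score = score;
    s.from = cur;
  }
  if (score < bestScore) bestScore = score;
  return true;
}
```

Doing the lookup once through `operator[]` avoids the separate `find` plus insert of the commented-out slow variant.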
+
+ uint Eagle::computeStaticScore(uint n0, uint n1hap, uint n2hap, uint64 m64) const {
+#ifdef RDTSC_TIMING
+ uint64 tscStart = Timer::rdtsc();
+#endif
+ uint64 n1is1 = haploBitsT[n1hap*Mseg64 + m64];
+ uint64 n2is1 = haploBitsT[n2hap*Mseg64 + m64];
+ const uint64_masks &bits0 = genoBits[m64*N + n0];
+ uint64 wrongHomBits = (bits0.is0 & (n1is1 | n2is1)) | (bits0.is2 & ~(n1is1 & n2is1));
+ uint64 wrongHetBits = (~(bits0.is0|bits0.is2|bits0.is9) & ~(n1is1 ^ n2is1));
+ uint score = popcount64(wrongHomBits)*homErrCost
+ + popcount64(wrongHetBits)*hetErrCost
+ + segConfs[n1hap*Mseg64+m64] / 5 + segConfs[n2hap*Mseg64+m64] / 5;
+#ifdef RDTSC_TIMING
+ dpStaticTicks += Timer::rdtsc() - tscStart;
+#endif
+ return score;
+ }
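
The mismatch masks used in the scoring function above compare a diploid genotype with a candidate pair of haplotypes, one bit per SNP. A small self-contained sketch of that comparison is given below. `GenoMasks`, `mismatchCost` and the cost arguments are hypothetical stand-ins for `uint64_masks`, `homErrCost` and `hetErrCost`, and the `segConfs` terms of the real score are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of uint64_masks: one bit per SNP in a 64-SNP chunk.
struct GenoMasks {
  std::uint64_t is0;  // genotype has 0 copies of the coded allele
  std::uint64_t is2;  // genotype has 2 copies
  std::uint64_t is9;  // genotype is missing
};

// Portable bit count (the patch uses its own popcount64).
inline unsigned countBits(std::uint64_t x) {
  unsigned n = 0;
  for (; x != 0; x &= x - 1) ++n;  // clear the lowest set bit each round
  return n;
}

// Homozygous sites where the haplotype pair disagrees with the genotype.
inline std::uint64_t wrongHomBits(const GenoMasks &g, std::uint64_t h1, std::uint64_t h2) {
  return (g.is0 & (h1 | h2)) | (g.is2 & ~(h1 & h2));
}

// Heterozygous sites (not 0, 2 or missing) where both haplotypes carry the same allele.
inline std::uint64_t wrongHetBits(const GenoMasks &g, std::uint64_t h1, std::uint64_t h2) {
  return ~(g.is0 | g.is2 | g.is9) & ~(h1 ^ h2);
}

// Weighted mismatch count; the weights are illustrative parameters.
inline unsigned mismatchCost(const GenoMasks &g, std::uint64_t h1, std::uint64_t h2,
                             unsigned homCost, unsigned hetCost) {
  return countBits(wrongHomBits(g, h1, h2)) * homCost +
         countBits(wrongHetBits(g, h1, h2)) * hetCost;
}
```

Note that any bit not flagged in `is0`, `is2` or `is9` is treated as heterozygous, so unused positions in a chunk have to be marked missing for the count to be meaningful.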
+
+ uint Eagle::computeSwitchScore(uint n0, uint n1hap, uint n2hapA, uint n2hapB, uint64 m64) const {
+#ifdef RDTSC_TIMING
+ uint64 tscStart = Timer::rdtsc();
+#endif
+ uint64 n1is1 = haploBitsT[n1hap*Mseg64 + m64];
+ uint64 n2is1A = haploBitsT[n2hapA*Mseg64 + m64];
+ uint64 n2is1B = haploBitsT[n2hapB*Mseg64 + m64];
+ const uint64_masks &bits0 = genoBits[m64*N + n0];
+
+ uint64 wrongHomBitsA = (bits0.is0 & (n1is1 | n2is1A)) | (bits0.is2 & ~(n1is1 & n2is1A));
+ uint64 wrongHetBitsA = (~(bits0.is0|bits0.is2|bits0.is9) & ~(n1is1 ^ n2is1A));
+ uint64 wrongHomBitsB = (bits0.is0 & (n1is1 | n2is1B)) | (bits0.is2 & ~(n1is1 & n2is1B));
+ uint64 wrongHetBitsB = (~(bits0.is0|bits0.is2|bits0.is9) & ~(n1is1 ^ n2is1B));
+ /* ---- SLOW ----
+ uint score = popcount64(wrongHomBitsB)*homErrCost
+ + popcount64(wrongHetBitsB)*hetErrCost;
+ uint minScore = score;
+ for (uint64 k = 0; k < 64; k++) {
+ score += ((wrongHomBitsA>>k)&1)*homErrCost + ((wrongHetBitsA>>k)&1)*hetErrCost
+ - ((wrongHomBitsB>>k)&1)*homErrCost - ((wrongHetBitsB>>k)&1)*hetErrCost;
+ if (score < minScore) minScore = score;
+ }
+ assert(score == popcount64(wrongHomBitsA)*homErrCost
+ + popcount64(wrongHetBitsA)*hetErrCost);
+ */
+ uint64 wrongBitsA = wrongHomBitsA | wrongHetBitsA;
+ uint64 wrongBitsB = wrongHomBitsB | wrongHetBitsB;
+ uint64 hetBits = ~(bits0.is0|bits0.is2|bits0.is9);
+ uint mask = (1U<<switchScoreLutBits)-1;
+ int curScore = popcount64(wrongHomBitsB)*homErrCost
+ + popcount64(wrongHetBitsB)*hetErrCost;
+ int bestScore = curScore;
+ for (uint64 b = 0; b < 64; b += switchScoreLutBits) {
+ uint lutInd = ((((uint) (wrongBitsA>>b))&mask)<<(switchScoreLutBits+switchScoreLutBits))
+ | ((((uint) (wrongBitsB>>b))&mask)<<switchScoreLutBits)
+ | ((((uint) (hetBits>>b))&mask));
+ bestScore = min(bestScore, curScore + switchScoreLut[lutInd][0]);
+ curScore += switchScoreLut[lutInd][1];
+ }
+ curScore = bestScore + segConfs[n1hap*Mseg64+m64] / 5 + segConfs[n2hapA*Mseg64+m64] / 5
+ + segConfs[n2hapB*Mseg64+m64] / 5;
+
+ // additional penalty for errors in (n1hap,n2hapB) at end of m64
+ curScore += ((wrongBitsB & 0xf000000000000000) != 0) + ((wrongBitsB & 0xff00000000000000) != 0)
+ + ((wrongBitsB & 0xffff000000000000) != 0);
+
+ if (m64+1 < Mseg64) { // additional penalty for errors in (n1hap,n2hapB) at start of m64+1
+ n1is1 = haploBitsT[n1hap*Mseg64 + m64+1];
+ n2is1B = haploBitsT[n2hapB*Mseg64 + m64+1];
+ const uint64_masks &bits0next = genoBits[(m64+1)*N + n0];
+ wrongBitsB = (bits0next.is0 & (n1is1 | n2is1B)) | (bits0next.is2 & ~(n1is1 & n2is1B))
+ | (~(bits0next.is0|bits0next.is2|bits0next.is9) & ~(n1is1 ^ n2is1B));
+ curScore += ((wrongBitsB & 0xf) != 0) + ((wrongBitsB & 0xff) != 0)
+ + ((wrongBitsB & 0xffff) != 0);
+ }
+
+#ifdef RDTSC_TIMING
+ dpSwitchTicks += Timer::rdtsc() - tscStart;
+#endif
+ return curScore;
+ }
+
+ void Eagle::updateTable(std::unordered_map <uint64, DPState> dpTable[], uint dpBestScores[],
+ uint64 m64, uint64 dist, uint n0, uint n1hapA, uint n2hapA, uint n1hapB,
+ uint n2hapB, uint score) const {
+ if (n1hapB/2 == n2hapB/2) return; // disallow copying both haps from an indiv
+ if ((n1hapA == n1hapB && n2hapA == n2hapB) || (n1hapA != n1hapB && n2hapA != n2hapB))
+ score += computeStaticScore(n0, n1hapB, n2hapB, m64);
+ else {
+ if (n1hapA == n1hapB)
+ score += computeSwitchScore(n0, n1hapA, n2hapA, n2hapB, m64);
+ else /* (n2hapA == n2hapB) */
+ score += computeSwitchScore(n0, n2hapA, n1hapA, n1hapB, m64);
+ }
+ if (!updateHelper(dpTable[m64], dpBestScores[m64], make_pair(n1hapA, n2hapA),
+ make_pair(n1hapB, n2hapB), score))
+ return;
+ for (uint64 m64x = m64+1; m64x < m64+dist && m64x < Mseg64; m64x++) {
+ score += computeStaticScore(n0, n1hapB, n2hapB, m64x);
+ if (!updateHelper(dpTable[m64x], dpBestScores[m64x], make_pair(n1hapB, n2hapB),
+ make_pair(n1hapB, n2hapB), score))
+ return;
+ }
+ }
+
+ void updateErrHits(vector <uint> &hitVec, uint64 &bestErrLoc, uint64 errLoc, uint n2hap) {
+ if (errLoc > bestErrLoc) {
+ bestErrLoc = errLoc;
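+ hitVec.assign(1, n2hap); // reset to the new best hit only; resize(1, x) keeps a stale first element when the vector is non-empty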
+ }
+ else if (errLoc == bestErrLoc)
+ hitVec.push_back(n2hap);
+ }
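
The helper above is meant to keep only the candidate haplotypes tied at the farthest error location seen so far. A minimal sketch of that "keep the ties for the maximum" pattern follows; `keepFarthestHits` and `demoHits` are hypothetical names. The sketch uses `std::vector::assign`, which replaces the contents outright, whereas `resize(1, value)` on a non-empty vector only truncates and leaves the old first element in place.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Keep only the candidates tied at the largest location seen so far.
inline void keepFarthestHits(std::vector<unsigned> &hits, std::uint64_t &bestLoc,
                             std::uint64_t loc, unsigned hap) {
  if (loc > bestLoc) {
    bestLoc = loc;
    hits.assign(1, hap);  // discard earlier, worse candidates
  } else if (loc == bestLoc) {
    hits.push_back(hap);  // tie: remember this candidate too
  }
}

// Small walk-through: two ties at 3, one worse hit, then a new best at 7.
inline std::vector<unsigned> demoHits() {
  std::vector<unsigned> hits;
  std::uint64_t best = 0;
  keepFarthestHits(hits, best, 3, 10);  // new best: {10}
  keepFarthestHits(hits, best, 3, 11);  // tie: {10, 11}
  keepFarthestHits(hits, best, 1, 12);  // worse: ignored
  keepFarthestHits(hits, best, 7, 13);  // new best: {13}
  return hits;
}
```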
+
+ void Eagle::safeInsert(std::set <uint> &refHapSet, uint n1hap, uint n0) const {
+ if (!maskIndivs[n1hap/2]) return;
+ if (n1hap/2 == n0) return;
+ refHapSet.insert(n1hap);
+ }
+
+ vector < pair <uint64, uint64> > Eagle::findGoodSegs(uint64 n0, uint64 nF1, uint64 nF2, uint64 n1hap) const {
+ vector < pair <uint64, uint64> > goodSegs;
+ uint64 firstGood = 0;
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) {
+ uint64_masks bits0 = genoBits[m64*N + n0];
+ uint64_masks bitsF1 = genoBits[m64*N + nF1];
+ uint64_masks bitsF2 = genoBits[m64*N + nF2];
+ uint64 is0 = bits0.is0 | (~(bits0.is0|bits0.is2|bits0.is9) &
+ ((bitsF1.is0 & ~bitsF2.is0) | (~bitsF1.is2 & bitsF2.is2)));
+ uint64 is2 = bits0.is2 | (~(bits0.is0|bits0.is2|bits0.is9) &
+ ((bitsF1.is2 & ~bitsF2.is2) | (~bitsF1.is0 & bitsF2.is0)));
+ uint64 is1 = haploBits[m64*2*N + n1hap];
+ uint64 wrongBits = (is0 & is1) | (is2 & ~is1);
+ if (wrongBits) {
+ uint64 firstWrong = m64*64 + popcount64((wrongBits & ~(wrongBits-1))-1);
+ if (firstWrong - firstGood >= 64)
+ goodSegs.push_back(make_pair(firstGood, firstWrong));
+ firstGood = m64*64 + 64 - __builtin_clzll(wrongBits); // one past the last wrong bit (wrongBits != 0 here)
+ }
+ }
+ goodSegs.push_back(make_pair(firstGood, Mseg64*64));
+ return goodSegs;
+ }
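
The segment finder above relies on two bit tricks: the index of the lowest set bit, obtained as the bit count of `(x & ~(x-1)) - 1`, and the position one past the highest set bit, obtained as `64 - clz(x)`. The sketch below shows both with portable loops instead of compiler builtins. The function names are hypothetical, and both functions assume `x != 0`.

```cpp
#include <cassert>
#include <cstdint>

// Portable bit count.
inline unsigned countBits(std::uint64_t x) {
  unsigned n = 0;
  for (; x != 0; x &= x - 1) ++n;
  return n;
}

// Index of the lowest set bit: x & ~(x-1) isolates that bit, subtracting 1
// turns it into a run of ones below it, and the run length is the index.
inline unsigned lowestSetBit(std::uint64_t x) {
  return countBits((x & ~(x - 1)) - 1);
}

// One past the index of the highest set bit (equals 64 - clz(x) for x != 0).
inline unsigned onePastHighestSetBit(std::uint64_t x) {
  unsigned n = 0;
  for (; x != 0; x >>= 1) ++n;
  return n;
}
```

With these two values, a run of matching SNPs ends at the first wrong bit of a chunk and the next run starts right after the last wrong bit of that chunk.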
+
+ void Eagle::updateFarHaps(vector < pair <uint, uint> > &farHaps, uint n1hap, uint m64jStart, uint m64jEnd) const {
+ const double cMminLen = 1.0;
+ if (cMs64j[m64jEnd] - cMs64j[m64jStart] < cMminLen) return;
+ //cout << m64jStart/64 << "." << (m64jStart&63) << "-" << m64jEnd/64 << "." << (m64jEnd&63) << " ";
+ if (m64jEnd > farHaps[m64jStart].first) {
+ farHaps[m64jStart].first = m64jEnd;
+ farHaps[m64jStart].second = n1hap;
+ }
+ }
+
+ double Eagle::runHMM(uint64 n0, uint64 nF1, uint64 nF2, int iter, uint beamWidth,
+ uint maxHapStates) {
+
+ if (!maskIndivs[n0]) return 0;
+
+ if (Mseg64 < 3U) {
+ cerr << "Too few SNP segments for analysis" << endl;
+ exit(1);
+ }
+
+ uint seed = n0; // for rand_r()
+
+ /***** FIND LONGEST DIP-HAP MATCHES *****/
+
+#ifdef RDTSC_TIMING
+ uint64 tscStart = Timer::rdtsc();
+#endif
+ vector <uint> topInds[2]; // [max err]; lengths are Mseg64 * K
+ vector <uint> topIndsLens[2]; // [max err]; lengths are Mseg64
+
+ const uint K = 100;
+ Timer timer;
+ findLongDipHap(n0, topInds, topIndsLens, K, iter >= 4 ? 0 : 1);
+ double hapTime = timer.update_time();
+#ifdef RDTSC_TIMING
+ diphapTicks += Timer::rdtsc() - tscStart;
+#endif
+ /*
+ // VISUALIZE RESULTS
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) {
+ cout << "m64 = " << m64 << endl;
+ int m64xMin = std::max(0, (int) m64-5);
+ int m64xMax = std::min(m64+11, Mseg64);
+ for (int m64x = m64xMin; m64x < m64xMax; m64x++) {
+ int numSnps = 0;
+ for (int j = 0; j < 64; j++)
+ numSnps += maskSnps64j[m64x*64+j];
+ printf("%2d ", numSnps);
+ }
+ cout << endl;
+ for (uint e = 0; e < 2; e++) {
+ cout << "e = " << e << endl;
+ for (uint k = 0; k < topIndsLens[e][m64]; k++) {
+ uint64 n1hap = topInds[e][m64*K + k];
+ for (int m64x = m64xMin; m64x < m64xMax; m64x++) {
+ printf("%2d ", numDipHapWrongBits(m64x, n0, n1hap));
+ }
+ cout << " (" << n1hap << ")" << endl;
+ }
+ }
+ }
+ */
+
+ /***** EXTEND DIP-HAP MATCHES TO OBTAIN LONGEST MATCHES COVERING EACH SEG *****/
+
+#ifdef RDTSC_TIMING
+ uint64 tscExtStart = Timer::rdtsc();
+#endif
+ const int maxWrongBits = 1;
+ std::set <DipHapSeg> longDipHapSegsForward[Mseg64];
+ const uint maxSegs = std::max(50U, maxHapStates/4);
+ vector <uint> lastEnd64(2*N);
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) {
+#ifdef DETAILS
+ if ((int) nF1 != -1) cout << "m64 = " << m64 << endl;
+#endif
+ for (uint e = 0; e < 2; e++) {
+#ifdef DETAILS
+ if ((int) nF1 != -1) cout << "e = " << e << endl;
+#endif
+ for (uint k = 0; k < topIndsLens[e][m64]; k++) {
+ uint64 n1hap = topInds[e][m64*K + k];
+ if (!maskIndivs[n1hap/2]) continue;
+ if (n1hap/2 == n0) continue;
+ if (lastEnd64[n1hap] > m64) continue;
+
+ // find start
+ uint segStart = m64;
+ while ((int) segStart >= 0 && numDipHapWrongBits(segStart, n0, n1hap) <= maxWrongBits)
+ segStart--;
+ if ((int) segStart < 0) segStart = 0; //segStart++; NEW: start one chunk before!
+ uint segStart64j = segStart*64 + firstDipHapGoodBit(segStart, n0, n1hap);
+
+ // find end
+ uint segEnd = m64+1;
+ while (segEnd < Mseg64 && numDipHapWrongBits(segEnd, n0, n1hap) <= maxWrongBits)
+ segEnd++;
+ uint segEnd64j = segEnd*64 + firstDipHapWrongBit(segEnd, n0, n1hap);
+
+#ifdef DETAILS
+ checkHapPhase1(n0, nF1, nF2, n1hap, segStart, segEnd);
+#endif
+ lastEnd64[n1hap] = segEnd;
+ for (uint64 m64x = segStart; m64x+1 /* extend forward */ < segEnd; m64x++) {
+ longDipHapSegsForward[m64x].insert(DipHapSeg(n1hap, segStart64j, segEnd64j));
+ if (longDipHapSegsForward[m64x].size() > maxSegs)
+ longDipHapSegsForward[m64x].erase(--longDipHapSegsForward[m64x].end());
+ }
+ }
+ }
+ }
+#ifdef RDTSC_TIMING
+ extTicks += Timer::rdtsc() - tscExtStart;
+#endif
+
+ /***** COMPILE SET OF REFERENCE HAPLOTYPES FOR EACH TRANSITION AT EACH SEG *****/
+
+ std::set <uint> refHapSets[Mseg64];
+ vector <uint> refHapVecs[Mseg64];
+ vector < pair <uint, uint> > refHapOppPairs[Mseg64];
+ const uint numTopShort = maxHapStates/4, numTopLong = maxHapStates/4,
+ minRefHaps = maxHapStates/4;
+ const uint64 side = 1;
+ uint64 curHaploBitsT[Mseg64];
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) {
+ uint cumTop = 0;
+ // dip-hap hits starting at m64 and m64+1
+ for (uint64 m64x = m64; m64x < m64+2 && m64x < Mseg64; m64x++)
+ for (uint e = 1; e != -1U; e--) {
+ cumTop += numTopShort / 4;
+ for (uint k = 0; k < topIndsLens[e][m64] && refHapSets[m64].size() < cumTop; k++) {
+ uint n1hap = topInds[e][m64*K + k];
+ if (segConfs[n1hap*Mseg64+m64] < 5 &&
+ (m64+1==Mseg64 || segConfs[n1hap*Mseg64+m64+1] < 5))
+ safeInsert(refHapSets[m64], n1hap, n0);
+ }
+ }
+ // long dip hap covering m64, m64+1
+ cumTop += numTopLong;
+ for (std::set <DipHapSeg>::iterator it = longDipHapSegsForward[m64].begin();
+ it != longDipHapSegsForward[m64].end() && refHapSets[m64].size() < cumTop; it++) {
+ uint n1hap = it->n;
+ if (segConfs[n1hap*Mseg64+m64] < 5 &&
+ (m64+1==Mseg64 || segConfs[n1hap*Mseg64+m64+1] < 5))
+ safeInsert(refHapSets[m64], n1hap, n0);
+ }
+#ifdef DETAILS
+ cout << "m64 = " << m64 << ": " << refHapSets[m64].size() << endl;
+#endif
+ /***** AUGMENT REFERENCE HAPLOTYPE SET WITH COMPLEMENTS *****/
+ vector <uint> refHapVec(refHapSets[m64].begin(), refHapSets[m64].end());
+ if (m64+1 >= side && m64+1+side < Mseg64) { // look up m64+1 in LSH
+ for (uint k1 = 0; k1 < refHapVec.size(); k1++) {
+ uint64 n1hap = refHapVec[k1];
+ // require no err on right part of m64
+ uint64 m64errMaskMax = 1ULL<<((uint64) rand_r(&seed)&63);
+
+ for (uint64 m64x = m64+1-side; m64x <= m64+1+side; m64x++) {
+ curHaploBitsT[m64x] = 0;
+ for (uint64 j = 0; j < 64; j++) {
+ uint64 m64j = m64x*64+j, bit = 0;
+ if (maskSnps64j[m64j]) {
+ uint g0 = getGeno0123(m64j, n0);
+ if (g0 == 0 || g0 == 3) bit = 0;
+ else if (g0 == 2) bit = 1;
+ else bit = 1-((haploBitsT[n1hap*Mseg64 + m64x]>>j)&1);
+ }
+ curHaploBitsT[m64x] |= bit<<j;
+ }
+ }
+ vector <uint> hits[3]; // n2haps with 1st err at [1]: m64+1; [2]: m64+2 (allow 1 at m64+1)
+ vector <uint64> bestErrLocs(3); // farthest error locations seen
+ for (uint h = 0; h < hashLookups[m64+1].size(); h++) { // for each hashing
+ uint numHits;
+#ifdef RDTSC_TIMING
+ uint64 tscLshStart = Timer::rdtsc();
+#endif
+ const uint *lenHapInds =
+ hashLookups[m64+1][h].query(computeHash(curHaploBitsT, hashBits[m64+1][h]));
+#ifdef RDTSC_TIMING
+ lshTicks += Timer::rdtsc() - tscLshStart;
+ uint64 tscLshCheckStart = Timer::rdtsc();
+#endif
+ if (lenHapInds == NULL)
+ numHits = 0;
+ else
+ numHits = lenHapInds[0];
+ for (uint k = 1; k <= numHits; k++) {
+ uint64 n2hap = lenHapInds[k];
+ if (!maskIndivs[n2hap/2]) continue;
+ if (n2hap/2 == n0) continue;
+ if (n2hap/2 == n1hap/2) continue; // opp haps of same indiv
+ int errFail = 0, err1 = 0; // err1: 1 err in m64+1
+ uint64 err1Loc = 0, err2Loc = 0;
+ for (uint64 m64x = m64; m64x <= m64+2; m64x++) {
+ uint64 n1is1 = haploBitsT[n1hap*Mseg64 + m64x];
+ uint64 n2is1 = haploBitsT[n2hap*Mseg64 + m64x];
+ const uint64_masks &bits0 = genoBits[m64x*N + n0];
+ uint64 wrongBits = (bits0.is0 & (n1is1 | n2is1)) | (bits0.is2 & ~(n1is1 & n2is1))
+ | (~(bits0.is0|bits0.is2|bits0.is9) & ~(n1is1 ^ n2is1));
+ if (m64x == m64) {
+ if (wrongBits >= m64errMaskMax) { // not perfect in right part of m64
+ errFail = 2; // fail
+ break;
+ }
+ }
+ else if (m64x == m64+1) {
+ if (wrongBits & (wrongBits-1)) { // 2+ err
+ errFail = 1; // fail
+ err1Loc = (wrongBits & ~(wrongBits-1))-1; // LSB-1: big is good
+ break;
+ }
+ else if (wrongBits != 0)
+ err1 = 1;
+ }
+ else {
+ err2Loc = (wrongBits & ~(wrongBits-1))-1; // LSB-1: big is good
+ }
+ }
+ if (!errFail) { // did not fail
+ if (err2Loc != 0) err2Loc -= err1; // slightly worse to have 1 error in m64+1
+ updateErrHits(hits[2], bestErrLocs[2], err2Loc, n2hap);
+ }
+ else if (errFail == 1)
+ updateErrHits(hits[1], bestErrLocs[1], err1Loc, n2hap);
+ }
+#ifdef RDTSC_TIMING
+ lshCheckTicks += Timer::rdtsc() - tscLshCheckStart;
+#endif
+ if (bestErrLocs[2] == -1ULL) break; // early exit if perfect match found
+ }
+
+ //checkHapPhase1(n0, nF1, nF2, refHapVec[k1], m64, min(m64+3, Mseg64));
+ uint64 n2hap = -1ULL;
+ for (int xLoc = 2; xLoc >= 0; xLoc--)
+ if (!hits[xLoc].empty()) {
+ n2hap = hits[xLoc][rand_r(&seed) % hits[xLoc].size()];
+ //cout << "2." << popcount64(bestErrLocs[2]) << ": ";
+ break;
+ }
+ if (n2hap != -1ULL) {
+#ifdef DETAILS
+ checkHapPhase1(n0, nF1, nF2, n2hap, m64, min(m64+3, Mseg64));
+#endif
+ safeInsert(refHapSets[m64], n2hap, n0);
+ refHapOppPairs[m64].push_back(make_pair((uint) n1hap, (uint) n2hap));
+ }
+ }
+ }
+
+ // make sure at least some ref haps are chosen: relax earlier conf filter
+ for (std::set <DipHapSeg>::iterator it = longDipHapSegsForward[m64].begin();
+ it != longDipHapSegsForward[m64].end() && refHapSets[m64].size() < minRefHaps/2; it++)
+ safeInsert(refHapSets[m64], it->n, n0);
+ for (uint64 m64x = m64; m64x < m64+2 && m64x < Mseg64; m64x++)
+ for (uint e = 0; e < 2; e++)
+ for (uint k = 0; k < topIndsLens[e][m64] && refHapSets[m64].size() < minRefHaps; k++)
+ safeInsert(refHapSets[m64], topInds[e][m64*K + k], n0);
+
+ refHapVecs[m64] = vector <uint> (refHapSets[m64].begin(), refHapSets[m64].end());
+ //cout << "m64 = " << m64 << ": " << refHapSets[m64].size() << endl;
+ }
+
+#ifdef DETAILS
+ // TEMPORARY: CHECKING TRIO HAP - REF HAP MATCHES
+ std::set <DipHapSeg> longTrioHapHapSegsForward[2][Mseg64];
+ uint longestTrioHaps[2][Mseg64]; memset(&longestTrioHaps[0][0], 0, 2*Mseg64*sizeof(longestTrioHaps[0][0]));
+ for (int p = 0; p < 2; p++) {
+ cout << "Checking trio hap - ref hap matches" << endl;
+
+ for (uint64 n1hap = 0; n1hap < 2*N; n1hap++) {
+ if (n1hap % 10000 == 0) cout << "at " << n1hap << endl;
+ if (!maskIndivs[n1hap/2]) continue;
+ if (n1hap/2 == n0) continue;
+ vector < pair <uint64, uint64> > goodSegs
+ = findGoodSegs(n0, p==0?nF1:nF2, p==0?nF2:nF1, n1hap);
+ for (uint s = 0; s < goodSegs.size(); s++)
+ for (uint64 m64x = (goodSegs[s].first + 32) / 64; m64x < (goodSegs[s].second + 32) / 64;
+ m64x++) {
+ if (goodSegs[s].second - goodSegs[s].first > longestTrioHaps[p][m64x]) {
+ longTrioHapHapSegsForward[p][m64x].insert(DipHapSeg(n1hap, goodSegs[s].first,
+ goodSegs[s].second));
+ if (longTrioHapHapSegsForward[p][m64x].size() > 10) {
+ longTrioHapHapSegsForward[p][m64x].erase(--longTrioHapHapSegsForward[p][m64x].end());
+ longestTrioHaps[p][m64x] = ((--longTrioHapHapSegsForward[p][m64x].end())->end)
+ - ((--longTrioHapHapSegsForward[p][m64x].end())->start);
+ }
+ }
+ }
+ }
+ }
+#endif
+ /***** RUN HMM (BEAM SEARCH) *****/
+
+ std::unordered_map <uint64, DPState> dpTable[Mseg64];
+ uint dpBestScores[Mseg64]; for (uint64 m64 = 0; m64 < Mseg64; m64++) dpBestScores[m64] = 1<<30;
+ pair <uint, uint> finalHapPairs[Mseg64];
+ vector <int> bestPathScores(Mseg64);
+
+#ifdef RDTSC_TIMING
+ uint64 tscDpStart = Timer::rdtsc();
+#endif
+ for (uint64 m64 = 0; m64 <= Mseg64; m64++) {
+ // find best from prev
+#ifdef RDTSC_TIMING
+ uint64 tscDpSortStart = Timer::rdtsc();
+#endif
+ vector <DPState> curStates;
+ if (m64 == 0) curStates.push_back(DPState(0, make_pair(-1U, -1U)));
+ if (m64 > 0) {
+ uint maxDiffScore = 2*switchCost;
+ uint bestScore = dpBestScores[m64-1];
+
+ // prune best states to beamWidth (via bucket sort)
+ vector < vector <uint64> > curStateBuckets(maxDiffScore+1);
+ for (std::unordered_map <uint64, DPState>::iterator it = dpTable[m64-1].begin();
+ it != dpTable[m64-1].end(); it++)
+ if (it->second.score <= bestScore + maxDiffScore)
+ curStateBuckets[it->second.score - bestScore].push_back(it->first);
+ for (uint dScore = 0; dScore <= maxDiffScore && curStates.size() < beamWidth; dScore++) {
+ //if (curStates.size() + curStateBuckets[dScore].size() > beamWidth)
+ //sort(curStateBuckets[dScore].begin(), curStateBuckets[dScore].end());
+ for (uint k = 0; k < curStateBuckets[dScore].size() && curStates.size() < beamWidth; k++)
+ curStates.push_back(DPState(bestScore+dScore, ullToPair(curStateBuckets[dScore][k])));
+ }
+ /* ---- SLOW ----
+ for (std::unordered_map <uint64, DPState>::iterator it = dpTable[m64-1].begin();
+ it != dpTable[m64-1].end(); it++)
+ if (it->second.score <= bestScore + maxDiffScore)
+ curStates.push_back(DPState(it->second.score, ullToPair(it->first))); // .from = prev
+ sort(curStates.begin(), curStates.end());
+ */
+ }
+#ifdef RDTSC_TIMING
+ dpSortTicks += Timer::rdtsc() - tscDpSortStart;
+#endif
+ if (m64 < Mseg64) {
+ const vector <uint> &refHapVec = refHapVecs[m64]; uint Kref = refHapVec.size();
+
+ // iterate through best in beam
+ if (m64 > 0) {
+ for (uint s = 0; s < curStates.size() && s < beamWidth; s++) {
+ uint n1hap = curStates[s].from.first, n2hap = curStates[s].from.second;
+ uint score = curStates[s].score;
+ // continue
+ updateTable(dpTable, dpBestScores, m64, 1, n0, n1hap, n2hap, n1hap, n2hap, score);
+ }
+ for (uint s = 0; s < curStates.size() && s < beamWidth; s++) {
+ uint n1hap = curStates[s].from.first, n2hap = curStates[s].from.second;
+ uint score = curStates[s].score;
+ // switch n1hap or n2hap
+ for (uint k = 0; k < Kref; k++) {
+ updateTable(dpTable, dpBestScores, m64, 3, n0, n1hap, n2hap, refHapVec[k], n2hap,
+ score+switchCost);
+ updateTable(dpTable, dpBestScores, m64, 3, n0, n1hap, n2hap, n1hap, refHapVec[k],
+ score+switchCost);
+ }
+ }
+ }
+
+ // clean start (from curStates[0])
+ if (m64 == 0) {
+ for (uint k1 = 0; k1 < Kref; k1++)
+ for (uint k2 = k1+1; k2 < Kref; k2++)
+ updateTable(dpTable, dpBestScores, m64, 3, n0, curStates[0].from.first,
+ curStates[0].from.second, refHapVec[k1], refHapVec[k2],
+ curStates[0].score+2*switchCost);
+ }
+ else { // only try clean start from opp pairs
+ for (uint k12 = 0; k12 < refHapOppPairs[m64].size(); k12++)
+ updateTable(dpTable, dpBestScores, m64, 3, n0, curStates[0].from.first,
+ curStates[0].from.second, refHapOppPairs[m64][k12].first,
+ refHapOppPairs[m64][k12].second, curStates[0].score+3*switchCost);
+ }
+ }
+ else { // finished; backtrack through DP table
+ finalHapPairs[Mseg64-1] = curStates[0].from;
+ bestPathScores[Mseg64-1] = curStates[0].score;
+ for (int m64x = Mseg64-2; m64x >= 0; m64x--) {
+ finalHapPairs[m64x] = dpTable[m64x+1][pairToULL(finalHapPairs[m64x+1])].from;
+ //cout << "m64 = " << m64x << ": n1hap = " << finalHapPairs[m64x].first << ", n2hap = " << finalHapPairs[m64x].second << " score = " << dpTable[m64x+1][pairToULL(finalHapPairs[m64x+1])].score << endl;
+ bestPathScores[m64x] = dpTable[m64x+1][pairToULL(finalHapPairs[m64x+1])].score;
+ }
+ }
+ }
+#ifdef RDTSC_TIMING
+ dpTicks += Timer::rdtsc() - tscDpStart;
+#endif
+
+ /***** FIND TRANSITIONS WITHIN 64-SNP CHUNKS *****/
+
+ uint64 hmmHaploBitsT[2][Mseg64], fixHaploBitsT[2][Mseg64], hetErrMasks[Mseg64],
+ uncertainMasks[Mseg64];
+ vector <bool> phaseVec;
+ vector <int> n1haps, n2haps, n3haps, signs;
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) {
+ uint64 n1hap, n2hap, n3hap; int sign;
+ if (m64 == 0) {
+ n1hap = finalHapPairs[m64].first;
+ n2hap = n3hap = finalHapPairs[m64].second;
+ sign = 1;
+ }
+ else if (finalHapPairs[m64].first == finalHapPairs[m64-1].first) {
+ n1hap = finalHapPairs[m64].first;
+ n2hap = finalHapPairs[m64-1].second;
+ n3hap = finalHapPairs[m64].second;
+ sign = 1;
+ }
+ else if (finalHapPairs[m64].second == finalHapPairs[m64-1].second) {
+ n1hap = finalHapPairs[m64].second;
+ n2hap = finalHapPairs[m64-1].first;
+ n3hap = finalHapPairs[m64].first;
+ sign = -1;
+ }
+ else { // restart
+ n1hap = finalHapPairs[m64].first;
+ n2hap = n3hap = finalHapPairs[m64].second;
+ sign = 1;
+ }
+ n1haps.push_back(n1hap); n2haps.push_back(n2hap); n3haps.push_back(n3hap); signs.push_back(sign);
+ pair <uint64, uint64> phaseBits
+ = phaseSegHMM(n0, n1hap, n2hap, n3hap, m64, hetErrMasks[m64]);
+ hmmHaploBitsT[0][m64] = phaseBits.first;
+ hmmHaploBitsT[1][m64] = phaseBits.second;
+ if (sign == -1) std::swap(hmmHaploBitsT[0][m64], hmmHaploBitsT[1][m64]);
+ //checkHapPhase(n0, nF1, nF2, hmmHaploBitsT[0], m64, 0);
+
+ //vector <bool> phaseSeg = checkHapPhase2(n0, nF1, nF2, n1hap, n2hap, n3hap, m64, sign);
+ //cout << endl;
+ //phaseVec.insert(phaseVec.end(), phaseSeg.begin(), phaseSeg.end());
+ }
+
+ /***** DETECT AND FIX BLIPS BASED ON HAPLOTYPE FREQUENCIES *****/
+
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) {
+#ifdef RDTSC_TIMING
+ uint64 tscBlipFixStart = Timer::rdtsc();
+#endif
+ vector < vector <int> > votes(64, vector <int> (4));
+ uint64 m64mid = min(max(m64, side), Mseg64-1-side);
+
+ std::unordered_set <uint> seen;
+ //cout << "m64 = " << m64 << endl;
+ for (int opp = 0; opp < 2; opp++)
+ for (uint h = 0; h < min((uint) hashLookups[m64mid].size(), 10U); h++) { // for each hash
+ uint numHits;
+#ifdef RDTSC_TIMING
+ uint64 tscBlipLshStart = Timer::rdtsc();
+#endif
+ const uint *lenHapInds =
+ hashLookups[m64mid][h].query(computeHash(hmmHaploBitsT[opp], hashBits[m64mid][h]));
+#ifdef RDTSC_TIMING
+ blipLshTicks += Timer::rdtsc() - tscBlipLshStart;
+#endif
+ if (lenHapInds == NULL)
+ numHits = 0;
+ else
+ numHits = lenHapInds[0];
+ for (uint k = 1; k <= min(numHits, 25U); k++) {
+ uint64 nXhap = lenHapInds[k];
+ if (!maskIndivs[nXhap/2]) continue;
+ if (nXhap/2 == n0) continue;
+ if (seen.count(nXhap)) continue;
+ seen.insert(nXhap);
+
+#ifdef RDTSC_TIMING
+ uint64 tscBlipPopStart = Timer::rdtsc();
+#endif
+ int errs = 0;
+ for (uint64 m64x = m64mid-side; m64x <= m64mid+side; m64x++) {
+ const uint64_masks &bits0 = genoBits[m64x*N + n0];
+ errs += popcount64((hmmHaploBitsT[opp][m64x] ^ haploBitsT[nXhap*Mseg64+m64x])
+ & ~bits0.is9);
+ }
+#ifdef RDTSC_TIMING
+ blipPopTicks += Timer::rdtsc() - tscBlipPopStart;
+ uint64 tscBlipVoteStart = Timer::rdtsc();
+#endif
+ if (errs <= 2) {
+ //checkHapPhase(n0, nF1, nF2, haploBitsT + nXhap*Mseg64, m64mid, side);
+ for (uint64 j = 0; j < 64; j++)
+ votes[j][(((haploBitsT[nXhap*Mseg64+m64]>>j)&1)^opp) + 2*opp]++;
+ }
+#ifdef RDTSC_TIMING
+ blipVoteTicks += Timer::rdtsc() - tscBlipVoteStart;
+#endif
+ }
+ }
+
+ fixHaploBitsT[0][m64] = hmmHaploBitsT[0][m64]; fixHaploBitsT[1][m64] = hmmHaploBitsT[1][m64];
+ const uint64_masks &bits0 = genoBits[m64*N + n0];
+ for (uint64 j = 0; j < 64; j++) {
+ if ((bits0.is0|bits0.is2|bits0.is9)&(1ULL<<j)) continue; // not het
+ double relVoteDiff = (hetErrMasks[m64]&(1ULL<<j)) ? 2 : 10; // lower odds-ratio threshold if phase call is uncertain
+ double eps = 0.5; // pseudocount
+ double ratioOR = (votes[j][0]+eps)*(votes[j][2]+eps)/(votes[j][1]+eps)/(votes[j][3]+eps);
+ if (ratioOR > relVoteDiff) {
+ fixHaploBitsT[0][m64] &= ~(1ULL<<j);
+ fixHaploBitsT[1][m64] |= 1ULL<<j;
+ }
+ if (ratioOR < 1.0/relVoteDiff) {
+ fixHaploBitsT[0][m64] |= 1ULL<<j;
+ fixHaploBitsT[1][m64] &= ~(1ULL<<j);
+ }
+ }
+#ifdef RDTSC_TIMING
+ blipFixTicks += Timer::rdtsc() - tscBlipFixStart;
+#endif
+ /*
+ if ((int) nF1 != -1) { // output blip fix + vote info
+ if (fixHaploBitsT[0][m64] != hmmHaploBitsT[0][m64])
+ cout << "*** ";
+ cout << "m64 = " << m64 << ": ";
+ checkHaploBits(n0, nF1, nF2, hmmHaploBitsT[0][m64], m64);
+ if (fixHaploBitsT[0][m64] != hmmHaploBitsT[0][m64]) {
+ cout << " -> ";
+ checkHapPhase(n0, nF1, nF2, fixHaploBitsT[0], m64, 0);
+ }
+ cout << " ";
+ checkHapPhase(n0, nF1, nF2, hmmHaploBitsT[0], m64, 0, votes);
+ }
+ */
+ }
+ // set phaseConfs2: 0|255 unless hetErr or flip or miss; 1|254 at those sites
+ uchar *phaseConfsHap0, *phaseConfsHap1;
+ if (Nref) { // allocate temp arrays for phaseConfs2 haplotype confidences
+ phaseConfsHap0 = ALIGNED_MALLOC_UCHARS(Mseg64*64);
+ phaseConfsHap1 = ALIGNED_MALLOC_UCHARS(Mseg64*64);
+ }
+ else {
+ phaseConfsHap0 = phaseConfs2 + 2*n0*Mseg64*64;
+ phaseConfsHap1 = phaseConfs2 + (2*n0+1)*Mseg64*64;
+ }
+ uchar hapConfs[2][2];
+ hapConfs[0][0] = 0; hapConfs[0][1] = 255; // [0][*]: not uncertain
+ hapConfs[1][0] = 1; hapConfs[1][1] = 254; // [1][*]: uncertain
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) {
+ uncertainMasks[m64] = hetErrMasks[m64] | (fixHaploBitsT[0][m64] ^ hmmHaploBitsT[0][m64])
+ | genoBits[m64*N + n0].is9;
+ for (uint64 j = 0; j < 64; j++) {
+ uint64 m64j = m64*64 + j;
+ bool uncertain = (uncertainMasks[m64]>>j)&1;
+ phaseConfsHap0[m64j] = hapConfs[uncertain][(fixHaploBitsT[0][m64]>>j)&1];
+ phaseConfsHap1[m64j] = hapConfs[uncertain][(fixHaploBitsT[1][m64]>>j)&1];
+ }
+ }
+
+#ifdef CHECK_TRUE_DIP_HAP
+ /***** CHECK TRUE BEST DIP-HAP: ARE THEY RESPECTED BY FINAL PHASE? *****/
+ cout << "CHECK TRUE BEST DIP-HAP: ARE THEY RESPECTED BY FINAL PHASE?" << endl;
+ std::set <DipHapSeg> segsOutput;
+ for (uint64 m64 = 0; m64 < Mseg64; m64++)
+ for (int p = 0; p < 2; p++) {
+ //cout << "m64 = " << m64 << " true longest cover, parent " << p+1 << endl;
+ for (std::set <DipHapSeg>::iterator it = longTrioHapHapSegsForward[p][m64].begin();
+ it != longTrioHapHapSegsForward[p][m64].end(); it++) {
+ if (!segsOutput.count(*it)) {
+ segsOutput.insert(*it);
+ /*
+ cout << "n1hap = " << it->n << "; m64 = [" << it->start/64 << "." << (it->start&63) << "," << it->end/64 << "." << (it->end&63) << "): ";
+ cout << ((std::find(refHapVecs[m64].begin(), refHapVecs[m64].end(), it->n)
+ != refHapVecs[m64].end()) ? "YES" : "\033[1;33mNO\033[0m") << endl;//" ";
+ */
+ //checkHapPhase1j(n0, nF1, nF2, it->n, it->start, it->end); cout << endl;
+ vector <bool> ret = checkHapPhase1jCall(n0, nF1, nF2, fixHaploBitsT[p], it->start, it->end, false);
+ if (find(ret.begin(), ret.end(), 0) != ret.end() &&
+ find(ret.begin(), ret.end(), 1) != ret.end())
+ checkHapPhase1jCall(n0, nF1, nF2, fixHaploBitsT[p], it->start, it->end, true);
+ }
+ break;
+ }
+ }
+#endif
+
+ /***** POST-PROCESS: FIND LONG DIP-HAP MATCHES ALMOST CONSISTENT WITH PHASING *****/
+ vector < pair <uint, uint> > farHaps(Mseg64*64);
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) {
+ //cout << "m64 = " << m64 << ":" << endl;
+ for (uint e = 0; e < 2; e++) {
+ //cout << "e = " << e << ":" << endl;
+ for (uint k = 0; k < topIndsLens[e][m64] && k < 10U /*TODO*/; k++) {
+ uint64 n1hap = topInds[e][m64*K + k];
+ // TODO: save this info from earlier
+ if (!maskIndivs[n1hap/2]) continue;
+ if (n1hap/2 == n0) continue;
+ // find start
+ uint segStart = m64;
+ while ((int) segStart >= 0 && numDipHapWrongBits(segStart, n0, n1hap) <= maxWrongBits)
+ segStart--;
+ if ((int) segStart < 0) segStart = 0;
+ uint segStart64j = segStart*64 + firstDipHapGoodBit(segStart, n0, n1hap);
+ // find end
+ uint segEnd = m64+1;
+ while (segEnd < Mseg64 && numDipHapWrongBits(segEnd, n0, n1hap) <= maxWrongBits)
+ segEnd++;
+ uint segEnd64j = segEnd*64 + firstDipHapWrongBit(segEnd, n0, n1hap);
+
+ vector <uint> err64j(1, segStart64j);
+ bool foundHet = false, prevPhase = false;
+ for (uint64 m64j = segStart64j; m64j < segEnd64j; m64j++)
+ if (maskSnps64j[m64j]) {
+ uint64 m64 = m64j/64, j = m64j&63;
+ uint g0 = getGeno0123(m64j, n0); // TODO: speed up
+ bool relPhase = ((haploBitsT[n1hap*Mseg64+m64] ^ fixHaploBitsT[0][m64])>>j)&1;
+ if (g0 == 1) {
+ if (!foundHet)
+ foundHet = true;
+ else if (relPhase != prevPhase)
+ err64j.push_back(m64j);
+ prevPhase = relPhase;
+ }
+ else if (g0 == 0 || g0 == 2) {
+ if (relPhase) { // dip-hap err
+ err64j.push_back(m64j);
+ err64j.push_back(m64j);
+ }
+ }
+ }
+ err64j.push_back(segEnd64j);
+ /*
+ // output post-process debug info
+ cout << "true: "; checkHapPhase1j(n0, nF1, nF2, n1hap, segStart64j, segEnd64j); cout << endl;
+ cout << "call: "; checkHapPhase1jCall(n0, nF1, nF2, fixHaploBitsT[0], segStart64j, segEnd64j, true);
+ cout << "err vs. call:";
+ for (uint i = 0; i < err64j.size(); i++) {
+ cout << " " << err64j[i]/64 << "." << (err64j[i]&63);
+ printf("(%.2f)", cMs64j[err64j[i]]);
+ }
+ cout << endl;
+ */
+
+ const double cMconsecMin = 1.5, cMendMin = 1.5;
+ uint iStart = 0;
+ for (uint i = 1; i < err64j.size(); i++) {
+ double cMseg = cMs64j[err64j[i]] - cMs64j[err64j[i-1]];
+ if (i == iStart+1) { // piece ending at err64j[i] is first in new chunk
+ if (cMseg < cMendMin) // no good; can't start yet
+ iStart = i;
+ }
+ else {
+ double cMprev = cMs64j[err64j[i-1]] - cMs64j[err64j[i-2]];
+ if (cMprev < cMconsecMin && cMseg < cMconsecMin) { // consec short => split
+ // deal with chunk that just ended at either err64j[i-2] or err64j[i-1]
+ if (cMprev < cMendMin) // last is too short
+ updateFarHaps(farHaps, n1hap, err64j[iStart], err64j[i-2]);
+ else
+ updateFarHaps(farHaps, n1hap, err64j[iStart], err64j[i-1]);
+ // deal with beginning of next chunk
+ if (cMseg < cMendMin)
+ iStart = i;
+ else
+ iStart = i-1;
+ }
+ }
+ if (i+1 == err64j.size()) {
+ if (cMseg < cMendMin)
+ updateFarHaps(farHaps, n1hap, err64j[iStart], err64j[i-1]);
+ else
+ updateFarHaps(farHaps, n1hap, err64j[iStart], err64j[i]);
+ }
+ }
+ //cout << endl;
+ }
+ }
+ }
+ uint farEnd = 0;
+ vector <DipHapSeg> hapSegs;
+ for (uint64 segStart64j = 0; segStart64j < Mseg64*64; segStart64j++)
+ if (farHaps[segStart64j].first > farEnd) {
+ farEnd = farHaps[segStart64j].first;
+ hapSegs.push_back(DipHapSeg(farHaps[segStart64j].second, segStart64j, farEnd));
+ }
+ std::set <uint> postSwitches;
+ for (uint i = 0; i < hapSegs.size(); i++) {
+ uint64 n1hap = hapSegs[i].n, segStart64j = hapSegs[i].start, segEnd64j = hapSegs[i].end;
+ uint64 useStart64j = (i==0 || hapSegs[i-1].end < segStart64j) ?
+ segStart64j : (hapSegs[i-1].end + segStart64j)/2;
+ uint64 useEnd64j = (i+1==hapSegs.size() || segEnd64j < hapSegs[i+1].start) ?
+ segEnd64j : (segEnd64j + hapSegs[i+1].start)/2;
+ /*
+ // output post-process debug info
+ double cMstart = cMs64j[segStart64j], cMend = cMs64j[segEnd64j];
+ if ((int) nF1 != -1) {
+ updateFarHaps(farHaps, n1hap, segStart64j, segEnd64j); // just to print
+ cout << endl;
+ cout << "true: "; checkHapPhase1j(n0, nF1, nF2, n1hap, segStart64j, segEnd64j); cout << endl;
+ cout << "call: "; checkHapPhase1jCall(n0, nF1, nF2, fixHaploBitsT[0], segStart64j, segEnd64j, true);
+ printf("%.2f-%.2f cM; ", cMstart, cMend);
+ cout << "use " << useStart64j/64 << "." << (useStart64j&63) << "-"
+ << useEnd64j/64 << "." << (useEnd64j&63) << endl;
+ }
+ */
+
+ vector <int> numSinceSwitches; vector <uint64> switchLocs;
+ bool foundHet = false, prevPhase = false; int numSinceSwitch = 0;
+ for (uint64 m64j = segStart64j; m64j < segEnd64j; m64j++)
+ if (maskSnps64j[m64j]) {
+ uint64 m64 = m64j/64, j = m64j&63;
+ uint g0 = getGeno0123(m64j, n0); // TODO: speed up
+ if (g0 == 1) {
+ numSinceSwitch++;
+ bool relPhase = ((haploBitsT[n1hap*Mseg64+m64] ^ fixHaploBitsT[0][m64])>>j)&1;
+ if (!foundHet)
+ foundHet = true;
+ else if (relPhase != prevPhase) {
+ numSinceSwitches.push_back(numSinceSwitch);
+ switchLocs.push_back(m64j);
+ numSinceSwitch = 0;
+ }
+ prevPhase = relPhase;
+ }
+ }
+ numSinceSwitches.push_back(numSinceSwitch);
+ for (uint s = 0; s < switchLocs.size(); s++)
+ if (numSinceSwitches[s] > 2 && numSinceSwitches[s+1] > 2 &&
+ useStart64j <= switchLocs[s] && switchLocs[s] < useEnd64j) {
+ uint64 m64j = switchLocs[s];
+ /* output post-process debug info
+ if ((int) nF1 != -1) {
+ cout << "SUGGEST SWITCH: " << m64j/64 << "." << (m64j&63);
+ printf("(%.2f; split %.2f=%.2f+%.2f)", cMs64j[m64j], cMend-cMstart, cMs64j[m64j]-cMstart, cMend-cMs64j[m64j]);
+ }
+ */
+ postSwitches.insert(m64j);
+ }
+ }
+
+ // apply switches
+ int sign = 1;
+ for (uint64 m64j = 0; m64j < Mseg64*64; m64j++) {
+ uint64 m64 = m64j/64, j = m64j&63;
+ if (postSwitches.count(m64j))
+ sign = -sign;
+ if (sign == -1) {
+ uint64 tmp = (fixHaploBitsT[0][m64] ^ fixHaploBitsT[1][m64]) & (1ULL<<j);
+ fixHaploBitsT[0][m64] ^= tmp; fixHaploBitsT[1][m64] ^= tmp;
+ tmp = (hmmHaploBitsT[0][m64] ^ hmmHaploBitsT[1][m64]) & (1ULL<<j);
+ hmmHaploBitsT[0][m64] ^= tmp; hmmHaploBitsT[1][m64] ^= tmp;
+ std::swap(phaseConfsHap0[m64j], phaseConfsHap1[m64j]);
+ }
+ }
+
+ if (Nref) { // store phased haploBits for target sample
+ for (uint64 m64j = 0; m64j < Mseg64*64; m64j++) {
+ if (!maskSnps64j[m64j]) continue;
+ uint64 hapBits[2];
+ hapBits[0] = (int) phaseConfsHap0[m64j] >= 128;
+ hapBits[1] = (int) phaseConfsHap1[m64j] >= 128;
+ for (uint64 h = 0; h <= 1ULL; h++) {
+ uint64 nTargetHap = 2*(n0-Nref) + h;
+ tmpHaploBitsT[nTargetHap*Mseg64 + (m64j/64)] |= hapBits[h]<<(m64j&63);
+ }
+ }
+ ALIGNED_FREE(phaseConfsHap1);
+ ALIGNED_FREE(phaseConfsHap0);
+ }
+
+ if ((int) nF1 != -1) { // final output
+ cout << "3rd-iter phase:" << endl;
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) {
+ /*
+ checkHaploBits(n0, nF1, nF2, hmmHaploBitsT[0][m64], m64);
+ if (fixHaploBitsT[0][m64] != hmmHaploBitsT[0][m64]) {
+ cout << " -> ";
+ checkHaploBits(n0, nF1, nF2, fixHaploBitsT[0][m64], m64);
+ }
+ */
+ vector <bool> phaseSeg = checkHaploBits(n0, nF1, nF2, fixHaploBitsT[0][m64], m64, -1);
+ //cout << endl;
+ phaseVec.insert(phaseVec.end(), phaseSeg.begin(), phaseSeg.end());
+ }
+ printf("# major SE: %2d # tot SE: %2d / %d\n", countMajorSE(phaseVec), countSE(phaseVec),
+ (int) phaseVec.size()-1);
+ fflush(stdout);
+ }
+
+#ifdef RDTSC_TIMING
+ totTicks += Timer::rdtsc() - tscStart;
+#endif
+ return hapTime;
+ }
+
+ void Eagle::writePhaseConfs(const string &tmpPhaseFile) const {
+ FILE *fout = fopen(tmpPhaseFile.c_str(), "wb");
+ fwrite(phaseConfs, 1, 2*N*Mseg64*64, fout);
+ fclose(fout);
+ }
+
+ void Eagle::readPhaseConfs(const string &tmpPhaseFile) {
+ FILE *fin = fopen(tmpPhaseFile.c_str(), "rb");
+ fread(phaseConfs, 1, 2*N*Mseg64*64, fin);
+ fclose(fin);
+ }
+
+ void Eagle::cpPhaseConfs(uint64 n0start, uint64 n0end) {
+ memcpy(phaseConfs + 2*n0start*Mseg64*64, phaseConfs2 + 2*n0start*Mseg64*64,
+ 2*(n0end-n0start)*Mseg64*64);
+ }
+
+ void Eagle::cpTmpHaploBitsT(uint64 n0start, uint64 n0end) {
+ memcpy(haploBitsT + 2*n0start*Mseg64, tmpHaploBitsT + 2*n0start*Mseg64,
+ 2*(n0end-n0start)*Mseg64 * sizeof(haploBitsT[0]));
+ for (uint64 nHap = 2*n0start; nHap < 2*n0end; nHap++)
+ for (uint64 m64 = 0; m64 < Mseg64; m64++)
+ haploBits[m64*2*N + nHap] = haploBitsT[nHap*Mseg64 + m64];
+ }
+
+ void Eagle::outputSE(const vector <uint> &children, const vector <uint> &nF1s,
+ const vector <uint> &nF2s, int step) const {
+
+ if (children.empty()) return;
+ vector <int> majorSEs, totSEs; vector <double> majorSErates, totSErates;
+ for (uint att = 0; att < children.size(); att++) {
+ printf("Switch error locations (step %d):", step);
+ vector <bool> phaseVec = checkPhaseConfsPhase(children[att], nF1s[att], nF2s[att]);
+ majorSEs.push_back(countMajorSE(phaseVec));
+ totSEs.push_back(countSE(phaseVec));
+ majorSErates.push_back(majorSEs.back() * 100.0 / (phaseVec.size()-1));
+ totSErates.push_back(totSEs.back() * 100.0 / (phaseVec.size()-1));
+ printf("# major SE: %2d # tot SE: %2d / %d (step %d)\n", majorSEs.back(),
+ totSEs.back(), (int) phaseVec.size()-1, step);
+ }
+ sort(majorSEs.begin(), majorSEs.end());
+ sort(totSEs.begin(), totSEs.end());
+ int useTrios = std::min(70, (int) majorSEs.size()); // average over at most 70 trios; never read past the end
+ printf("%d-trio avg # major SE: %.2f avg # tot SE: %.2f (step %d)\n", useTrios,
+ std::accumulate(majorSEs.begin(), majorSEs.begin()+useTrios, 0) / (double) useTrios,
+ std::accumulate(totSEs.begin(), totSEs.begin()+useTrios, 0) / (double) useTrios, step);
+ sort(majorSErates.begin(), majorSErates.end());
+ sort(totSErates.begin(), totSErates.end());
+ printf("median major SE rate: %.2f%% median tot SE rate: %.2f%% (step %d)\n",
+ majorSErates[(majorSErates.size()-1)/2], totSErates[(totSErates.size()-1)/2], step);
+ fflush(stdout);
+ }
+
+ void Eagle::writeHapsGzSample(const string &prefix) const {
+ FileUtils::AutoGzOfstream hapsGzOut; hapsGzOut.openOrExit(prefix + ".haps.gz");
+ uint64 m = 0; // index in snps vector
+ for (uint64 m64j = 0; m64j < Mseg64*64; m64j++) {
+ if (!maskSnps64j[m64j]) continue;
+ hapsGzOut << snps[m].chrom << " " << snps[m].ID << " " << snps[m].physpos
+ << " " << snps[m].allele1 << " " << snps[m].allele2;
+ for (uint64 n0 = 0; n0 < N; n0++) {
+ int hapBit1, hapBit2;
+ if (phaseConfs != NULL) {
+ hapBit1 = (int) phaseConfs[2*n0*Mseg64*64 + m64j] < 128;
+ hapBit2 = (int) phaseConfs[(2*n0+1)*Mseg64*64 + m64j] < 128;
+ }
+ else {
+ hapBit1 = !((haploBits[(m64j/64)*2*N + 2*n0]>>(m64j&63))&1);
+ hapBit2 = !((haploBits[(m64j/64)*2*N + 2*n0+1]>>(m64j&63))&1);
+ }
+ if (isFlipped64j[m64j]) {
+ hapBit1 = !hapBit1;
+ hapBit2 = !hapBit2;
+ }
+ hapsGzOut << " " << hapBit1 << " " << hapBit2;
+ }
+ hapsGzOut << endl;
+ m++;
+ }
+ hapsGzOut.close();
+
+ FileUtils::AutoGzOfstream sampleOut; sampleOut.openOrExit(prefix + ".sample");
+ sampleOut << std::setprecision(3);
+ sampleOut << "ID_1 ID_2 missing" << endl;
+ sampleOut << "0 0 0" << endl;
+ for (uint64 n0 = 0; n0 < N; n0++) {
+ /*
+ int ctrMiss = 0, ctrTot = 0;
+ for (uint64 m64j = 0; m64j < Mseg64*64; m64j++)
+ if (maskSnps64j[m64j]) {
+ ctrTot++;
+ ctrMiss += getGeno0123(m64j, n0)==3;
+ }
+ */
+ double missing = 0;//ctrMiss / (double) ctrTot;
+ sampleOut << indivs[n0].famID << " " << indivs[n0].indivID << " " << missing << endl;
+ }
+ sampleOut.close();
+ }
+
+ void bcf_hdr_append_eagle_version(bcf_hdr_t *hdr, int argc, char **argv)
+ {
+ kstring_t str = {0,0,0};
+ const char cmd[] = "eagle";
+ ksprintf(&str,"##%sVersion=%s+htslib-%s\n", cmd, EAGLE_VERSION, hts_version());
+ bcf_hdr_append(hdr,str.s);
+
+ str.l = 0;
+ ksprintf(&str,"##%sCommand=%s", cmd, "eagle");
+ int i;
+ for (i=1; i<argc; i++)
+ {
+ if ( strchr(argv[i],' ') )
+ ksprintf(&str, " '%s'", argv[i]);
+ else
+ ksprintf(&str, " %s", argv[i]);
+ }
+ kputc('\n', &str);
+ bcf_hdr_append(hdr,str.s);
+ free(str.s);
+
+ bcf_hdr_sync(hdr);
+ }
+
+ void Eagle::writeVcf(const string &tmpFile, const string &outFile, int chromX, double bpStart,
+ double bpEnd, const string &writeMode, bool noImpMissing, int argc,
+ char **argv) const {
+
+ htsFile *htsTmp = hts_open(tmpFile.c_str(), "r");
+ htsFile *out = hts_open(outFile.c_str(), writeMode.c_str());
+
+ bcf_hdr_t *hdr = bcf_hdr_read(htsTmp);
+ bcf_hdr_append_eagle_version(hdr, argc, argv);
+ bcf_hdr_write(out, hdr);
+
+ bcf1_t *rec = bcf_init1();
+ int mtgt_gt = 0, *tgt_gt = NULL;
+
+ uint64 m64j = 0; // SNP index; update to correspond to current record
+
+ vector <int> mostRecentPloidy(N-Nref, 2);
+
+ while (bcf_read(htsTmp, hdr, rec) >= 0) {
+
+ int chrom = StringUtils::bcfNameToChrom(bcf_hdr_id2name(hdr, rec->rid), 1, chromX);
+
+ int ntgt_gt = bcf_get_genotypes(hdr, rec, &tgt_gt, &mtgt_gt);
+
+ int bp = rec->pos+1;
+ if (bpStart <= bp && bp <= bpEnd) { // check if within output region
+ for (int i = 0; i < (int) (N-Nref); i++) {
+ int ploidy = 2;
+ int *ptr = tgt_gt + i*ploidy;
+
+ if (chrom != chromX || (bcf_gt_is_missing(ptr[0]) && mostRecentPloidy[i] == 2)
+ || ptr[1] != bcf_int32_vector_end) { // diploid... be careful about missing '.'
+ mostRecentPloidy[i] = 2;
+ bool missing = false;
+ int minIdx = 1000, maxIdx = 0;
+ for (int j = 0; j < ploidy; j++) {
+ if ( bcf_gt_is_missing(ptr[j]) ) { // missing allele
+ missing = true;
+ }
+ else {
+ int idx = bcf_gt_allele(ptr[j]); // allele index
+ minIdx = std::min(minIdx, idx);
+ maxIdx = std::max(maxIdx, idx);
+ }
+ }
+
+ if (!missing && minIdx == maxIdx) { // hom => same allele
+ ptr[0] = ptr[1] = bcf_gt_phased(minIdx);
+ }
+ else if (!missing && minIdx > 0) { // ALT1/ALT2 het => don't phase
+ ptr[0] = ptr[1] = bcf_gt_missing;
+ }
+ else { // REF/ALT* het => phase as called by Eagle
+ if (missing && noImpMissing) { // don't call alleles
+ ptr[0] = ptr[1] = bcf_gt_missing;
+ }
+ else {
+ for (int j = 0; j < ploidy; j++) {
+ uint64 nTargetHap = 2*i + j;
+ int altIdx = missing ? 1 : maxIdx;
+ int hapBit = (tmpHaploBitsT[nTargetHap*Mseg64+(m64j/64)]>>(m64j&63))&1;
+ if (isFlipped64j[m64j]) hapBit = !hapBit;
+ int idx = hapBit ? altIdx : 0;
+ ptr[j] = bcf_gt_phased(idx); // convert allele index to bcf value (phased)
+ }
+ }
+ }
+ }
+ else { // haploid
+ mostRecentPloidy[i] = 1;
+ if ( bcf_gt_is_missing(ptr[0]) && !noImpMissing ) { // missing allele
+ int j = 0;
+ uint64 nTargetHap = 2*i + j;
+ int altIdx = 1;
+ int hapBit = (tmpHaploBitsT[nTargetHap*Mseg64+(m64j/64)]>>(m64j&63))&1;
+ if (isFlipped64j[m64j]) hapBit = !hapBit;
+ int idx = hapBit ? altIdx : 0;
+ ptr[j] = bcf_gt_phased(idx); // convert allele index to bcf value (phased)
+ }
+ }
+ }
+
+ bcf_update_genotypes(hdr, rec, tgt_gt, ntgt_gt);
+
+ bcf_write(out, hdr, rec);
+ }
+
+ m64j++;
+ if ((m64j&63) == seg64cMvecs[m64j/64].size())
+ m64j = (m64j + 64ULL) & ~63ULL; // move to next segment
+ }
+
+ assert(m64j == Mseg64*64);
+
+ free(tgt_gt);
+ bcf_destroy(rec);
+ bcf_hdr_destroy(hdr);
+ hts_close(out);
+ hts_close(htsTmp);
+ remove(tmpFile.c_str());
+ }
+
+ // write phased output in non-ref mode
+ // differences from the above (ref-mode) are as follows:
+ // - does not take noImpMissing arg
+ // - checks chrom
+ // - does not increment m64j outside the bpStart-bpEnd region
+ // - does not delete tmpFile (now vcfFile with original input)
+ void Eagle::writeVcfNonRef(const string &vcfFile, const string &outFile, int inputChrom,
+ int chromX, double bpStart, double bpEnd, const string &writeMode,
+ int argc, char **argv) const {
+
+ htsFile *htsIn = hts_open(vcfFile.c_str(), "r");
+ htsFile *out = hts_open(outFile.c_str(), writeMode.c_str());
+
+ bcf_hdr_t *hdr = bcf_hdr_read(htsIn);
+ bcf_hdr_append_eagle_version(hdr, argc, argv);
+ bcf_hdr_write(out, hdr);
+
+ bcf1_t *rec = bcf_init1();
+ int mtgt_gt = 0, *tgt_gt = NULL;
+
+ uint64 m64j = 0; // SNP index; update to correspond to current record
+
+ vector <int> mostRecentPloidy(N-Nref, 2);
+
+ while (bcf_read(htsIn, hdr, rec) >= 0) {
+ // check CHROM
+ int chrom = StringUtils::bcfNameToChrom(bcf_hdr_id2name(hdr, rec->rid), 1, chromX);
+ if (inputChrom != 0) {
+ if (chrom != inputChrom) {
+ continue;
+ }
+ }
+
+ int bp = rec->pos+1;
+ if (bpStart <= bp && bp <= bpEnd) { // check if within output region
+ int ntgt_gt = bcf_get_genotypes(hdr, rec, &tgt_gt, &mtgt_gt);
+
+ for (int i = 0; i < (int) (N-Nref); i++) {
+ int ploidy = 2;
+ int *ptr = tgt_gt + i*ploidy;
+
+ if (chrom != chromX || (bcf_gt_is_missing(ptr[0]) && mostRecentPloidy[i] == 2)
+ || ptr[1] != bcf_int32_vector_end) { // diploid... be careful about missing '.'
+ mostRecentPloidy[i] = 2;
+ bool missing = false;
+ int minIdx = 1000, maxIdx = 0; // (shouldn't matter; SNPs should be biallelic)
+ for (int j = 0; j < ploidy; j++) {
+ if ( bcf_gt_is_missing(ptr[j]) ) { // missing allele
+ missing = true;
+ }
+ else {
+ int idx = bcf_gt_allele(ptr[j]); // allele index
+ minIdx = std::min(minIdx, idx);
+ maxIdx = std::max(maxIdx, idx);
+ }
+ }
+
+ if (!missing && minIdx == maxIdx) { // hom => same allele
+ ptr[0] = ptr[1] = bcf_gt_phased(minIdx);
+ }
+ else if (!missing && minIdx > 0) { // ALT1/ALT2 het => don't phase (shouldn't happen)
+ ptr[0] = ptr[1] = bcf_gt_missing;
+ }
+ else { // REF/ALT* het => phase as called by Eagle
+ for (int j = 0; j < ploidy; j++) {
+ uint64 nTargetHap = 2*i + j;
+ int altIdx = missing ? 1 : maxIdx;
+ int hapBit = (tmpHaploBitsT[nTargetHap*Mseg64+(m64j/64)]>>(m64j&63))&1;
+ if (isFlipped64j[m64j]) hapBit = !hapBit;
+ int idx = hapBit ? altIdx : 0;
+ ptr[j] = bcf_gt_phased(idx); // convert allele index to bcf value (phased)
+ }
+ }
+ }
+ else { // haploid
+ mostRecentPloidy[i] = 1;
+ if ( bcf_gt_is_missing(ptr[0]) ) { // missing allele
+ int j = 0;
+ uint64 nTargetHap = 2*i + j;
+ int altIdx = 1;
+ int hapBit = (tmpHaploBitsT[nTargetHap*Mseg64+(m64j/64)]>>(m64j&63))&1;
+ if (isFlipped64j[m64j]) hapBit = !hapBit;
+ int idx = hapBit ? altIdx : 0;
+ ptr[j] = bcf_gt_phased(idx); // convert allele index to bcf value (phased)
+ }
+ }
+ }
+
+ bcf_update_genotypes(hdr, rec, tgt_gt, ntgt_gt);
+
+ bcf_write(out, hdr, rec);
+
+ m64j++;
+ if ((m64j&63) == seg64cMvecs[m64j/64].size())
+ m64j = (m64j + 64ULL) & ~63ULL; // move to next segment
+ }
+ }
+
+ assert(m64j == Mseg64*64);
+
+ free(tgt_gt);
+ bcf_destroy(rec);
+ bcf_hdr_destroy(hdr);
+ hts_close(out);
+ hts_close(htsIn);
+ }
+
+ void Eagle::makeHardCalls(uint64 n0start, uint64 n0end, uint seed) {
+ // fast rng: w component (low 16 bits) of Marsaglia's MWC; only its low 8 bits are used below
+ uint w = 521288629;
+ for (uint i = 0; i < seed % 12345; i++)
+ w=18000*(w&65535)+(w>>16);
+ //memset(haploBits, 0, 2*N*Mseg64*sizeof(haploBits[0]));
+ memset(segConfs + 2*n0start*Mseg64, 0, 2*(n0end-n0start)*Mseg64*sizeof(segConfs[0]));
+ for (uint64 nHap = 2*n0start; nHap < 2*n0end; nHap++)
+ for (uint64 m64j = 0; m64j < Mseg64*64; m64j++) {
+ if ((m64j&63)==0)
+ haploBits[(m64j/64)*2*N + nHap] = 0;
+ uchar phaseConf = phaseConfs[nHap*Mseg64*64 + m64j];
+ segConfs[nHap*Mseg64+m64j/64] = max(segConfs[nHap*Mseg64+m64j/64],
+ min(phaseConf, (uchar) (255-phaseConf)));
+ if (phaseConf == (uchar) 255 || ((w=18000*(w&65535)+(w>>16))&255) < phaseConf)
+ haploBits[(m64j/64)*2*N + nHap] |= 1ULL<<(m64j&63);
+ if ((m64j&63)==63)
+ haploBitsT[nHap*Mseg64 + (m64j/64)] = haploBits[(m64j/64)*2*N + nHap];
+ }
+ }
+
+ uint Eagle::computeHash(const uint64 *curHaploBitsT, const uint64 *curHashBits, uint B) const {
+ uint hash = 0;
+ for (uint b = 0; b < B; b++)
+ hash |= ((curHaploBitsT[curHashBits[b]>>6]>>(curHashBits[b]&63))&1)<<b;
+ return hash;
+ }
+
+ uint Eagle::computeHash(const uint64 *curHaploBitsT, const vector <uint64> &curHashBits) const {
+ return computeHash(curHaploBitsT, &curHashBits[0], curHashBits.size());
+ }
+
+ double Eagle::computeLogHetP(uint64 m64j) const {
+ assert(Nref!=0); // only call this function when in ref-mode
+ int sumHaps = 0;
+ for (uint64 nHap = 0; nHap < 2*Nref; nHap++)
+ sumHaps += (haploBits[(m64j/64)*2*N + nHap]>>(m64j&63))&1;
+ double p = sumHaps / (2.0 * Nref);
+ p = std::min(p, 1-p);
+ return log10(p);
+ }
+
+ void Eagle::initRefIter(int refIter) {
+ uint64 Ntarget = N - Nref;
+ if (refIter > 1) { // copy tmpHaploBitsT from previous iter -> haploBits, haploBitsT
+ memcpy(haploBitsT + 2*Nref*Mseg64, tmpHaploBitsT, 2*Ntarget*Mseg64*sizeof(tmpHaploBitsT[0]));
+ for (uint64 nHap = 2*Nref; nHap < 2*N; nHap++) // copy transpose
+ for (uint64 m64 = 0; m64 < Mseg64; m64++)
+ haploBits[m64*2*N + nHap] = haploBitsT[nHap*Mseg64 + m64];
+ }
+ // clear tmpHaploBitsT (temp storage of phased target haplotypes)
+ memset(tmpHaploBitsT, 0, 2*Ntarget*Mseg64*sizeof(tmpHaploBitsT[0]));
+ }
+
+ // input arg iter = non-ref mode iter (ref mode iter 1 = non-ref mode iter 3)
+ void Eagle::buildHashTables(int iter, int batch, int seed) {
+
+ std::srand(1000000*seed + 1000*iter + batch); // seed random_shuffle
+
+ const uint maxValuesPerKey = 99;
+ const uint baseLSH = 10, bonusLSH = iter > 2 ? 4 : 0;
+ const uint numLSH = baseLSH + bonusLSH;
+ const uint maxBits = 32;
+ const double minLogHetP = log10(0.02);
+ hashLookups = vector < vector <StaticMultimap> > (Mseg64, vector <StaticMultimap> (numLSH));
+ hashBits = vector < vector < vector <uint64> > > (Mseg64, vector < vector <uint64> > (numLSH));
+
+ const double reduction = (iter == 2 ? 0 : 0.05);
+
+ const uint64 side = 1;
+ for (uint64 m64 = 0+side; m64+side < Mseg64; m64++) {
+ for (uint h = 0; h < numLSH; h++) {
+ vector <uint64> m64js;
+
+ if (h < baseLSH) { // standard hash regions: 3 segments wide, shrinking by 0.1 per hash to 2.1 (iter 2: no shrink)
+ for (uint64 m64j = (uint64) ((m64-side+reduction*h)*64);
+ m64j < (uint64) ((m64+side+1-reduction*h)*64); m64j++)
+ if (maskSnps64j[m64j] &&
+ (Nref==0 ? seg64logPs[m64j].cond[1][3] : computeLogHetP(m64j)) > minLogHetP)
+ m64js.push_back(m64j);
+ }
+ else { // small hash regions
+ int offStart = 0, offEnd = 0;
+ switch (h-baseLSH) {
+ case 0: offStart = -32; offEnd = 32; break;
+ case 1: offStart = 0; offEnd = 64; break;
+ case 2: offStart = -32; offEnd = 0; break;
+ case 3: offStart = 0; offEnd = 32; break;
+ }
+ for (uint64 m64j = (uint64) (m64*64 + offStart); m64j < (uint64) (m64*64 + offEnd);
+ m64j++)
+ if (maskSnps64j[m64j])
+ m64js.push_back(m64j);
+ }
+
+ if (m64js.empty())
+ for (uint64 m64j = (m64-side)*64; m64j < (m64+side+1)*64; m64j++)
+ if (maskSnps64j[m64j])
+ m64js.push_back(m64j);
+
+ uint bitsInHash = (h < baseLSH ? maxBits-h : 24);
+
+ // randomly select SNPs m64j to use in hash
+ uint m64jInd = m64js.size();
+ for (uint b = 0; b < bitsInHash; b++) {
+ // choose next SNP (in random order); if at end, reshuffle
+ if (m64jInd == m64js.size()) {
+ std::random_shuffle(m64js.begin(), m64js.end());
+ m64jInd = 0;
+ }
+ hashBits[m64][h].push_back(m64js[m64jInd++]);
+ }
+ }
+ }
+
+ uint64 nRefHaps = 2*((Nref!=0 && iter==3) ? Nref : N); // ref-mode iter 1 -> iter 3
+ vector < vector <uint> > keyVecs(omp_get_max_threads(), vector <uint> (nRefHaps));
+#pragma omp parallel for
+ for (uint64 m64 = 0+side; m64 < Mseg64-side; m64++) {
+ cout << "." << std::flush;
+ for (uint h = 0; h < numLSH; h++) {
+ // compute hashes
+ vector <uint> &keyVec = keyVecs[omp_get_thread_num()];
+ for (uint64 nHap = 0; nHap < nRefHaps; nHap++) // in ref-mode, only use ref
+ keyVec[nHap] = maskIndivs[nHap/2] ?
+ computeHash(haploBitsT + nHap*Mseg64, &hashBits[m64][h][0], hashBits[m64][h].size())
+ : -1U;
+ hashLookups[m64][h].init(keyVec, maxValuesPerKey);
+ }
+ }
+ }
+
+ const uint64 *Eagle::getHaploBitsT(void) const { return haploBitsT; }
+ uint64 Eagle::getNlib(int iter) const { return ((Nref!=0 && iter==3) ? Nref : N); }
+ uint64 Eagle::getMseg64(void) const { return Mseg64; }
+ const uchar *Eagle::getMaskSnps64j(void) const { return maskSnps64j; }
+
+ double Eagle::computeHetRate(void) const {
+ uint64 homCtr = 0, totCtr = 0;
+ for (uint64 m64 = 0; m64 < Mseg64; m64++)
+ for (uint64 n = Nref; n < N; n++) { // ref genoBits aren't initialized!
+ const uint64_masks &bits = genoBits[m64*N + n];
+ homCtr += popcount64(bits.is0 | bits.is2);
+ totCtr += popcount64(~bits.is9);
+ }
+ return 1 - homCtr / (double) totCtr;
+ }
+
+}
diff --git a/src/Eagle.hpp b/src/Eagle.hpp
new file mode 100644
index 0000000..23c463a
--- /dev/null
+++ b/src/Eagle.hpp
@@ -0,0 +1,209 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#ifndef EAGLE_HPP
+#define EAGLE_HPP
+
+//#define RDTSC_TIMING
+
+#include <vector>
+#include <string>
+#include <set>
+#include <unordered_map>
+
+#include "Types.hpp"
+#include "GenoData.hpp"
+#include "HapHedge.hpp"
+#include "StaticMultimap.hpp"
+
+namespace EAGLE {
+
+ struct Match {
+ uint n, m64jStart, m64jEnd, m64jStartCons, m64jEndCons; // [m64jStart..m64jEnd]
+ double logBF, cMlenInit;
+ bool operator < (const Match &m2) const {
+ return logBF > m2.logBF;
+ }
+ Match() : n(0), m64jStart(0), m64jEnd(0), m64jStartCons(0), m64jEndCons(1<<30), logBF(0), cMlenInit(0) {}
+ Match(uint _n, uint _m64jStart, uint _m64jEnd, double _logBF) :
+ n(_n), m64jStart(_m64jStart), m64jEnd(_m64jEnd), m64jStartCons(0), m64jEndCons(1<<30),
+ logBF(_logBF), cMlenInit(0) {}
+ static bool greaterEnd(const Match &m1, const Match &m2) {
+ return m1.m64jEnd > m2.m64jEnd || (m1.m64jEnd == m2.m64jEnd && m1.logBF > m2.logBF);
+ }
+ static bool greaterLen(const Match &m1, const Match &m2) {
+ return m1.cMlenInit > m2.cMlenInit ||
+ (m1.cMlenInit == m2.cMlenInit && m1.logBF > m2.logBF);
+ }
+ };
+
+ struct DPState {
+ uint score;
+ std::pair <uint, uint> from;
+ DPState() : score(0), from(std::make_pair(0U, 0U)) {}
+ DPState(uint _score, std::pair <uint, uint> _from) : score(_score), from(_from) {}
+ bool operator < (const DPState &state2) const {
+ return score < state2.score
+ || (score==state2.score && from < state2.from);
+ }
+ };
+
+ class Eagle {
+ public:
+ mutable uint64 totTicks, extTicks, diphapTicks, lshTicks, lshCheckTicks, dpTicks, dpStaticTicks, dpSwitchTicks, dpSortTicks, dpUpdateTicks, dpUpdateCalls, blipFixTicks, blipPopTicks, blipVoteTicks, blipLshTicks;
+
+ private:
+
+ static const uint homErrCost = 1, hetErrCost = 2, switchCost = 3;
+ static const uint switchScoreLutBits = 5;
+ char switchScoreLut[1<<(3*switchScoreLutBits)][2];
+ const uint64 N, Nref; // Nref = 0 if not in ref-mode
+ const uint64 Mseg64; // number of <=64-SNP chunks
+ const uint64_masks *genoBits; // [[MATRIX]]: M64 x N (is0, is2 and is9 64-bit masks)
+ const std::vector <std::vector <double> > seg64cMvecs;
+ uchar *maskSnps64j; // M64x64 binary vector indicating SNPs to use
+ double *cMs64j; // M64x64+1 cM coordinates
+ uchar *phaseConfs, *phaseConfs2; // [[MATRIX]]: 2N x M64x64
+ uint64 *haploBits; // [[MATRIX]]: M64 x 2N (is1 for hard calls)
+ uint64 *haploBitsT; // [[MATRIX]]: 2N x M64 (is1 for hard calls)
+ uint64 *tmpHaploBitsT; // [[MATRIX]]: 2Ntarget x M64 (temp storage for target haps in ref-mode)
+ uchar *segConfs; // [[MATRIX]]: 2N x M64
+ std::vector < std::vector <StaticMultimap> > hashLookups;
+ std::vector < std::vector < std::vector <uint64> > > hashBits;
+ const AlleleFreqs *seg64logPs;
+ const std::vector <double> invLD64j; // M64x64 LD-based weights for evaluating match evidence
+ const std::vector <IndivInfoX> indivs;
+ const std::vector <SnpInfoX> snps; // M-vector
+ std::vector <uchar> maskIndivs; // N-vector: 0 to ignore indivs (e.g., relatives)
+ std::vector <bool> isFlipped64j; // in non-ref mode, SNPs are internally flipped to 0=A2=major
+ const double logPerr; // genotype error rate
+
+ void init(void);
+ uint getGeno0123(uint64 m64j, uint64 n) const;
+ void retractMatch(uint n0, Match &match, double memoLogBF[][4]) const;
+ Match computeDuoLogBF(double memoLogBF[][4], double workLogBF[], uint64 n0, uint64 n1, uint64 m64cur) const;
+ void trim(Match &match, const Match &ref, uint64 n0, int orientation, uint64 trimStart,
+ int inc, double workLogBF[]) const;
+ std::string computePhaseString(uint64 n0, uint64 nF1, uint64 nF2,
+ const std::vector <Match> &matches,
+ const std::vector <int> &signs, uint64 start, double cMend,
+ bool cons)
+ const;
+ void printMatch(uint64 n0, uint64 nF1, uint64 nF2, const Match &duoMatch,
+ double memoLogBF[][4]) const;
+ void findLongHalfIBD(uint64 n0, std::vector <uint> topInds[2],
+ std::vector <uint> topIndsLens[2], uint K) const;
+ std::vector <uint> findMinErrDipHap(uint64 n0, uint K, bool useTargetHaps) const;
+ void findLongDipHap(uint64 n0, std::vector <uint> topInds[2],
+ std::vector <uint> topIndsLens[2], uint K, uint errStart) const;
+ void computePhaseConfs(uint64 n0, const std::vector <Match> &matches,
+ const std::vector <int> &signs, bool cons);
+ std::vector <int> trioRelPhase(uint64 n0, uint64 nF1, uint64 nF2) const;
+ void checkPhase(uint64 n0, uint64 nF1, uint64 nF2, double thresh) const;
+ std::vector <bool> checkPhaseConfsPhase(uint64 n0, uint64 nF1, uint64 nF2) const;
+ void checkHapPhase(uint64 n0, uint64 nF1, uint64 nF2, const uint64 curHaploBitsT[], uint64 m64,
+ uint64 side, std::vector < std::vector <int> > votes=std::vector < std::vector <int> > ()) const;
+ std::vector <bool> checkHapPhase1(uint64 n0, uint64 nF1, uint64 nF2, uint64 n1hap,
+ uint64 m64start, uint64 m64end, int sign=1) const;
+ std::vector <bool> checkHapPhase1j(uint64 n0, uint64 nF1, uint64 nF2, uint64 n1hap,
+ uint64 m64jStart, uint64 m64jEnd, int sign=1) const;
+ std::vector <bool> checkHapPhase1jCall(uint64 n0, uint64 nF1, uint64 nF2, uint64 callBitsT[],
+ uint64 m64jStart, uint64 m64jEnd, bool print, int sign=1) const;
+ int checkHapPhase2(uint64 n0, uint64 nF1, uint64 nF2, uint64 n1hap,
+ uint64 n2hap, uint64 n3hap, uint64 m64, int sign) const;
+ std::vector <bool> checkHaploBits(uint64 n0, uint64 nF1, uint64 nF2, uint64 hapBits,
+ uint64 m64, int pad=0) const;
+ std::pair <uint64, uint64> phaseSegHMM(uint64 n0, uint64 n1hap, uint64 n2hap, uint64 n3hap,
+ uint64 m64, uint64 &hetErrMask) const;
+ std::vector <bool> checkSegPhase(uint64 n0, uint64 nF1, uint64 nF2, uint64 n1hap, uint64 n2hap,
+ int sign, uint64 m64) const;
+ void computeSegPhaseConfs(uint64 n0, uint64 n1hap, uint64 n2hap, int sign, uint64 m64,
+ int err);
+ int numDipHapWrongBits(uint64 m64, uint64 n0, uint64 n1hap) const;
+ int firstDipHapGoodBit(uint64 m64, uint64 n0, uint64 n1hap) const;
+ int firstDipHapWrongBit(uint64 m64, uint64 n0, uint64 n1hap) const;
+ uint computeHash(const uint64 *curHaploBitsT, const uint64 *curHashBits, uint B) const;
+ uint computeHash(const uint64 *curHaploBitsT, const std::vector <uint64> &curHashBits) const;
+
+ bool updateHelper(std::unordered_map <uint64, DPState> &dpTab, uint &dpBestScore,
+ std::pair <uint, uint> cur, std::pair <uint, uint> next, uint score) const;
+ uint computeStaticScore(uint n0, uint n1hap, uint n2hap, uint64 m64) const;
+ uint computeSwitchScore(uint n0, uint n1hap, uint n2hapA, uint n2hapB, uint64 m64) const;
+ void updateTable(std::unordered_map <uint64, DPState> dpTable[], uint dpBestScores[],
+ uint64 m64, uint64 dist, uint n0, uint n1hapA, uint n2hapA, uint n1hapB,
+ uint n2hapB, uint score) const;
+ void safeInsert(std::set <uint> &refHapSet, uint n1hap, uint n0) const;
+ std::vector < std::pair <uint64, uint64> > findGoodSegs(uint64 n0, uint64 nF1, uint64 nF2, uint64 n1hap) const;
+ void updateFarHaps(std::vector < std::pair <uint, uint> > &farHaps, uint n1hap, uint m64jStart, uint m64jEnd) const;
+ double computeLogHetP(uint64 m64j) const;
+
+ public:
+ Eagle(uint64 _N, uint64 _Mseg64, const uint64_masks *_genoBits,
+ std::vector < std::vector <double> > _seg64cMvecs, const AlleleFreqs *_seg64freqs,
+ std::vector <double> _invLD64j, const std::vector <IndivInfoX> &_indivs,
+ const std::vector <SnpInfoX> &_snps, const std::string &maskFile,
+ const std::vector <bool> &isFlipped64j, double _pErr, int runStep2);
+ // constructor for ref-mode
+ Eagle(uint64 _Nref, uint64 _Ntarget, uint64 _Mseg64, const uint64_masks *_genoBits,
+ std::vector < std::vector <double> > _seg64cMvecs, double _pErr);
+
+ void reallocLRPtoPBWT(void);
+ ~Eagle();
+
+ void checkTrioErrorRate(uint64 n0, uint64 nF1, uint64 nF2) const;
+ void randomlyPhaseTmpHaploBitsT(uint64 n0);
+ std::pair <double, std::vector <double> > findLongDipMatches(uint64 n0, uint64 nF1,
+ uint64 nF2);
+ double findLongHapMatches(uint64 n0, uint64 nF1, uint64 nF2, int iter);
+ double runHMM(uint64 n0, uint64 nF1, uint64 nF2, int iter, uint beamWidth, uint maxHapStates);
+ std::vector <bool> computeRefIsMono(const std::vector <uint> &bestHaps) const;
+ float runPBWT(uint64 n0, uint64 nF1, uint64 nF2, int Kpbwt, double cMexpect, double histFactor,
+ bool runReverse, bool useTargetHaps, bool impMissing);
+ float runPBWT(uint64 n0, uint64 nF1, uint64 nF2, int Kpbwt, double cMexpect, double histFactor,
+ bool runReverse, bool useTargetHaps, bool impMissing, int usePS,
+ const std::vector < std::pair <int, int> > &conPS);
+ void imputeMissing(const HapHedge &hapHedge, uint64 n0);
+ void writePhaseConfs(const std::string &tmpPhaseFile) const;
+ void readPhaseConfs(const std::string &tmpPhaseFile);
+ void cpPhaseConfs(uint64 n0start, uint64 n0end);
+ void cpTmpHaploBitsT(uint64 n0start, uint64 n0end);
+ void outputSE(const std::vector <uint> &children, const std::vector <uint> &nF1s,
+ const std::vector <uint> &nF2s, int step) const;
+ void writeHapsGzSample(const std::string &prefix) const;
+ void writeVcf(const std::string &tmpFile, const std::string &outFile, int chromX,
+ double bpStart, double bpEnd, const std::string &writeMode, bool noImpMissing,
+ int argc, char**argv) const;
+ void writeVcfNonRef(const std::string &vcfFile, const std::string &outFile, int inputChrom,
+ int chromX, double bpStart, double bpEnd, const std::string &writeMode,
+ int argc, char **argv) const;
+ void makeHardCalls(uint64 n0start, uint64 n0end, uint seed);
+ void initRefIter(int refIter);
+ void buildHashTables(int iter, int batch, int seed);
+ const uint64 *getHaploBitsT(void) const;
+ uint64 getNlib(int iter) const;
+ uint64 getMseg64(void) const;
+ const uchar *getMaskSnps64j(void) const;
+ double computeHetRate(void) const;
+
+ static int countSE(const std::vector <bool> &phaseVec);
+ static int countMajorSE(const std::vector <bool> &phaseVec);
+
+ };
+}
+
+#endif
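
Note on the bit-matrix members declared in src/Eagle.hpp: the "[[MATRIX]]: 2N x M64" comments describe haplotypes packed 64 SNPs per uint64 word, one row per haplotype. The following is a minimal sketch of that indexing, assuming a row-major array with Mseg64 words per row. The helper names getHapBit and setHapBit are hypothetical and do not exist in Eagle; only the index expression (row*Mseg64 + m64j/64, shifted by m64j&63) is taken from Eagle::imputeMissing in this commit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

typedef std::uint64_t uint64;

// Hypothetical helper (not part of Eagle): read the allele bit of haplotype
// nHap at SNP slot m64j from a row-major (numHaps x Mseg64) bit matrix.
inline int getHapBit(const std::vector<uint64> &bitsT, uint64 Mseg64,
                     uint64 nHap, uint64 m64j) {
  assert(m64j < Mseg64 * 64);
  return (bitsT[nHap * Mseg64 + m64j / 64] >> (m64j & 63)) & 1;
}

// Hypothetical helper (not part of Eagle): set or clear the same bit, using
// the |= bit / &= ~bit pattern that imputeMissing applies to tmpHaploBitsT.
inline void setHapBit(std::vector<uint64> &bitsT, uint64 Mseg64,
                      uint64 nHap, uint64 m64j, bool value) {
  assert(m64j < Mseg64 * 64);
  const uint64 bit = 1ULL << (m64j & 63);
  uint64 &word = bitsT[nHap * Mseg64 + m64j / 64];
  if (value) word |= bit;
  else word &= ~bit;
}
```

With this layout, all SNPs of one haplotype sit in contiguous words, which is presumably why the transposed (T-suffixed) matrices are the ones used when scanning along a single haplotype.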
diff --git a/src/EagleImpMiss.cpp b/src/EagleImpMiss.cpp
new file mode 100644
index 0000000..f2d42bf
--- /dev/null
+++ b/src/EagleImpMiss.cpp
@@ -0,0 +1,286 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include <vector>
+#include <iostream>
+
+#include "HapHedge.hpp"
+#include "MemoryUtils.hpp"
+#include "NumericUtils.hpp"
+#include "Eagle.hpp"
+
+
+namespace EAGLE {
+
+ using std::cout;
+ using std::endl;
+ using std::vector;
+ using std::string;
+
+#define HAP_BEAM_WIDTH 16
+ struct ProbMaskBundle {
+ float logTotProb;
+ int numTop;
+ float probs[HAP_BEAM_WIDTH];
+ uint64 masks[HAP_BEAM_WIDTH];
+ };
+
+ struct MultMaskState {
+ float mult; // 1/Nhaps * pErr^numErrs
+ uint64 mask;
+ HapTreeState state;
+ char rmActive[2];
+ inline int count(void) const {
+ return state.count - rmActive[0] - rmActive[1];
+ }
+ inline float multCount(void) const {
+ if (mult == 0) return 0;
+ else return mult * count();
+ }
+ bool operator < (const MultMaskState &mms) const {
+ return multCount() > mms.multCount();
+ }
+ };
+
+ void advance(const HapTree &hapTree, const uchar rmHaps[], const uchar hap[],
+ const uchar missing[], MultMaskState states[HAP_BEAM_WIDTH], int m, float pErr) {
+ MultMaskState nextStates[2*HAP_BEAM_WIDTH];
+ int numNext = 0;
+ // impose pruning thresh (TODO: test)
+ const float minP = states[0].multCount() * pErr * pErr;
+ for (int k = 0; k < HAP_BEAM_WIDTH && states[k].multCount() > minP; k++)
+ for (int b = 0; b < 2; b++) {
+ HapTreeState nextState = states[k].state;
+ if (hapTree.next(m, nextState, b)) {
+ if (missing[m]) {
+ nextStates[numNext].mult = states[k].mult;
+ nextStates[numNext].mask = (states[k].mask<<1)|b;
+ }
+ else {
+ nextStates[numNext].mult = states[k].mult * (b==hap[m] ? 1 : pErr);
+ nextStates[numNext].mask = states[k].mask;
+ }
+ nextStates[numNext].state = nextState;
+ nextStates[numNext].rmActive[0] = states[k].rmActive[0] && ((rmHaps[m]&1)==b);
+ nextStates[numNext].rmActive[1] = states[k].rmActive[1] && ((rmHaps[m]>>1)==b);
+ numNext++;
+ }
+ }
+ std::sort(nextStates, nextStates + numNext);
+ memcpy(states, nextStates, std::min(numNext, HAP_BEAM_WIDTH) * sizeof(states[0])); // copy best
+ if (numNext < HAP_BEAM_WIDTH) states[numNext].mult = 0;
+ }
+
+ string logProbToStr(float f) {
+ f /= log(10);
+ char buf[100];
+ sprintf(buf, "%.2fe%d", pow(10, f - floor(f)), (int) floor(f));
+ return buf;
+ }
+
+ // rmHaps: 2-bit phased genotypes for haplotype pair to ignore... or NULL if none (ref mode)
+ // recombLogPs[T+1]: [0,T] = 0; [1..T-1] = logP for recomb in (t-0.5,t+0.5)
+ void impMissing(const HapHedge &hapHedge, const uchar *rmHaps, uchar hap[],
+ const uchar missing[], const float recombLogPs[], float pErr) {
+ const int maxExt = 500;
+ const int M = hapHedge.getM();
+ const int skip = hapHedge.getSkip();
+ const int T = hapHedge.getNumTrees();
+ const int dtMax = maxExt / skip;
+
+ ProbMaskBundle *topPrefixes = (ProbMaskBundle *) calloc(T * dtMax, sizeof(topPrefixes[0]));
+ // calloc => numTop initialized to 0
+
+ vector <float> fwdLogProbs(T+1, -1000000), bwdLogProbs(T+1, -1000000);
+ fwdLogProbs[0] = 0; bwdLogProbs[T] = 0;
+
+ MultMaskState states[HAP_BEAM_WIDTH];
+ // compute haplotype prefix beams; compute fwdLogProbs
+ for (int t = 0; t < T; t++) {
+ const HapTree &hapTree = hapHedge.getHapTree(t);
+ states[0].mask = 0;
+ states[0].state = hapTree.getRootState();
+ if (rmHaps == NULL) {
+ states[0].mult = hapTree.getInvNhaps();
+ states[0].rmActive[0] = states[0].rmActive[1] = 0;
+ }
+ else {
+ states[0].mult = 1 / (1/hapTree.getInvNhaps()-2); // removing 2 haps
+ states[0].rmActive[0] = states[0].rmActive[1] = 1;
+ }
+ for (int k = 1; k < HAP_BEAM_WIDTH; k++) states[k].mult = 0;
+ int m = t*skip;
+ int dtMiss = 0;
+ for (int dt = 0; dt<dtMax && t+dt<T; dt++) {
+ int mEnd = std::min(m + skip, M);
+ for (; m < mEnd; m++) {
+ if (missing[m]) dtMiss++;
+ advance(hapTree, rmHaps, hap, missing, states, m, pErr);
+ }
+ if (dtMiss > 64) break;
+ // impose pruning threshold (TODO: test)
+ float dtPrevBestProb = (dt==0 ? 0 : topPrefixes[t*dtMax + (dt-1)].probs[0]);
+ float minP = dtPrevBestProb * pErr * pErr * expf(recombLogPs[t+dt]); // rel to previous
+ float totProb = 0;
+ ProbMaskBundle &bundle = topPrefixes[t*dtMax + dt];
+ bundle.numTop = 0;
+ //cout << "t = " << t << ", dt = " << dt << ":" << endl;
+ for (int k = 0; k < HAP_BEAM_WIDTH && states[k].multCount() > minP; k++) {
+ float prob = states[k].multCount();
+ totProb += prob;
+ bundle.numTop++;
+ bundle.probs[k] = prob;
+ bundle.masks[k] = states[k].mask;
+ //cout << " prob = " << prob << ", mask = " << states[k].mask << endl;
+ }
+ if (bundle.numTop == 0) break;
+ bundle.logTotProb = logf(totProb);
+ // compute fwdLogProbs
+ NumericUtils::logSumExp(fwdLogProbs[t+dt+1],
+ fwdLogProbs[t] + bundle.logTotProb + recombLogPs[t+dt+1]);
+ //cout << endl;
+ }
+ //cout << endl;
+ }
+
+ // count missing sites
+ int numMiss = 0;
+ vector <int> tMiss;
+ for (int t = 0; t < T; t++) {
+ tMiss.push_back(numMiss);
+ int m = t*skip;
+ int mEnd = std::min(m + skip, M);
+ for (; m < mEnd; m++)
+ if (missing[m])
+ numMiss++;
+ }
+ // initialize log probs for missing sites
+ float allLogProb01s[numMiss][2];
+ for (int i = 0; i < numMiss; i++)
+ for (int b = 0; b < 2; b++)
+ allLogProb01s[i][b] = -1000000;
+
+ // compute bwdLogProbs; compute log probs at missing sites (using saved haplotype prefix beams)
+ for (int t = T-1; t >= 0; t--) {
+ int m = t*skip;
+ int dtMiss = 0;
+ for (int dt = 0; dt<dtMax && t+dt<T; dt++) {
+ // compute bwdLogProbs
+ const ProbMaskBundle &bundle = topPrefixes[t*dtMax + dt];
+ if (bundle.numTop == 0) break;
+ NumericUtils::logSumExp(bwdLogProbs[t],
+ recombLogPs[t] + bundle.logTotProb + bwdLogProbs[t+dt+1]);
+
+ int mEnd = std::min(m + skip, M);
+ for (; m < mEnd; m++)
+ if (missing[m])
+ dtMiss++;
+ float prob01s[dtMiss][2]; memset(prob01s, 0, dtMiss*2*sizeof(prob01s[0][0]));
+ for (int k = 0; k < bundle.numTop; k++)
+ for (int i = 0; i < dtMiss; i++)
+ prob01s[i][(bundle.masks[k]>>(dtMiss-1-i))&1] += bundle.probs[k];
+ for (int i = 0; i < dtMiss; i++)
+ for (int b = 0; b < 2; b++)
+ if (prob01s[i][b] != 0)
+ NumericUtils::logSumExp(allLogProb01s[tMiss[t]+i][b],
+ fwdLogProbs[t] + logf(prob01s[i][b]) + bwdLogProbs[t+dt+1]);
+ }
+ }
+
+ // impute missing sites
+ numMiss = 0;
+ for (int m = 0; m < M; m++)
+ if (missing[m]) {
+ hap[m] = (allLogProb01s[numMiss][1] > allLogProb01s[numMiss][0]);
+ numMiss++;
+ }
+ /*
+ for (int t = 0; t <= T; t++)
+ cout << "fwdProbs[" << t << "]: " << logProbToStr(fwdLogProbs[t]) << endl;
+ for (int t = 0; t <= T; t++)
+ cout << "bwdProbs[" << t << "]: " << logProbToStr(bwdLogProbs[t]) << endl;
+ for (int i = 0; i < numMiss; i++) {
+ cout << "missing site " << i << ":";
+ for (int b = 0; b < 2; b++)
+ cout << " " << logProbToStr(allLogProb01s[i][b]);
+ cout << endl;
+ }
+ */
+ free(topPrefixes);
+ }
+
+
+
+
+ void Eagle::imputeMissing(const HapHedge &hapHedge, uint64 n0) {
+ int M = hapHedge.getM(), skip = hapHedge.getSkip(), T = hapHedge.getNumTrees();
+ const HapBitsT &hapBitsT = hapHedge.getHapBitsT();
+
+ uchar *rmHaps = NULL;
+ if (2*n0 < (uint64) hapBitsT.getNhaps()) { // set rmHaps
+ rmHaps = ALIGNED_MALLOC_UCHARS(M * sizeof(rmHaps[0]));
+ for (int m = 0; m < M; m++)
+ rmHaps[m] = hapBitsT.getBit(2*n0, m) | (hapBitsT.getBit(2*n0+1, m)<<1);
+ }
+
+ vector <uchar> missing(M);
+ int m = 0;
+ vector <double> cMtreeStarts;
+ for (uint64 m64j = 0; m64j < Mseg64*64; m64j++)
+ if (maskSnps64j[m64j]) {
+ if (m % skip == 0)
+ cMtreeStarts.push_back(cMs64j[m64j]);
+ missing[m++] = (genoBits[m64j/64 * N + n0].is9>>(m64j&63))&1;
+ }
+ cMtreeStarts.push_back(cMs64j[Mseg64*64]);
+
+ vector <float> recombLogPs(T+1);
+ const double cMswitch = 2.0;
+ for (int t = 1; t < T; t++) {
+ double cMdelta = (cMtreeStarts[t+1] - cMtreeStarts[t-1]) / 2;
+ recombLogPs[t] = log(std::max(1 - exp(-cMdelta / cMswitch), 1e-6));
+ //cout << exp(recombLogPs[t]) << " " << std::flush;
+ }
+ //cout << endl;
+
+ for (uint64 nHap = 2*(n0-Nref); nHap <= 2*(n0-Nref)+1; nHap++) {
+ vector <uchar> hap(M);
+ m = 0;
+ for (uint64 m64j = 0; m64j < Mseg64*64; m64j++)
+ if (maskSnps64j[m64j])
+ hap[m++] = (tmpHaploBitsT[nHap*Mseg64 + m64j/64]>>(m64j&63))&1;
+
+ impMissing(hapHedge, rmHaps, &hap[0], &missing[0], &recombLogPs[0], pow(10.0, logPerr));
+
+ m = 0;
+ for (uint64 m64j = 0; m64j < Mseg64*64; m64j++)
+ if (maskSnps64j[m64j]) {
+ uint64 bit = 1ULL<<(m64j&63);
+ if (hap[m])
+ tmpHaploBitsT[nHap*Mseg64 + m64j/64] |= bit;
+ else
+ tmpHaploBitsT[nHap*Mseg64 + m64j/64] &= ~bit;
+ m++;
+ }
+ }
+
+ if (rmHaps != NULL)
+ ALIGNED_FREE(rmHaps);
+ }
+
+}
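
Note on the forward/backward recursion in src/EagleImpMiss.cpp: impMissing accumulates path probabilities in log space through NumericUtils::logSumExp and weights tree boundaries with a recombination log-probability. The sketch below illustrates both pieces under stated assumptions. logSumExpInPlace is a hypothetical stand-in whose semantics (acc = log(exp(acc) + exp(term))) are inferred from the call sites, not copied from NumericUtils; recombLogP is a hypothetical wrapper around the formula that appears verbatim in Eagle::imputeMissing, with cMswitch defaulting to the 2.0 used there.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical stand-in for an in-place log-sum-exp update (assumed semantics):
// acc becomes log(exp(acc) + exp(term)), computed without overflow by
// factoring out the larger of the two exponents.
inline void logSumExpInPlace(float &acc, float term) {
  const float hi = std::max(acc, term);
  const float lo = std::min(acc, term);
  acc = hi + std::log1p(std::exp(lo - hi));
}

// Hypothetical wrapper around the formula in Eagle::imputeMissing:
// log of the probability of a switch across cMdelta centimorgans,
// floored at 1e-6 so that the log stays finite.
inline double recombLogP(double cMdelta, double cMswitch = 2.0) {
  assert(cMswitch > 0);
  return std::log(std::max(1 - std::exp(-cMdelta / cMswitch), 1e-6));
}
```

Starting an accumulator at a large negative value such as -1000000, as the fwdLogProbs and bwdLogProbs vectors do, acts as log(0): the first real term added through the update simply replaces it.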
diff --git a/src/EagleMain.cpp b/src/EagleMain.cpp
new file mode 100644
index 0000000..124ec1d
--- /dev/null
+++ b/src/EagleMain.cpp
@@ -0,0 +1,620 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include <vector>
+#include <string>
+#include <iostream>
+#include <fstream>
+#include <map>
+#include <set>
+
+#include "omp.h"
+
+#include <boost/version.hpp>
+
+#include "Eagle.hpp"
+#include "EagleParams.hpp"
+#include "GenoData.hpp"
+#include "HapHedge.hpp"
+#include "SyncedVcfData.hpp"
+#include "Timer.hpp"
+#include "StringUtils.hpp"
+#include "Version.hpp"
+
+using namespace EAGLE;
+using namespace std;
+
+//#define LOCAL_TEST
+//#define OLD_IMP_MISSING
+
+void adjustHistFactor(double &histFactor, double hetRate, double snpRate) {
+ if (histFactor == 0) {
+ if (snpRate * hetRate == 0)
+ histFactor = 1;
+ else {
+ // compute how far (in cM) default 100 hets will typically span
+ double cMdefaultHist = 100 / snpRate / hetRate;
+ printf("Typical span of default 100-het history length: %.2f cM\n", cMdefaultHist);
+ const double cMminHist = 1.0;
+ histFactor = max(1.0, min(20.0, cMminHist / cMdefaultHist));
+ printf("Setting --histFactor=%.2f\n", histFactor);
+ if (histFactor != 1)
+ printf("Typical span of %d-het history length: %.2f cM\n", (int) (100*histFactor),
+ cMdefaultHist*histFactor);
+ cout << flush;
+ }
+ }
+}
+
+void phaseWithRef(EagleParams &params, Timer &timer, double t0, int argc, char **argv) {
+
+ string tmpFile = params.outPrefix + ".unphased." + params.vcfOutSuffix;
+ string outFile = params.outPrefix + "." + params.vcfOutSuffix;
+ vector < vector < pair <int, int> > > conPSall; double snpRate;
+ SyncedVcfData vcfData(params.vcfRef, params.vcfTarget, params.allowRefAltSwap, params.chrom,
+ params.chromX, params.bpStart-params.bpFlanking,
+ params.bpEnd+params.bpFlanking, params.geneticMapFile,
+ params.cMmax==0 ? 1 : params.cMmax, tmpFile, params.vcfWriteMode,
+ params.usePS, conPSall, snpRate);
+
+ Eagle eagle(vcfData.getNref(), vcfData.getNtarget(), vcfData.getMseg64(),
+ vcfData.getGenoBits(), vcfData.getSeg64cMvecs(), params.pErr);
+
+ double hetRate = eagle.computeHetRate();
+ cout << "Fraction of heterozygous genotypes: " << hetRate << endl;
+ if (params.usePBWT) adjustHistFactor(params.histFactor, hetRate, snpRate);
+
+ uint64 Nref = vcfData.getNref(), Ntarget = vcfData.getNtarget();
+ int iters = params.pbwtIters;
+ if (params.noImpMissing)
+ iters = 1;
+ if (iters == 0) {
+ if (Ntarget < Nref/2)
+ iters = 1;
+ else if (Ntarget < 2*Nref)
+ iters = 2;
+ else
+ iters = 3;
+ cout << endl << "Auto-selecting number of phasing iterations: setting --pbwtIters to "
+ << iters << endl << endl;
+ }
+
+ cout << endl << "BEGINNING PHASING" << endl;
+
+ vector <float> confs(Ntarget);
+
+ for (int iter = 1; iter <= iters; iter++) {
+ double t23 = timer.get_time(); timer.update_time();
+ double timeMN2 = 0, timeImpMissing = 0;
+ cout << endl << "PHASING ITER " << iter << " OF " << iters << endl << endl;
+ eagle.initRefIter(iter);
+
+ if (params.usePBWT) { // run PBWT algorithm
+ HapBitsT *hapBitsTptr = NULL;
+ HapHedge *hapHedgePtr = NULL;
+#ifdef OLD_IMP_MISSING
+ if (!params.noImpMissing) {
+ cout << "Making HapHedge" << endl;
+ hapBitsTptr = new HapBitsT(eagle.getHaploBitsT(), 2*eagle.getNlib(2+iter),
+ eagle.getMseg64(), eagle.getMaskSnps64j());
+ int skip = 25;
+ hapHedgePtr = new HapHedge(*hapBitsTptr, skip);
+ cout << "Built PBWT on " << hapBitsTptr->getNhaps() << " haplotypes" << endl;
+ cout << "Time for HapHedge: " << timer.update_time() << endl;
+ }
+#endif
+ cout << endl << "Phasing target samples" << endl;
+ int numPhased = 0; const int dots = 80;
+#pragma omp parallel for reduction(+:timeImpMissing) schedule(dynamic, 4)
+ for (uint i = Nref; i < Nref+Ntarget; i++) {
+ int nF1 = -1, nF2 = -1;
+ if (params.trioCheck) {
+ if ((i-Nref)%3 == 0) { // child
+ nF1 = i+1; nF2 = i+2;
+ }
+ else if ((i-Nref)%3 == 1) { // parent 1
+ nF1 = -(i-1); nF2 = i+1;
+ }
+ else { // parent 2
+ nF1 = -(i-2); nF2 = i-1;
+ }
+ }
+ confs[i-Nref] = eagle.runPBWT
+ (i, nF1, nF2, params.Kpbwt/(iter<iters?2:1), params.expectIBDcM, params.histFactor,
+ iter==iters, iter>1, !params.noImpMissing, params.usePS, conPSall[i-Nref]);
+#ifdef OLD_IMP_MISSING
+ if (!params.noImpMissing) {
+ Timer tim;
+ eagle.imputeMissing(*hapHedgePtr, i);
+ timeImpMissing += tim.update_time();
+ }
+#endif
+#pragma omp critical(NUM_PHASED)
+ {
+ numPhased++;
+ int newDots = numPhased * dots / Ntarget - (numPhased-1) * dots / Ntarget;
+ if (newDots) cout << string(newDots, '.') << flush;
+ }
+ }
+#ifdef OLD_IMP_MISSING
+ if (!params.noImpMissing) {
+ delete hapHedgePtr;
+ delete hapBitsTptr;
+ }
+#endif
+ }
+ else { // run LRP algorithm
+ cout << "Building hash tables" << endl;
+ eagle.buildHashTables(2+iter, 0, params.seed); // in ref mode, first iter is 3
+ cout << " (time: " << timer.update_time() << ")" << endl;
+
+ cout << endl << "Phasing target samples" << endl;
+#pragma omp parallel for reduction(+:timeMN2) schedule(dynamic, 4)
+ for (uint i = Nref; i < Nref+Ntarget; i++)
+ timeMN2 += eagle.runHMM(i, -1, -1, 3, params.beamWidth4, params.maxHapStates);
+ }
+
+ cout << endl << "Time for phasing iter " << iter << ": " << (timer.get_time()-t23) << endl;
+ if (!params.usePBWT)
+ cout << "Time for phasing iter " << iter << " MN^2: " << timeMN2 / params.numThreads
+ << endl;
+#ifdef OLD_IMP_MISSING
+ else if (!params.noImpMissing)
+ cout << "Time for phasing iter " << iter << " impMissing: "
+ << timeImpMissing / params.numThreads << endl;
+#endif
+ }
+
+ /***** FINAL OUTPUT *****/
+
+ timer.update_time();
+ cout << "Writing " << params.vcfOutSuffix << " output to " << outFile << endl;
+ eagle.writeVcf(tmpFile, outFile, params.chromX, params.bpStart, params.bpEnd,
+ params.vcfWriteMode, params.noImpMissing, argc, argv);
+ cout << "Time for writing output: " << timer.update_time() << endl;
+
+ cout << "Total elapsed time for analysis = " << (timer.get_time()-t0) << " sec" << endl;
+
+ cout << endl;
+ cout << "Mean phase confidence of each target individual:" << endl;
+ cout << "ID" << "\t" << "PHASE_CONFIDENCE" << endl;
+ for (uint i = Nref; i < Nref+Ntarget; i++) {
+ cout << vcfData.getTargetID(i-Nref) << "\t" << confs[i-Nref] << endl;
+ }
+}
+
+int main(int argc, char *argv[]) {
+
+ Timer timer; double t0 = timer.get_time();
+
+ cout << " +-----------------------------+" << endl;
+ cout << " |                             |" << endl;
+ cout << " |   Eagle v";
+ printf("%-19s|\n", EAGLE_VERSION);
+ cout << " |   ";
+ printf("%-26s|\n", EAGLE_VERSION_DATE);
+ cout << " |   Po-Ru Loh                 |" << endl;
+ cout << " |                             |" << endl;
+ cout << " +-----------------------------+" << endl;
+ cout << endl;
+
+ cout << "Copyright (C) 2015-2016 Harvard University." << endl;
+ cout << "Distributed under the GNU GPLv3 open source license." << endl << endl;
+
+ //cout << "Boost version: " << BOOST_LIB_VERSION << endl;
+ //cout << endl;
+
+ printf("Command line options:\n\n");
+ printf("%s ", argv[0]);
+ for (int i = 1; i < argc; i++) {
+ if (strlen(argv[i]) >= 2 && argv[i][0] == '-' && argv[i][1] == '-')
+ printf("\\\n ");
+ bool hasSpace = false;
+ for (uint j = 0; j < strlen(argv[i]); j++)
+ if (isspace(argv[i][j]))
+ hasSpace = true;
+ if (hasSpace) {
+ if (argv[i][0] == '-') {
+ bool foundEquals = false;
+ for (uint j = 0; j < strlen(argv[i]); j++) {
+ printf("%c", argv[i][j]);
+ if (argv[i][j] == '=' && !foundEquals) {
+ printf("\"");
+ foundEquals = true;
+ }
+ }
+ printf("\" ");
+ }
+ else
+ printf("\"%s\" ", argv[i]);
+ }
+ else
+ printf("%s ", argv[i]);
+ }
+ cout << endl << endl;
+
+ EagleParams params;
+ if (!params.processCommandLineArgs(argc, argv)) {
+ cerr << "Aborting due to error processing command line arguments" << endl;
+ exit(1);
+ }
+
+ cout << "Setting number of threads to " << params.numThreads << endl;
+ omp_set_num_threads(params.numThreads);
+
+ if (!params.vcfRef.empty()) { // use reference haplotypes
+ phaseWithRef(params, timer, t0, argc, argv);
+ return 0;
+ }
+
+ cout << endl << "=== Reading genotype data ===" << endl << endl;
+
+ GenoData genoData;
+ if (!params.vcfFile.empty())
+ genoData.initVcf(params.vcfFile, params.chrom, params.chromX, params.bpStart, params.bpEnd,
+ params.geneticMapFile, params.noMapCheck, params.cMmax);
+ else
+ genoData.initBed(params.famFile, params.bimFile, params.bedFile, params.chrom, params.bpStart,
+ params.bpEnd, params.geneticMapFile, params.excludeFiles, params.removeFiles,
+ params.maxMissingPerSnp, params.maxMissingPerIndiv, params.noMapCheck,
+ params.cMmax);
+
+ vector <double> invLD64j = genoData.computeInvLD64j(1000);
+
+ if (!params.usePBWT) { // Eagle v1 algorithm
+ params.pbwtOnly = false; // should already be false
+ params.runStep2 = 1;
+ }
+ else { // PBWT algorithm
+ // if SNP density is >1000 SNPs/Mb, don't run Steps 1+2 (even if --pbwtOnly is not set)
+ int bpSpan = genoData.getSnps().back().physpos - genoData.getSnps()[0].physpos;
+ double snpsPerMb = genoData.getSnps().size() / (bpSpan*1e-6);
+ if (snpsPerMb > 1000) params.pbwtOnly = true;
+ if (params.pbwtOnly) params.runStep2 = 0;
+
+ // if --runStep2 hasn't yet been set, SNP density must be low; run Step 2 unless too few chunks
+ if (params.runStep2 != 0 && params.runStep2 != 1)
+ params.runStep2 = (genoData.getMseg64() >= 3U); // can't run Step 2 with < 3 SNP segments
+ }
+
+ Eagle eagle(genoData.getN(), genoData.getMseg64(), genoData.getGenoBits(),
+ genoData.getSeg64cMvecs(), genoData.getSeg64logPs(), invLD64j, genoData.getIndivs(),
+ genoData.getSnps(), params.maskFile, genoData.getIsFlipped64j(), params.pErr,
+ params.runStep2);
+
+ double hetRate = eagle.computeHetRate();
+ cout << "Fraction of heterozygous genotypes: " << hetRate << endl;
+ if (params.usePBWT) adjustHistFactor(params.histFactor, hetRate, genoData.computeSnpRate());
+
+ map <string, pair <string, string> > trioIIDs;
+ vector <uint> children, nF1s, nF2s;
+ uint N = genoData.getN();
+ double timeMN2 = 0;
+
+#ifdef LOCAL_TEST
+ {
+ ifstream
+ //finTrios("/groups/price/poru/HSPH_SVN/data/GERA/phasing/eur.CEU_gt_0.9.trios_indep.fam");
+ finTrios("/groups/price/UKBiobank/download/ukb4777_trios.fam");
+ if (finTrios) {
+ string FID, IID, s1, s2, s3, s4;
+ while (finTrios >> FID >> IID >> s1 >> s2 >> s3 >> s4)
+ trioIIDs[IID] = make_pair(s1, s2);
+
+ for (uint i = 0; i < N; i++) {
+ int nF1 = -1, nF2 = -1;
+ if (trioIIDs.find(genoData.getIndiv(i).indivID) == trioIIDs.end()) continue;
+ pair <string, string> parents = trioIIDs[genoData.getIndiv(i).indivID];
+ for (uint iF = 0; iF < N; iF++)
+ if (genoData.getIndiv(iF).indivID == parents.first ||
+ genoData.getIndiv(iF).indivID == parents.second) {
+ if (nF1 == -1) nF1 = iF;
+ else nF2 = iF;
+ }
+ if (nF1 != -1 && nF2 != -1) {
+ children.push_back(i); nF1s.push_back(nF1); nF2s.push_back(nF2);
+ }
+ }
+ cout << "Identified " << children.size() << " trio children" << endl;
+ }
+ }
+#endif
+
+ if (params.iter == 0) {
+ if (params.outPrefix == "") {
+ cerr << "ERROR: --outPrefix must be specified" << endl;
+ exit(1);
+ }
+
+ /***** RUN STEP 1 *****/
+
+ if (!params.pbwtOnly) {
+ cout << endl << "BEGINNING STEP 1" << endl << endl;
+ double t1 = timer.get_time(); timer.update_time(); timeMN2 = 0;
+
+ for (uint att = 0; att < min(9U, (uint) children.size()); att++) // run on trio children
+ eagle.findLongDipMatches(children[att], nF1s[att], nF2s[att]);
+#pragma omp parallel for reduction(+:timeMN2) schedule(dynamic, 4)
+ for (uint i = 0; i < N; i++) {
+ //cout << StringUtils::itos(i)+"\n" << flush;
+ timeMN2 += eagle.findLongDipMatches(i, -1, -1).first;
+ //cout << StringUtils::itos(-i)+"\n" << flush;
+ }
+
+ if (!params.tmpPhaseConfsPrefix.empty())
+ eagle.writePhaseConfs(params.tmpPhaseConfsPrefix+".step1.bin");
+ cout << "Time for step 1: " << (timer.get_time()-t1) << endl;
+ cout << "Time for step 1 MN^2: " << timeMN2 / params.numThreads << endl;
+ eagle.outputSE(children, nF1s, nF2s, 1);
+ }
+ else { // use PBWT only => phase randomly
+ cout << endl << "SKIPPED STEP 1" << endl;
+#pragma omp parallel for reduction(+:timeMN2) schedule(dynamic, 4)
+ for (uint i = 0; i < N; i++)
+ eagle.randomlyPhaseTmpHaploBitsT(i);
+ }
+
+ if (params.runStep2) { // running step 2 => Step 1 phase confs were saved to phaseConfs
+ cout << endl << "Making hard calls" << flush; timer.update_time();
+ eagle.makeHardCalls(0, N, params.seed);
+ cout << " (time: " << timer.update_time() << ")" << endl << endl;
+ }
+ else { // not running Step 2 => Step 1 phase calls were saved to tmpHaploBitsT
+ eagle.cpTmpHaploBitsT(0, N);
+ }
+
+ /***** RUN STEPS 2-4 (STEP 4 = STEP 3a in paper) *****/
+
+ for (int step = 2; step <= (params.usePBWT ? (1+params.runStep2) : 4); step++) {
+ cout << endl << "BEGINNING STEP " << step << endl << endl;
+ double t23 = timer.get_time(); timer.update_time(); timeMN2 = 0;
+
+ const uint64 numBatches = step == 2 ? 1 : 10;
+ const uint64 runBatches = step <= 3 ? numBatches : (uint64) (numBatches * params.fracStep4);
+ for (uint64 b = 1; b <= runBatches; b++) {
+ cout << "BATCH " << b << " OF " << runBatches << endl;
+ cout << "Building hash tables" << endl;
+ eagle.buildHashTables(step, b, params.seed);
+ cout << " (time: " << timer.update_time() << ")" << endl;
+
+ if (b == 1)
+ for (uint att = 0; att < min(9U, (uint) children.size()); att++) // run on trio children
+ step == 2 ? eagle.findLongHapMatches(children[att], nF1s[att], nF2s[att], step)
+ : eagle.runHMM(children[att], nF1s[att], nF2s[att], step,
+ step==3 ? params.beamWidth3 : params.beamWidth4, params.maxHapStates);
+
+ uint iStart = (b-1)*N/numBatches, iEnd = b*N/numBatches;
+ cout << endl << "Phasing samples " << (iStart+1) << "-" << iEnd << endl;
+#pragma omp parallel for reduction(+:timeMN2) schedule(dynamic, 4)
+ for (uint i = iStart; i < iEnd; i++) {
+ //if (step == 3) cout << StringUtils::itos(i)+"\n" << flush;
+ timeMN2 += step == 2 ? eagle.findLongHapMatches(i, -1, -1, step)
+ : eagle.runHMM(i, -1, -1, step, step==3 ? params.beamWidth3 : params.beamWidth4,
+ params.maxHapStates);
+ //if (step == 3) cout << StringUtils::itos(-i)+"\n" << flush;
+ }
+
+ eagle.cpPhaseConfs(iStart, iEnd);
+ cout << "Time for phasing batch: " << timer.update_time() << endl;
+
+ cout << endl << "Making hard calls" << flush;
+ eagle.makeHardCalls(iStart, iEnd, params.seed + step);
+ cout << " (time: " << timer.update_time() << ")" << endl << endl;
+ }
+
+ if (!params.tmpPhaseConfsPrefix.empty())
+ eagle.writePhaseConfs(params.tmpPhaseConfsPrefix+".step"+StringUtils::itos(step)+".bin");
+ cout << "Time for step " << step << ": " << (timer.get_time()-t23) << endl;
+ cout << "Time for step " << step << " MN^2: " << timeMN2 / params.numThreads << endl;
+ eagle.outputSE(children, nF1s, nF2s, step);
+ }
+
+ if (params.usePBWT) { // run PBWT iters
+ if (!params.runStep2) // didn't run Step 2 => didn't allocate phaseConfs
+ cout << endl << "SKIPPED STEP 2" << endl;
+ else // ran Step 2 => allocated phaseConfs; didn't allocate tmpHaploBitsT
+ eagle.reallocLRPtoPBWT();
+
+ cout << endl << endl << "BEGINNING STEP 3 (PBWT ITERS)" << endl << endl;
+ int iters = params.pbwtIters;
+ if (iters == 0) {
+ iters = 2 + params.pbwtOnly;
+ cout << "Auto-selecting number of PBWT iterations: setting --pbwtIters to "
+ << iters << endl << endl;
+ }
+ for (int iter = 1; iter <= iters; iter++) {
+ cout << endl << "BEGINNING PBWT ITER " << iter << endl << endl;
+ double t23 = timer.get_time(); timer.update_time(); double timeImpMissing = 0;
+
+ int skip = 16; int Kpbwt = params.Kpbwt; bool runReverse = true;
+ if (iter < iters) { // run rougher computation
+ Kpbwt >>= (iters-iter);
+ skip *= 2;
+ runReverse = false;
+ }
+
+ const uint64 numBatches = 10;
+ for (uint64 b = 1; b <= numBatches; b++) {
+ cout << "BATCH " << b << " OF " << numBatches << endl;
+#ifdef OLD_IMP_MISSING
+ cout << endl << "Making HapHedge" << endl;
+ HapBitsT hapBitsT(eagle.getHaploBitsT(), 2*eagle.getNlib(2+iter),
+ eagle.getMseg64(), eagle.getMaskSnps64j());
+ HapHedge hapHedge(hapBitsT, skip);
+ cout << "Built PBWT on " << hapBitsT.getNhaps() << " haplotypes" << endl;
+ cout << "Time for HapHedge: " << timer.update_time() << endl;
+#endif
+ if (b == 1)
+ for (uint att = 0; att < min(9U, (uint) children.size()); att++) // run on trios
+ eagle.runPBWT(children[att], nF1s[att], nF2s[att], Kpbwt, params.expectIBDcM,
+ params.histFactor, runReverse, true, false);
+
+ uint iStart = (b-1)*N/numBatches, iEnd = b*N/numBatches;
+ cout << endl << "Phasing samples " << (iStart+1) << "-" << iEnd << endl;
+#pragma omp parallel for reduction(+:timeImpMissing) schedule(dynamic, 4)
+ for (uint i = iStart; i < iEnd; i++) {
+ eagle.runPBWT(i, -1, -1, Kpbwt, params.expectIBDcM, params.histFactor, runReverse,
+ true, true);
+#ifdef OLD_IMP_MISSING
+ Timer tim;
+ eagle.imputeMissing(hapHedge, i);
+ timeImpMissing += tim.update_time();
+#endif
+ }
+
+ eagle.cpTmpHaploBitsT(iStart, iEnd);
+ cout << "Time for phasing batch: " << timer.update_time() << endl << endl;
+ }
+
+ cout << "Time for PBWT iter " << iter << ": " << (timer.get_time()-t23) << endl;
+#ifdef OLD_IMP_MISSING
+ cout << "Time for PBWT iter " << iter << " impMissing: "
+ << timeImpMissing / params.numThreads << endl;
+#endif
+ //eagle.outputSE(children, nF1s, nF2s, step); // currently requires phaseConfs
+ }
+ }
+
+ /***** FINAL OUTPUT *****/
+
+ if (!params.vcfFile.empty()) {
+ string outFile = params.outPrefix + "." + params.vcfOutSuffix;
+ cout << "Writing " << params.vcfOutSuffix << " output to " << outFile << endl;
+ eagle.writeVcfNonRef(params.vcfFile, outFile, params.chrom, params.chromX, params.bpStart,
+ params.bpEnd, params.vcfWriteMode, argc, argv);
+ }
+ else {
+ cout << "Writing .haps.gz and .sample output" << endl; timer.update_time();
+ eagle.writeHapsGzSample(params.outPrefix);
+ }
+ cout << "Time for writing output: " << timer.update_time() << endl;
+ }
+ else if (params.iter == 1) { // PERFORM 1ST-ITER PHASING FOR A SMALL SUBSET ONLY (FOR TESTING)
+ double t1 = timer.get_time(); timer.update_time(); timeMN2 = 0;
+ map <int, int> longestFreqs; int tot = 0;
+ int att = 0, maxAtt = 10;
+ for (uint i = 0; i < N; i++) {
+ int nF1 = -1, nF2 = -1;
+
+ if (trioIIDs.find(genoData.getIndiv(i).indivID) == trioIIDs.end()) continue;
+ cout << "Testing n0 = " << i << ": " << genoData.getIndiv(i).famID << " "
+ << genoData.getIndiv(i).indivID << endl;
+ pair <string, string> parents = trioIIDs[genoData.getIndiv(i).indivID];
+ for (uint iF = 0; iF < N; iF++)
+ if (genoData.getIndiv(iF).indivID == parents.first ||
+ genoData.getIndiv(iF).indivID == parents.second) {
+ cout << "Parent n1: " << iF << endl;
+ if (nF1 == -1) nF1 = iF;
+ else nF2 = iF;
+ }
+ eagle.checkTrioErrorRate(i, nF1, nF2);
+
+ pair <double, vector <double> > ret = eagle.findLongDipMatches(i, nF1, nF2);
+ timeMN2 += ret.first;
+ for (uint j = 0; j < ret.second.size(); j++) {
+ longestFreqs[(int) ret.second[j]]++;
+ tot++;
+ }
+ att++;
+ if (att == maxAtt) break;
+ }
+ int cum = 0;
+ for (map <int, int>::iterator it = longestFreqs.begin(); it != longestFreqs.end(); it++) {
+ cum += it->second;
+ cout << it->first << " cM: " << it->second << " cum: " << (double) cum/tot << endl;
+ }
+ cout << "Time for step 1: " << (timer.get_time()-t1) << endl;
+ cout << "Time for step 1 MN^2: " << timeMN2 / params.numThreads << endl;
+ }
+ else { // iter > 1 (FOR TESTING)
+ cout << "Reading phase confidences" << endl;
+ eagle.readPhaseConfs(params.tmpPhaseConfsPrefix+".step"+StringUtils::itos(params.iter-1)+".bin");
+ timer.update_time();
+ cout << "Making hard calls" << endl;
+ eagle.makeHardCalls(0, N, params.seed);
+ cout << "Time for hard calls: " << timer.update_time() << endl;
+
+ /* global PBWT
+ cout << "Making forward HapHedge" << endl;
+ HapBitsT hapBitsFwdT(eagle.getHaploBitsT(), 2*eagle.getNlib(params.iter), eagle.getMseg64(),
+ eagle.getMaskSnps64j());
+ int skip = 1;
+ HapHedge hapHedgeFwd(hapBitsFwdT, skip);
+ cout << "Making backward HapHedge" << endl;
+ HapBitsT hapBitsBwdT(hapBitsFwdT, -1);
+ HapHedge hapHedgeBwd(hapBitsBwdT, skip);
+ cout << "Time for HapHedge: " << timer.update_time() << endl;
+ */
+#define USE_PBWT
+#ifndef USE_PBWT
+ cout << "Building hash tables" << endl;
+ eagle.buildHashTables(params.iter, 1, params.seed);
+ cout << endl << "Time for hash tables: " << timer.update_time() << endl;
+#else
+ eagle.reallocLRPtoPBWT();
+#endif
+ int att = 0, maxAtt = N;
+ for (uint i = 0; i < N; i++) {
+ int nF1 = -1, nF2 = -1;
+
+ if (trioIIDs.find(genoData.getIndiv(i).indivID) == trioIIDs.end()) continue;
+ cout << "Testing n0 = " << i << ": " << genoData.getIndiv(i).famID << " "
+ << genoData.getIndiv(i).indivID << endl;
+ pair <string, string> parents = trioIIDs[genoData.getIndiv(i).indivID];
+ for (uint iF = 0; iF < N; iF++)
+ if (genoData.getIndiv(iF).indivID == parents.first ||
+ genoData.getIndiv(iF).indivID == parents.second) {
+ cout << "Parent n1: " << iF << endl;
+ if (nF1 == -1) nF1 = iF;
+ else nF2 = iF;
+ }
+ eagle.checkTrioErrorRate(i, nF1, nF2);
+#ifdef USE_PBWT
+ eagle.runPBWT(i, nF1, nF2, params.Kpbwt, params.expectIBDcM, params.histFactor, true, false,
+ !params.noImpMissing);
+#else
+ timeMN2 += params.iter == 2 ? eagle.findLongHapMatches(i, nF1, nF2, params.iter)
+ : eagle.runHMM(i, nF1, nF2, params.iter,
+ params.iter==3 ? params.beamWidth3 : params.beamWidth4,
+ params.maxHapStates/*, &hapHedgeFwd, &hapHedgeBwd*/);
+#endif
+ att++; if (att == maxAtt) break;
+ }
+
+ cout << "Time for step minus init: " << timer.update_time() << endl;
+ cout << "Time for MN^2: " << timeMN2 / (params.iter == 0 ? params.numThreads : 1) << endl;
+#ifdef RDTSC_TIMING
+ printf("%.1f%% of time in dip-hap\n", 100*eagle.diphapTicks / (double) eagle.totTicks);
+ printf("%.1f%% of time in ext\n", 100*eagle.extTicks / (double) eagle.totTicks);
+ printf("%.1f%% of time in LSH\n", 100*eagle.lshTicks / (double) eagle.totTicks);
+ printf("%.1f%% of time in LSH hit checks\n", 100*eagle.lshCheckTicks / (double) eagle.totTicks);
+ printf("%.1f%% of time in DP\n", 100*eagle.dpTicks / (double) eagle.totTicks);
+ printf(" rel %.1f%% in sort\n", 100*eagle.dpSortTicks / (double) eagle.dpTicks);
+ printf(" rel %.1f%% in update\n", 100*eagle.dpUpdateTicks / (double) eagle.dpTicks);
+ printf(" rel %.1f%% in computeStatic\n", 100*eagle.dpStaticTicks / (double) eagle.dpTicks);
+ printf(" rel %.1f%% in computeSwitch\n", 100*eagle.dpSwitchTicks / (double) eagle.dpTicks);
+ printf("%.1f%% of time in blip fix\n", 100*eagle.blipFixTicks / (double) eagle.totTicks);
+ printf(" rel %.1f%% in LSH\n", 100*eagle.blipLshTicks / (double) eagle.blipFixTicks);
+ printf(" rel %.1f%% in popcount\n", 100*eagle.blipPopTicks / (double) eagle.blipFixTicks);
+ printf(" rel %.1f%% in vote update\n", 100*eagle.blipVoteTicks / (double) eagle.blipFixTicks);
+ cout << "Number of update calls: " << eagle.dpUpdateCalls << endl;
+#endif
+ }
+
+ cout << "Total elapsed time for analysis = " << (timer.get_time()-t0) << " sec" << endl;
+}
diff --git a/src/EaglePBWT.cpp b/src/EaglePBWT.cpp
new file mode 100644
index 0000000..94ffeb6
--- /dev/null
+++ b/src/EaglePBWT.cpp
@@ -0,0 +1,744 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include <vector>
+#include <iostream>
+#include <algorithm>
+#include <cmath>
+#include <cstdlib>
+#include <cstring>
+#include <cassert>
+
+#include "DipTreePBWT.hpp"
+#include "HapHedge.hpp"
+#include "NumericUtils.hpp"
+#include "Timer.hpp"
+#include "Types.hpp"
+#include "Eagle.hpp"
+
+namespace EAGLE {
+
+ using std::vector;
+ using std::cout;
+ using std::endl;
+
+ struct ProbInd {
+ float prob;
+ int ind1, ind2;
+ ProbInd(float _prob=0, int _ind1=0, int _ind2=0) : prob(_prob), ind1(_ind1), ind2(_ind2) {}
+ bool operator < (const ProbInd &pi) const { // order by uncertainty: most confident calls (prob nearest 0 or 1) sort first
+ return std::min(prob, 1-prob) < std::min(pi.prob, 1-pi.prob);
+ }
+ };
+
+ inline int popcount64_012(uint64 i) { // capped popcount: returns 0, 1, or 2 (2 means two or more bits set)
+ if (i == 0) return 0;
+ else if ((i & (i-1ULL)) == 0) return 1;
+ else return 2;
+ }
+
+ vector <uint> Eagle::findMinErrDipHap(uint64 n0, uint K, bool useTargetHaps) const {
+
+ uint64 Nhaps = 2*((Nref==0 || useTargetHaps) ? N : Nref);
+ if (K > Nhaps) K = Nhaps;
+ vector <uint> bestHaps; bestHaps.reserve(K);
+ if (K == Nhaps) {
+ for (uint64 nHap = 0; nHap < Nhaps; nHap++)
+ if (nHap/2 != n0)
+ bestHaps.push_back(nHap);
+ }
+ else {
+ vector < pair <uint, uint> > hapErrInds(Nhaps);
+ vector <uint64_masks> genoBitsT(Mseg64);
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) genoBitsT[m64] = genoBits[m64*N + n0];
+ for (uint64 nHap = 0; nHap < Nhaps; nHap++) {
+ uint numErrs = (nHap/2 == n0 ? 1000000 : 0);
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) {
+ uint64 is1 = haploBitsT[nHap*Mseg64 + m64];
+ uint64 wrongBits = (genoBitsT[m64].is0 & is1) | (genoBitsT[m64].is2 & ~is1);
+ numErrs += popcount64_012(wrongBits);
+ }
+ hapErrInds[nHap].first = numErrs;
+ hapErrInds[nHap].second = nHap;
+ }
+ std::sort(hapErrInds.begin(), hapErrInds.end());
+ for (uint k = 0; k < K; k++)
+ if (hapErrInds[k].second/2 != n0)
+ bestHaps.push_back(hapErrInds[k].second);
+ }
+ return bestHaps;
+ }
+
+ vector <bool> Eagle::computeRefIsMono(const vector <uint> &bestHaps) const {
+ vector <bool> refIsMono(Mseg64*64, true);
+ vector <uint64> anyIs0(Mseg64), anyIs1(Mseg64);
+ for (uint i = 0; i < bestHaps.size(); i++) {
+ uint64 nHap = bestHaps[i];
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) {
+ uint64 is1 = haploBitsT[nHap*Mseg64 + m64];
+ anyIs0[m64] |= ~is1;
+ anyIs1[m64] |= is1;
+ }
+ }
+ for (uint64 m64j = 0; m64j < Mseg64*64; m64j++) {
+ if (maskSnps64j[m64j]) {
+ uint64 m64 = m64j/64, j = m64j&63;
+ refIsMono[m64j] = !((anyIs0[m64]>>j)&1) || !((anyIs1[m64]>>j)&1);
+ }
+ }
+ return refIsMono;
+ }
+
+ float Eagle::runPBWT(uint64 n0, uint64 nF1, uint64 nF2, int Kpbwt, double cMexpect,
+ double histFactor, bool runReverse, bool useTargetHaps, bool impMissing) {
+ vector < pair <int, int> > noConPS;
+ return runPBWT(n0, nF1, nF2, Kpbwt, cMexpect, histFactor, runReverse, useTargetHaps,
+ impMissing, 0, noConPS);
+ }
+
+ float Eagle::runPBWT(uint64 n0, uint64 nF1, uint64 nF2, int Kpbwt, double cMexpect,
+ double histFactor, bool runReverse, bool useTargetHaps, bool impMissing,
+ int usePS, const vector < pair <int, int> > &conPS) {
+ Timer timer;
+
+ vector <uint> m64jInds(Mseg64*64+1);
+
+ const int SPEED_FACTOR = 1; const float lnPerr = logf(powf(10.0f, logPerr));
+ const int CALL_LENGTH_FACTOR = 1;
+
+ bool print = (int) nF1 != -1; // verbose output only when a parent index is supplied (trio testing)
+
+
+ /***** SELECT BEST REFERENCE HAPLOTYPES *****/
+
+ if (print) cout << "selecting " << Kpbwt << " ref haps... " << std::flush;
+ vector <uint> bestHaps = findMinErrDipHap(n0, Kpbwt, useTargetHaps);
+ // find sites at which only one allele is represented in bestHaps => can't be used as split
+ vector <bool> refIsMono = computeRefIsMono(bestHaps); // size = Mseg64*64
+ if (print) cout << " done " << timer.update_time() << endl;
+
+
+ /***** PROCESS TARGET GENOTYPES *****/
+
+ // create vectors of genos, genoBits, and hets
+ vector <uchar> genos64j(Mseg64*64);
+ vector <uint64> hets64j, refMonoHets64j;
+ for (uint64 m64j = 0; m64j < Mseg64*64; m64j++) {
+ genos64j[m64j] = getGeno0123(m64j, n0);
+ if (maskSnps64j[m64j] && genos64j[m64j]==1) {
+ if (!refIsMono[m64j])
+ hets64j.push_back(m64j);
+ else
+ refMonoHets64j.push_back(m64j);
+ }
+ }
+ vector <uint64_masks> tgtGenoBits(Mseg64);
+ for (uint64 m64 = 0; m64 < Mseg64; m64++)
+ tgtGenoBits[m64] = genoBits[m64*N + n0];
+
+ vector <uint64> pbwtBitsFine(Mseg64);
+ float conf = 0;
+
+ // find split sites (for PBWT HapTree starts): hets and occasionally inter-het sites
+ vector < pair <int, int> > tCallLocs; vector <int> tHomLocs;
+ vector <uint64> splits64j;
+ const double cMmaxSplit = 0.5;
+ if (!hets64j.empty()) {
+ splits64j.push_back(hets64j[0]);
+ for (uint64 h = 1; h < hets64j.size(); h++) {
+ int lastCallLoc = splits64j.size(); // old het ind + 1: tree indices are split indices + 1
+
+ for (uint64 m64j = hets64j[h-1]+1; m64j <= hets64j[h]; m64j++)
+ if (maskSnps64j[m64j] && !refIsMono[m64j] && genos64j[m64j] <= 2)
+ if (m64j == hets64j[h] || cMs64j[m64j] > cMs64j[splits64j.back()] + cMmaxSplit) {
+ splits64j.push_back(m64j);
+ if (m64j < hets64j[h])
+ tHomLocs.push_back(splits64j.size()); // hom ind + 1
+ }
+ int nextCallLoc = splits64j.size(); // new het ind + 1
+ tCallLocs.push_back(make_pair(lastCallLoc, nextCallLoc));
+ }
+ }
+ else { // all hom or missing or mono in ref; put in splits as necessary
+ for (uint64 m64j = 0; m64j < Mseg64*64; m64j++)
+ if (maskSnps64j[m64j] && !refIsMono[m64j] && genos64j[m64j] <= 2)
+ if (splits64j.empty() || cMs64j[m64j] > cMs64j[splits64j.back()] + cMmaxSplit) {
+ splits64j.push_back(m64j);
+ tHomLocs.push_back(splits64j.size()); // hom ind + 1
+ }
+ }
+ if (print) {
+ cout << "num hets (poly in best haps): " << hets64j.size();
+ cout << " num hets (mono in best haps): " << refMonoHets64j.size();
+ cout << " num splits: " << splits64j.size() << endl;
+ }
+
+ // allocate storage for reference haplotype samples (to use for impMissing and singleton hets)
+ const int samples = 10;
+ vector < vector <HapPair> > refSamples[2];
+ refSamples[0].resize(splits64j.size()+1);
+ refSamples[1].resize(splits64j.size()+1);
+ // store which inter-split chunks contain at least one missing site
+ vector <bool> tHasMissing(splits64j.size()+1); // "missing" = missing or singleton het
+ for (int t = 0; t <= (int) splits64j.size(); t++) {
+ uint64 m64jPrev = t==0 ? -1ULL : splits64j[t-1];
+ uint64 m64jNext = t==(int) splits64j.size() ? Mseg64*64 : splits64j[t];
+ for (uint64 m64j = m64jPrev+1; m64j < m64jNext; m64j++)
+ if (maskSnps64j[m64j] && ((genos64j[m64j] == 3 && impMissing) || // missing
+ (genos64j[m64j] == 1 && refIsMono[m64j]))) // singleton het
+ tHasMissing[t] = true;
+ }
+
+ // create vector of genos at split sites (padded on left and right to match hapBitsT)
+ vector <uchar> splitGenos;
+ splitGenos.push_back(0); // pad on left with 0 (to match hapBitsT)
+ for (uint64 s = 0; s < splits64j.size(); s++)
+ splitGenos.push_back(genos64j[splits64j[s]]);
+ splitGenos.push_back(0); // pad on right with 0 (to match hapBitsT)
+
+ // check for 0 or 1 het (warn)
+ if (hets64j.size() <= 1) {
+ cerr << "WARNING: Sample " << n0-Nref+1 << " (1-indexed) has a het count of "
+ << hets64j.size() << endl;
+ }
+
+ // record cM coordinates at split boundaries (passed to DipTree below)
+ vector <double> cMcoords(splits64j.size()+2);
+ for (uint64 s = 0; s <= splits64j.size(); s++) {
+ uint64 splitStart = (s == 0 ? 0 : splits64j[s-1]);
+ uint64 splitStop = (s == splits64j.size() ? Mseg64*64 : splits64j[s]);
+ cMcoords[s] = cMs64j[splitStart]; cMcoords[s+1] = cMs64j[splitStop];
+ int homs = 0; // tallied below but never read
+ for (uint64 m64j = splitStart+1; m64j < splitStop; m64j++)
+ if (genos64j[m64j] == 0 || genos64j[m64j] == 2)
+ homs++;
+ }
+
+ /***** BUILD PBWT DATA STRUCTURE *****/
+
+ // create HapBitsT encoding of ref hets and hom errs
+ if (print) cout << "making HapBitsT... " << std::flush;
+ HapBitsT hapBitsT(haploBitsT, Mseg64, splits64j, splitGenos, tgtGenoBits, bestHaps);
+ if (print) cout << " done " << timer.update_time() << endl;
+
+ // create HapHedge PBWT data structure
+ if (print) cout << "making HapHedge... " << std::flush;
+ HapHedgeErr *hapHedgePtr = new HapHedgeErr(hapBitsT);
+ if (print) cout << " done " << timer.update_time() << endl;
+
+ //hapHedgePtr->printTree(0);
+
+ /***** RUN COARSE (UNCONSTRAINED) DIPTREE SEARCH *****/
+
+ // initialize DipTree object
+ if (print) cout << "making DipTree (unconstr)... " << std::flush;
+ vector <char> constraints(splitGenos.size(), NO_CONSTRAINT);
+ vector <int> splitInds(Mseg64*64+1); // index map for FORMAT:PS constraints
+ if (usePS) {
+ // populate splitInds: for each 1-based SNP index m, store its 1-based split index t (splits64j[t-1]), or 0 if SNP m is not a split
+ for (uint64 m64j = 0, m = 0, t = 0; m64j < Mseg64*64; m64j++)
+ if (maskSnps64j[m64j]) {
+ m++;
+ m64jInds[m] = m64j;
+ if (t < splits64j.size() && splits64j[t]==m64j) {
+ t++;
+ splitInds[m] = t;
+ }
+ else
+ splitInds[m] = 0;
+ }
+
+ // set constraints for fast search
+ for (uint c = 0; c < conPS.size(); c++)
+ if (splitInds[conPS[c].first] && splitInds[abs(conPS[c].second)])
+ constraints[splitInds[conPS[c].first]] =
+ ((splitInds[conPS[c].first]-splitInds[abs(conPS[c].second)])<<1)|(conPS[c].second<0);
+ }
+
+ const int histLengthFast = 30*histFactor, pbwtBeamWidthFast = 30/SPEED_FACTOR;
+ DipTree dipTreeFast(*hapHedgePtr, splitGenos, &constraints[0], cMcoords, cMexpect,
+ histLengthFast, pbwtBeamWidthFast, lnPerr, 0);
+ if (print) cout << " done " << timer.update_time() << endl;
+
+ // explore search space; make phase calls
+ if (print) cout << "making phase calls (uncon)... " << std::flush;
+ const int callLengthFast = 10*CALL_LENGTH_FACTOR;
+ const float minFix = 0.5f, maxFix = 0.9f, fixThresh = 0.01f;
+ vector <ProbInd> probInds;
+ vector <uint64> pbwtBitsFast(Mseg64);
+ uint64 lastBit = 0;
+ for (uint64 i = 0; i < tCallLocs.size(); i++) {
+ float probAA = dipTreeFast.callProbAA(tCallLocs[i].first, tCallLocs[i].second,
+ callLengthFast);
+ ProbInd probInd(probAA, tCallLocs[i].first, tCallLocs[i].second);
+ if (probAA < 0.5f)
+ lastBit = !lastBit;
+ uint64 m64j = hets64j[i+1];
+ pbwtBitsFast[m64j/64] |= lastBit<<(m64j&63);
+ if (i > 0) { // try calling rel phase vs. 2 hets back (in case prev het is err)
+ float probAA2 = dipTreeFast.callProbAA(tCallLocs[i-1].first, tCallLocs[i].second,
+ callLengthFast);
+ ProbInd probInd2(probAA2, tCallLocs[i-1].first, tCallLocs[i].second);
+ if (probInd2 < probInd)
+ probInd = probInd2;
+ }
+ probInds.push_back(probInd);
+ }
+ int T = splitGenos.size(); // splits64j.size()+2; tree indices are split indices + 1
+ vector <char> revConstraints(T, NO_CONSTRAINT);
+ // set relative phase constraints for most confident hets
+ std::sort(probInds.begin(), probInds.end());
+ //float fracFixed = 0;
+ for (int f = 0; f < minFix*probInds.size() ||
+ (f < maxFix*probInds.size()
+ && (probInds[f].prob < fixThresh || probInds[f].prob > 1-fixThresh)) ||
+ (f < (int) probInds.size() && (probInds[f].prob==0 || probInds[f].prob==1)); f++) {
+ //fracFixed = (f+1.0f) / probInds.size();
+ constraints[probInds[f].ind2] = // het constraint: fix relative phase of ind2 wrt ind1
+ revConstraints[T-1-probInds[f].ind1] =
+ ((probInds[f].ind2 - probInds[f].ind1)<<1) | (probInds[f].prob < 0.5f);
+ if (constraints[probInds[f].ind1] == NO_CONSTRAINT)
+ constraints[probInds[f].ind1] = OPP_CONSTRAINT; // set to -2 = "start of het block"
+ revConstraints[T-1-probInds[f].ind2] = OPP_CONSTRAINT;
+ }
+ // make hom dosage calls (and divide by 2 to sort properly: uncertainty = dist from 0 or 1)
+ probInds.clear();
+ for (uint64 i = 0; i < tHomLocs.size(); i++)
+ probInds.push_back(ProbInd(dipTreeFast.callDosage(tHomLocs[i], callLengthFast) / 2,
+ tHomLocs[i]));
+ std::sort(probInds.begin(), probInds.end());
+ for (int f = 0; f < minFix*probInds.size() ||
+ (f < maxFix*probInds.size()
+ && (probInds[f].prob < fixThresh || probInds[f].prob > 1-fixThresh)) ||
+ (f < (int) probInds.size() && (probInds[f].prob==0 || probInds[f].prob==1)); f++)
+ constraints[probInds[f].ind1] = revConstraints[T-1-probInds[f].ind1] =
+ (probInds[f].prob >= 0.5f); // hom constraint: no err allowed
+ if (print) cout << " done " << timer.update_time() << endl;
+ //cout << "frac fixed: " << fracFixed << endl;
+
+ // set constraints for fine search
+ if (usePS == 2) {
+ for (uint c = 0; c < conPS.size(); c++)
+ if (splitInds[conPS[c].first] && splitInds[abs(conPS[c].second)])
+ constraints[splitInds[conPS[c].first]] =
+ revConstraints[T-1-splitInds[abs(conPS[c].second)]] =
+ ((splitInds[conPS[c].first]-splitInds[abs(conPS[c].second)])<<1)|(conPS[c].second<0);
+ }
+
+ /***** RUN FINE (CONSTRAINED) DIPTREE SEARCH *****/
+
+ // initialize DipTree object
+ if (print) cout << "making DipTree (constrained)..." << std::flush;
+ const int histLengthFine = 100*histFactor, pbwtBeamWidthFine = 50/SPEED_FACTOR;
+ DipTree dipTreeFine(*hapHedgePtr, splitGenos, &constraints[0], cMcoords, cMexpect,
+ histLengthFine, pbwtBeamWidthFine, lnPerr, 0);
+ if (print) cout << " done " << timer.update_time() << endl;
+
+ // explore search space; make phase calls
+ if (print) cout << "making phase calls (constr)... " << std::flush;
+ const int callLengthFine = 20*CALL_LENGTH_FACTOR;
+ const int callLengthSample = 20;
+
+ // sample refs (BEFORE callProbAA: sampleRefs needs recent history that gets overwritten)
+ for (int t = 0; t < T-1; t++)
+ if (tHasMissing[t])
+ refSamples[0][t] = dipTreeFine.sampleRefs(t, callLengthSample, samples, bestHaps, true);
+
+ vector <float> probAAsCur;
+ lastBit = 0;
+ for (uint64 i = 0; i < tCallLocs.size(); i++) {
+ float probAA = dipTreeFine.callProbAA(tCallLocs[i].first, tCallLocs[i].second,
+ callLengthFine);
+ probAAsCur.push_back(probAA);
+ conf += std::max(probAA, 1-probAA);
+ if (probAA < 0.5f)
+ lastBit = !lastBit;
+ uint64 m64j = hets64j[i+1];
+ pbwtBitsFine[m64j/64] |= lastBit<<(m64j&63);
+ }
+ if (!tCallLocs.empty()) conf /= tCallLocs.size(); // avoid 0/0 = NaN when there are fewer than two usable hets
+ if (print) cout << " done " << timer.update_time() << endl;
+
+ delete hapHedgePtr;
+
+
+ if (runReverse) {
+ // create HapBitsT encoding of ref hets and hom errs
+ if (print) cout << "making revHapBitsT... " << std::flush;
+ HapBitsT revHapBitsT(hapBitsT, -2);
+ if (print) cout << " done " << timer.update_time() << endl;
+
+ // create HapHedge PBWT data structure
+ if (print) cout << "making revHapHedge... " << std::flush;
+ HapHedgeErr revHapHedge(revHapBitsT);
+ if (print) cout << " done " << timer.update_time() << endl;
+
+ vector <uchar> revSplitGenos(splitGenos);
+ std::reverse(revSplitGenos.begin(), revSplitGenos.end());
+ vector <double> revcMcoords(T);
+ for (int t = 0; t < T; t++) revcMcoords[t] = cMcoords[T-1] - cMcoords[T-1-t];
+
+ // initialize DipTree object
+ if (print) cout << "making revDipTree (constr)... " << std::flush;
+ DipTree revDipTreeFine(revHapHedge, revSplitGenos, &revConstraints[0], revcMcoords, cMexpect,
+ histLengthFine, pbwtBeamWidthFine, lnPerr, 0);
+ if (print) cout << " done " << timer.update_time() << endl;
+
+ // explore search space; make phase calls
+ if (print) cout << "making rev phase calls (con)..." << std::flush;
+
+ // sample refs (BEFORE callProbAA: sampleRefs needs recent history that gets overwritten)
+ for (int t = T-2; t >= 0; t--)
+ if (tHasMissing[t])
+ refSamples[1][t] =
+ revDipTreeFine.sampleRefs(T-2-t, callLengthSample, samples, bestHaps, false);
+
+ vector <uint64> revPbwtBitsFine(Mseg64);
+ lastBit = 0;
+ for (int i = tCallLocs.size()-1; i >= 0; i--) {
+ float probAA = revDipTreeFine.callProbAA(T-1-tCallLocs[i].second, T-1-tCallLocs[i].first,
+ callLengthFine);
+ if (probAA + probAAsCur[i] < 1)
+ lastBit = !lastBit;
+ uint64 m64j = hets64j[i];
+ revPbwtBitsFine[m64j/64] |= lastBit<<(m64j&63);
+ }
+ if (print) cout << " done " << timer.update_time() << endl;
+ pbwtBitsFine = revPbwtBitsFine;
+ }
+
+ // trio analysis output
+ if (print) {
+ /*
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) {
+ checkHaploBits(n0, nF1, nF2, pbwtBitsFast[m64], m64, 25);
+ checkHaploBits(n0, nF1, nF2, pbwtBitsFine[m64], m64, 25);
+ cout << endl;
+ }
+ */
+ vector <bool> phaseVec;
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) {
+ vector <bool> phaseSeg = checkHaploBits(n0, nF1, nF2, pbwtBitsFast[m64], m64, -1);
+ phaseVec.insert(phaseVec.end(), phaseSeg.begin(), phaseSeg.end());
+ }
+ printf("FAST# major SE: %2d # tot SE: %2d / %d\n", countMajorSE(phaseVec),
+ countSE(phaseVec), (int) phaseVec.size()-1);
+
+ phaseVec.clear();
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) {
+ vector <bool> phaseSeg = checkHaploBits(n0, nF1, nF2, pbwtBitsFine[m64], m64, -1);
+ phaseVec.insert(phaseVec.end(), phaseSeg.begin(), phaseSeg.end());
+ }
+ printf("FINE# major SE: %2d # tot SE: %2d / %d\n", countMajorSE(phaseVec),
+ countSE(phaseVec), (int) phaseVec.size()-1);
+
+
+ // check accuracy of fast calls
+ vector <int> trioRelPhaseVec = trioRelPhase(n0, nF1, nF2);
+ const int NUM_CALL_LENGTHS = 2;
+ int callLengths[NUM_CALL_LENGTHS] = {10, 20/*, 50, 100*/};
+ for (int l = 0; l < NUM_CALL_LENGTHS; l++) {
+ cout << "callLength = " << callLengths[l] << endl;
+ vector <float> probAAs;
+ for (uint64 i = 0; i < tCallLocs.size(); i++)
+ probAAs.push_back(dipTreeFast.callProbAA(tCallLocs[i].first, tCallLocs[i].second,
+ callLengths[l]));
+ vector < vector <float> > probCorFastSlow;
+ for (uint i = 0; i < probAAs.size(); i++)
+ if (trioRelPhaseVec[i] >= 0) {
+ vector <float> tmp(4);
+ tmp[0] = std::min(probAAs[i], 1-probAAs[i]);
+ tmp[1] = (probAAs[i] > 0.5f) == !trioRelPhaseVec[i];
+ tmp[2] = (probAAsCur[i] > 0.5f) == !trioRelPhaseVec[i];
+ probCorFastSlow.push_back(tmp);
+ }
+ std::sort(probCorFastSlow.begin(), probCorFastSlow.end());
+ const int NUM_PCTS = 7;
+ int pcts[NUM_PCTS] = {50, 80, 90, 95, 98, 99, 100};
+ for (int p = 0; p < NUM_PCTS; p++) {
+ int pct = pcts[p];
+ uint iCut = probCorFastSlow.size()*pct/100;
+ int numErrs = 0, numErrsCur = 0;
+ for (uint i = 0; i < iCut; i++) {
+ if (probCorFastSlow[i][1] == 0)
+ numErrs++;
+ if (probCorFastSlow[i][2] == 0)
+ numErrsCur++;
+ }
+ if (iCut > 0) printf(" len=%d,cut=%d%%: p opp = %f, %d errs, %d cur / %u calls\n", // iCut==0 would index [iCut-1] out of range
+ callLengths[l], pct, probCorFastSlow[iCut-1][0], numErrs, numErrsCur, iCut);
+ }
+ }
+ }
+
+ // write phase calls
+ uint64 nTargetHap = 2*(n0-Nref);
+ uint64 nTargetOpp = 2*(n0-Nref) + 1;
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) {
+ tmpHaploBitsT[nTargetHap*Mseg64 + m64] = pbwtBitsFine[m64];
+ tmpHaploBitsT[nTargetOpp*Mseg64 + m64] = ~pbwtBitsFine[m64];
+ }
+
+ // set hom bits
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) {
+ for (uint64 j = 0; j < 64ULL; j++) {
+ uint64 m64j = m64*64+j;
+ if (maskSnps64j[m64j]) {
+ if (genos64j[m64j] == 0) {
+ tmpHaploBitsT[nTargetHap*Mseg64 + m64] &= ~(1ULL<<j);
+ tmpHaploBitsT[nTargetOpp*Mseg64 + m64] &= ~(1ULL<<j);
+ }
+ else if (genos64j[m64j] == 2) {
+ tmpHaploBitsT[nTargetHap*Mseg64 + m64] |= 1ULL<<j;
+ tmpHaploBitsT[nTargetOpp*Mseg64 + m64] |= 1ULL<<j;
+ }
+ }
+ }
+ }
+
+ // impute missing genos and phase hets monomorphic in bestHaps
+ for (int t = 0; t <= (int) splits64j.size(); t++) {
+ if (!tHasMissing[t]) continue;
+
+ if (!runReverse) {
+ // no reverse samples available; just use fwd samples and don't bother with ends
+ for (int s = 0; s < samples; s++)
+ for (int h = 0; h < 2; h++)
+ refSamples[0][t][s].haps[h].isEnd = false;
+ refSamples[1][t] = refSamples[0][t];
+ }
+
+ // identify missing sites
+ const uint64 m64jPrev = t==0 ? -1ULL : splits64j[t-1];
+ const uint64 m64jNext = t==(int) splits64j.size() ? Mseg64*64 : splits64j[t];
+ vector <uint64> miss64j;
+ for (uint64 m64j = m64jPrev+1; m64j < m64jNext; m64j++)
+ if (genos64j[m64j] == 3 && impMissing) // missing
+ miss64j.push_back(m64j);
+
+ // orient each hap pair wrt called phase
+ if (!hets64j.empty())
+ for (int fb = 0; fb < 2; fb++) {
+ for (int s = 0; s < samples; s++) {
+ double cMmid;
+ if (t==0) cMmid = cMs64j[splits64j[t]];
+ else if (t==(int) splits64j.size()) cMmid = cMs64j[splits64j[t-1]];
+ else cMmid = (cMs64j[splits64j[t-1]] + cMs64j[splits64j[t]]) / 2;
+
+ vector < pair <double, int> > cMdistSigns;
+ for (int i = 15; i >= 0; i--)
+ if ((((refSamples[fb][t][s].haps[0].tMaskRev ^
+ refSamples[fb][t][s].haps[1].tMaskRev)>>i)&1)
+ && splitGenos[t-i]==1) {
+ uint64 m64j = splits64j[t-i-1];
+ cMdistSigns.push_back(make_pair(fabs(cMs64j[m64j]-cMmid), ((refSamples[fb][t][s].haps[0].tMaskRev>>i)&1) == ((pbwtBitsFine[m64j/64]>>(m64j&63))&1)));
+ }
+ for (int i = 0; i < 16; i++)
+ if ((((refSamples[fb][t][s].haps[0].tMaskFwd ^
+ refSamples[fb][t][s].haps[1].tMaskFwd)>>i)&1)
+ && splitGenos[t+i+1]==1) {
+ uint64 m64j = splits64j[t+i];
+ cMdistSigns.push_back(make_pair(fabs(cMs64j[m64j]-cMmid), ((refSamples[fb][t][s].haps[0].tMaskFwd>>i)&1) == ((pbwtBitsFine[m64j/64]>>(m64j&63))&1)));
+ }
+ if (!cMdistSigns.empty()) {
+ std::sort(cMdistSigns.begin(), cMdistSigns.end());
+ if (!cMdistSigns[0].second)
+ std::swap(refSamples[fb][t][s].haps[0], refSamples[fb][t][s].haps[1]);
+ }
+ }
+ }
+
+ // for each haplotype in turn, call missing sites (and save mean hap length)
+ double hMeanLens[2];
+ for (int h = 0; h < 2; h++) {
+ vector <int> endRefs[2];
+ vector <int> nonEndRefs[2], nonEndLens[2];
+ // split sampled haplotypes into buckets: those that end in (t,t+1) and those that don't
+ for (int fb = 0; fb < 2; fb++)
+ for (int s = 0; s < samples; s++) {
+ if (refSamples[fb][t][s].haps[h].isEnd)
+ endRefs[fb].push_back(refSamples[fb][t][s].haps[h].refSeq);
+ else {
+ nonEndRefs[fb].push_back(refSamples[fb][t][s].haps[h].refSeq);
+ nonEndLens[fb].push_back(refSamples[fb][t][s].haps[h].tLength);
+ }
+ }
+
+ // initialize allele dosages
+ int nMiss = miss64j.size();
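+ vector < vector <double> > alleleDoses(nMiss, vector <double> (2, 0.0)); // replaces a variable-length array, which is not standard C++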
+ for (int m = 0; m < nMiss; m++) alleleDoses[m][0] = alleleDoses[m][1] = 0;
+
+ // process non-ends: call using longer of fwd, rev ref samples
+ double meanLens[2] = {0, 0};
+ for (int fb = 0; fb < 2; fb++)
+ if (!nonEndLens[fb].empty())
+ meanLens[fb] = std::accumulate(nonEndLens[fb].begin(), nonEndLens[fb].end(), 0) /
+ (double) nonEndLens[fb].size();
+ int fbLong = meanLens[0] > meanLens[1] ? 0 : 1;
+ for (uint k = 0; k < nonEndRefs[fbLong].size(); k++) {
+ int refSeq = nonEndRefs[fbLong][k];
+ for (int m = 0; m < nMiss; m++)
+ alleleDoses[m][(haploBitsT[refSeq*Mseg64+miss64j[m]/64]>>(miss64j[m]&63))&1] += 1;
+ }
+
+ // compute mean lengths including ends=0 for phasing singletons later
+ for (int fb = 0; fb < 2; fb++)
+ meanLens[fb] = std::accumulate(nonEndLens[fb].begin(), nonEndLens[fb].end(), 0) /
+ (double) samples;
+ hMeanLens[h] = std::max(meanLens[0], meanLens[1]); // set to max of fwd, rev
+
+ // process ends: find most likely recombination points at which fwd and rev haps meet
+ if (!endRefs[0].empty() && !endRefs[1].empty()) {
+ for (uint k = 0; k < endRefs[0].size() && k < endRefs[1].size(); k++) {
+ int refSeqFwd = endRefs[0][k], refSeqRev = endRefs[1][k];
+ int errFwd = 0, errRev = 0;
+ for (uint64 m64j = m64jPrev+1; m64j < m64jNext; m64j++) {
+ if ((genos64j[m64j] == 0 || genos64j[m64j] == 2) &&
+ (((haploBitsT[refSeqRev*Mseg64+m64j/64]>>(m64j&63))&1) != genos64j[m64j]/2))
+ errRev++;
+ }
+
+ // find recombination points that minimize errors (usually 0 errors)
+ int minErr = 1<<30;
+ vector <double> cMdiffs; vector <uint64> revStarts;
+
+ for (uint64 m64j = m64jPrev; m64j < m64jNext; m64j++) {
+ if (m64j != m64jPrev) { // update err counts
+ if ((genos64j[m64j] == 0 || genos64j[m64j] == 2) &&
+ (((haploBitsT[refSeqFwd*Mseg64+m64j/64]>>(m64j&63))&1) != genos64j[m64j]/2))
+ errFwd++;
+ if ((genos64j[m64j] == 0 || genos64j[m64j] == 2) &&
+ (((haploBitsT[refSeqRev*Mseg64+m64j/64]>>(m64j&63))&1) != genos64j[m64j]/2))
+ errRev--;
+ }
+
+ // rev starts at m64j+1
+ double cMdiff = cMs64j[m64j+1] - (m64j==-1ULL ? 0 : cMs64j[m64j]) + 1e-9;
+
+ if (errFwd+errRev < minErr) {
+ minErr = errFwd+errRev;
+ cMdiffs.clear();
+ revStarts.clear();
+ }
+ if (errFwd+errRev == minErr) {
+ cMdiffs.push_back(cMdiff);
+ revStarts.push_back(m64j+1);
+ }
+ }
+
+ // augment dosages proportionally to cMdiffs (btwn consecutive SNPs) at recomb points
+ double cMtot = std::accumulate(cMdiffs.begin(), cMdiffs.end(), 0.0);
+ for (int m = 0; m < nMiss; m++) {
+ double cMcum = 0;
+ for (uint x = 0; x < revStarts.size(); x++) {
+ if (miss64j[m] < revStarts[x])
+ cMcum += cMdiffs[x];
+ else
+ break;
+ }
+ alleleDoses[m][(haploBitsT[refSeqFwd*Mseg64+miss64j[m]/64]>>(miss64j[m]&63))&1] +=
+ cMcum / cMtot;
+ alleleDoses[m][(haploBitsT[refSeqRev*Mseg64+miss64j[m]/64]>>(miss64j[m]&63))&1] +=
+ 1 - cMcum / cMtot;
+ }
+ }
+ }
+
+ // make final calls
+ for (int m = 0; m < nMiss; m++) {
+ uint64 m64 = miss64j[m]/64, j = miss64j[m]&63;
+ if (alleleDoses[m][0] >= alleleDoses[m][1])
+ tmpHaploBitsT[(nTargetHap+h)*Mseg64 + m64] &= ~(1ULL<<j);
+ else
+ tmpHaploBitsT[(nTargetHap+h)*Mseg64 + m64] |= 1ULL<<j;
+ }
+ }
+
+ // call phase at "singleton" hets monomorphic among bestHaps
+ for (uint64 m64j = m64jPrev+1; m64j < m64jNext; m64j++)
+ if (genos64j[m64j] == 1 && refIsMono[m64j]) { // "singleton" het
+ uint64 m64 = m64j/64, j = m64j&63;
+ uint64 commonBit = haploBitsT[bestHaps[0]*Mseg64 + m64] & (1ULL<<j);
+ uint64 rareBit = commonBit ^ (1ULL<<j);
+
+ int hShorter = hMeanLens[0] < hMeanLens[1] ? 0 : 1; // put rare allele on shorter hap
+
+ for (int h = 0; h < 2; h++)
+ tmpHaploBitsT[(nTargetHap+h)*Mseg64 + m64] &= ~(1ULL<<j); // clear bit
+ tmpHaploBitsT[(nTargetHap+hShorter)*Mseg64 + m64] |= rareBit;
+ tmpHaploBitsT[(nTargetHap+!hShorter)*Mseg64 + m64] |= commonBit;
+ /*
+ cout << "common bit: " << commonBit << endl;
+ cout << "rare bit: " << rareBit << endl;
+ cout << "hMeanLens[0]: " << hMeanLens[0] << endl;
+ cout << "hMeanLens[1]: " << hMeanLens[1] << endl;
+ cout << "hShorter: " << hShorter << endl;
+ */
+ }
+ }
+ /*
+ for (int t = 0; t < 500; t += 100) {
+ cout << "==== t: " << t << " ====" << endl;
+ for (int fb = 0; fb < 2; fb++) {
+ cout << "--- fb: " << fb << " ---" << endl;
+ for (int s = 0; s < 3; s++) {
+ cout << ".. sample: " << s << " .. " << refSamples[fb][t][s].haps[0].tLength << "," << refSamples[fb][t][s].haps[1].tLength << endl;
+ for (int h = 0; h < 2; h++) {
+ for (int i = 15; i >= 0; i--) {
+ if ((((refSamples[fb][t][s].haps[h].tMaskRev ^ refSamples[fb][t][s].haps[1-h].tMaskRev)>>i)&1) && splitGenos[t-i]==1) {
+ uint64 m64j = splits64j[t-i-1];
+ cout << (((refSamples[fb][t][s].haps[h].tMaskRev>>i)&1) == ((pbwtBitsFine[m64j/64]>>(m64j&63))&1) ? "+" : "-");
+ }
+ else
+ cout << ".";
+ }
+ cout << "|";
+ for (int i = 0; i < 16; i++) {
+ if ((((refSamples[fb][t][s].haps[h].tMaskFwd ^ refSamples[fb][t][s].haps[1-h].tMaskFwd)>>i)&1) && splitGenos[t+i+1]==1) {
+ uint64 m64j = splits64j[t+i];
+ cout << (((refSamples[fb][t][s].haps[h].tMaskFwd>>i)&1) == ((pbwtBitsFine[m64j/64]>>(m64j&63))&1) ? "+" : "-");
+
+ }
+ else
+ cout << ".";
+ }
+ cout << " ";
+ }
+ cout << endl;
+ }
+ }
+ }
+ */
+
+ if (print && usePS) {
+ int correct = 0;
+ for (uint c = 0; c < conPS.size(); c++) {
+ int m1 = conPS[c].first, m2 = abs(conPS[c].second), isOpp = conPS[c].second<0;
+ uint m64j1 = m64jInds[m1], m64j2 = m64jInds[m2];
+ assert(genos64j[m64j1]==1 && genos64j[m64j2]==1);
+ correct += (uint) isOpp == (((pbwtBitsFine[m64j1/64]>>(m64j1&63))&1) ^
+ ((pbwtBitsFine[m64j2/64]>>(m64j2&63))&1));
+ }
+ cout << "constraints respected: " << correct << " / " << conPS.size() << endl;
+ }
+
+ return conf;
+ }
+
+}
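
Note on the bit layout used in the Eagle.cpp hunk above: haplotypes are packed into 64-bit words, and the code reads site m64j of haplotype h as (haploBitsT[h*Mseg64 + m64j/64] >> (m64j&63)) & 1. The "make final calls" loop sets or clears a bit with the same indexing. The sketch below isolates that indexing. The helper names getHapBit and setHapBit are hypothetical and do not appear in Eagle; it is a minimal illustration assuming a flat std::vector of words, not the package's actual storage code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helpers (not part of Eagle). Haplotype h occupies Mseg64
// consecutive 64-bit words; site index m64j selects word m64j/64, bit m64j&63.
inline int getHapBit(const std::vector<uint64_t> &bits, uint64_t Mseg64,
                     uint64_t h, uint64_t m64j) {
  assert(m64j / 64 < Mseg64);  // site must fall inside this haplotype's words
  return (bits[h * Mseg64 + m64j / 64] >> (m64j & 63)) & 1;
}

inline void setHapBit(std::vector<uint64_t> &bits, uint64_t Mseg64,
                      uint64_t h, uint64_t m64j, int allele) {
  assert(m64j / 64 < Mseg64);
  uint64_t &word = bits[h * Mseg64 + m64j / 64];
  const uint64_t mask = 1ULL << (m64j & 63);
  if (allele)
    word |= mask;   // corresponds to "|= 1ULL<<j" in the final-call loop
  else
    word &= ~mask;  // corresponds to "&= ~(1ULL<<j)"
}
```

With Mseg64 = 2, site 70 of haplotype 1 lands in word 3 (1*2 + 70/64) at bit 6 (70 & 63).
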
diff --git a/src/EagleParams.cpp b/src/EagleParams.cpp
new file mode 100644
index 0000000..548b9cb
--- /dev/null
+++ b/src/EagleParams.cpp
@@ -0,0 +1,379 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include <vector>
+#include <string>
+#include <iostream>
+#include <cstdlib>
+
+#include "StringUtils.hpp"
+#include "FileUtils.hpp"
+#include "EagleParams.hpp"
+
+#include <boost/program_options.hpp>
+
+namespace EAGLE {
+
+ using std::vector;
+ using std::string;
+ using std::cout;
+ using std::cerr;
+ using std::endl;
+
+ // populates members; error-checks
+ bool EagleParams::processCommandLineArgs(int argc, char *argv[]) {
+
+ vector <string> removeFileTemplates, excludeFileTemplates;
+ string chromStr; // allow "X"
+
+ namespace po = boost::program_options;
+
+ po::options_description commonOptions;
+ commonOptions.add_options()
+ ("geneticMapFile", po::value<string>(&geneticMapFile)->required(),
+ "HapMap genetic map provided with download: tables/genetic_map_hg##.txt.gz")
+ // "chr pos rate(cM/Mb) map(cM)"
+ ("outPrefix", po::value<string>(&outPrefix)->required(), "prefix for output files")
+ ("numThreads", po::value<int>(&numThreads)->default_value(1),
+ "number of computational threads")
+ ;
+
+ po::options_description nonRefMode
+ ("Input options for phasing without a reference");
+ nonRefMode.add_options()
+ // genotype data parameters
+ ("bfile", po::value<string>(), "prefix of PLINK .fam, .bim, .bed files")
+ ("bfilegz", po::value<string>(), "prefix of PLINK .fam.gz, .bim.gz, .bed.gz files")
+ ("fam", po::value<string>(&famFile),
+ "PLINK .fam file (note: file names ending in .gz are auto-decompressed)")
+ ("bim", po::value<string>(&bimFile), "PLINK .bim file")
+ ("bed", po::value<string>(&bedFile), "PLINK .bed file")
+ ("vcf", po::value<string>(&vcfFile), "[compressed] VCF/BCF file containing input genotypes")
+ ("remove", po::value< vector <string> >(&removeFileTemplates),
+ "file(s) listing individuals to ignore (no header; FID IID must be first two columns)")
+ ("exclude", po::value< vector <string> >(&excludeFileTemplates),
+ "file(s) listing SNPs to ignore (no header; SNP ID must be first column)")
+ ("maxMissingPerSnp", po::value<double>(&maxMissingPerSnp)->default_value(0.1, "0.1"),
+ "QC filter: max missing rate per SNP")
+ ("maxMissingPerIndiv", po::value<double>(&maxMissingPerIndiv)->default_value(0.1, "0.1"),
+ "QC filter: max missing rate per person")
+ ;
+
+ po::options_description refMode
+ ("Input/output options for phasing using a reference panel");
+ refMode.add_options()
+ ("vcfRef", po::value<string>(&vcfRef),
+ "tabix-indexed [compressed] VCF/BCF file for reference haplotypes")
+ ("vcfTarget", po::value<string>(&vcfTarget),
+ "tabix-indexed [compressed] VCF/BCF file for target genotypes")
+ ("vcfOutFormat", po::value<string>(&vcfOutFormat)->default_value("z"),
+ "b|u|z|v: compressed BCF (b), uncomp BCF (u), compressed VCF (z), uncomp VCF (v)")
+ ("noImpMissing", "disable imputation of missing ./. target genotypes")
+ ("allowRefAltSwap", "allow swapping of REF/ALT in target vs. ref VCF")
+ ;
+
+ po::options_description bothModes("Region selection options");
+ bothModes.add_options()
+ ("chrom", po::value<string>(&chromStr)->default_value("0"),
+ "chromosome to analyze (if input has many)")
+ ("bpStart", po::value<double>(&bpStart)->default_value(0),
+ "minimum base pair position to analyze")
+ ("bpEnd", po::value<double>(&bpEnd)->default_value(1e9, "1e9"),
+ "maximum base pair position to analyze")
+ ("bpFlanking", po::value<double>(&bpFlanking)->default_value(0),
+ "(ref-mode only) flanking region to use during phasing but discard in output")
+ ;
+
+ po::options_description algOptions("Algorithm options");
+ algOptions.add_options()
+ ("Kpbwt", po::value<int>(&Kpbwt)->default_value(10000),
+ "number of conditioning haplotypes") // TODO: throw error if set in --v1 mode
+ ("pbwtIters", po::value<int>(&pbwtIters)->default_value(0),
+ "number of PBWT phasing iterations (0=auto)")
+ ("expectIBDcM", po::value<double>(&expectIBDcM)->default_value(2, "2.0"),
+ "expected length of haplotype copying (cM)")
+ ("histFactor", po::value<double>(&histFactor)->default_value(0, "0"),
+ "history length multiplier (0=auto)")
+ ("genoErrProb", po::value<double>(&pErr)->default_value(0.003, "0.003"),
+ "estimated genotype error probability")
+ ("pbwtOnly", "in non-ref mode, use only PBWT iters (automatic for sequence data)")
+ ("v1", "use Eagle1 phasing algorithm (instead of default Eagle2 algorithm)")
+ ;
+
+ po::options_description hidden("Hidden options");
+ hidden.add_options()
+ ("help,h", "print help message with typical options")
+
+ // experimental options
+ ("usePS", po::value<int>(&usePS)->default_value(0),
+ "use FORMAT:PS phase constraints in target VCF: 1=soft, 2=harder")
+ ("runStep2", po::value<int>(&runStep2)->default_value(-1),
+ "enable/disable Step 2 of non-ref algorithm (-1=auto)")
+ ("chromX", po::value<int>(&chromX)->default_value(23), "maximum chromosome number (chrX)")
+
+ // Eagle1 advanced options
+ ("v1fast", "Eagle1 fast mode: --maxBlockLen=0.3, --maxStatePairsStep4=100, --fracStep4=0.5")
+ ("maxBlockLen", po::value<double>(&cMmax)->default_value(0),
+ "max length (in cM units) of a SNP block; increase to trade accuracy for speed (0=auto)")
+ ("maxStatePairsStep3", po::value<int>(&beamWidth3)->default_value(100),
+ "maximum state pairs per position in dynamic programming (HMM-like) search (step 3)")
+ ("maxStatePairsStep4", po::value<int>(&beamWidth4)->default_value(200),
+ "maximum state pairs per position in dynamic programming (HMM-like) search (step 4)")
+ ("fracStep4", po::value<double>(&fracStep4)->default_value(1),
+ "fraction of samples to re-phase in 4th step")
+ ("seed", po::value<uint>(&seed)->default_value(0), "random seed (ignored in ref-mode)")
+
+ // error-checking
+ ("noMapCheck", "disable automatic check of genetic map scale")
+
+ // testing options
+ ("iter", po::value<int>(&iter)->default_value(0), "iter to run")
+ ("maskFile", po::value<string>(&maskFile), "indivs to mask (e.g., relatives)")
+ ("tmpPhaseConfsPrefix", po::value<string>(&tmpPhaseConfsPrefix),
+ "prefix for tmp files of phase confidences")
+ ("maxHapStates", po::value<int>(&maxHapStates)->default_value(80),
+ "maximum copying haplotype states per position in dynamic programming search")
+ ("trioCheck", "flag to output trio check; assumes target samples are in child,mat,pat order")
+ ;
+
+ po::options_description visible("Options");
+ visible.add(commonOptions).add(nonRefMode).add(refMode).add(bothModes).add(algOptions);
+
+ po::options_description all("All options");
+ all.add(commonOptions).add(nonRefMode).add(refMode).add(bothModes).add(algOptions).add(hidden);
+ all.add_options()
+ ("bad-args", po::value< vector <string> >(), "bad args")
+ ;
+ po::positional_options_description positional_desc;
+ positional_desc.add("bad-args", -1); // for error-checking command line
+
+ po::variables_map vm;
+ po::command_line_parser cmd_line(argc, argv);
+ cmd_line.options(all);
+ cmd_line.style(po::command_line_style::default_style ^ po::command_line_style::allow_guessing);
+ cmd_line.positional(positional_desc);
+ try {
+ po::store(cmd_line.run(), vm);
+
+ if (vm.count("help")) {
+ cout << endl;
+ cout << visible << endl;
+ exit(0);
+ }
+
+ po::notify(vm); // throws an error if there are any problems
+
+ usePBWT = !vm.count("v1");
+ pbwtOnly = vm.count("pbwtOnly");
+ if (pbwtOnly && !usePBWT) {
+ cerr << "ERROR: --pbwtOnly cannot be specified if using the --v1 algorithm" << endl;
+ return false;
+ }
+ trioCheck = vm.count("trioCheck");
+
+ if (vm.count("bfile") +
+ vm.count("bfilegz") +
+ (vm.count("fam") || vm.count("bim") || vm.count("bed")) +
+ vm.count("vcf") +
+ (vm.count("vcfRef") || vm.count("vcfTarget")) != 1) {
+ cerr << "ERROR: Use exactly one of the --bfile, --bfilegz, --fam,bim,bed, --vcf, or"
+ << endl << " --vcfRef,vcfTarget input formats" << endl;
+ return false;
+ }
+
+ if (vm.count("bfile")) {
+ string bfile = vm["bfile"].as<string>();
+ famFile = bfile + ".fam";
+ bimFile = bfile + ".bim";
+ bedFile = bfile + ".bed";
+ }
+
+ if (vm.count("bfilegz")) {
+ string bfile = vm["bfilegz"].as<string>();
+ famFile = bfile + ".fam.gz";
+ bimFile = bfile + ".bim.gz";
+ bedFile = bfile + ".bed.gz";
+ }
+
+ if (vm.count("bad-args")) {
+ cerr << "ERROR: Unknown options:";
+ vector <string> bad_args = vm["bad-args"].as< vector <string> >();
+ for (uint i = 0; i < bad_args.size(); i++) cerr << " " << bad_args[i];
+ cerr << endl;
+ return false;
+ }
+
+ noMapCheck = vm.count("noMapCheck");
+ noImpMissing = vm.count("noImpMissing");
+ allowRefAltSwap = vm.count("allowRefAltSwap");
+
+ if (vm.count("vcfRef") || vm.count("vcfTarget") || vm.count("vcf")) { // VCF mode
+ if (vm.count("vcf")) { // non-ref mode
+ if (noImpMissing) {
+ cerr << "ERROR: --noImpMissing is only supported in ref-mode" << endl;
+ return false;
+ }
+ if (bpFlanking != 0) {
+ cerr << "ERROR: --bpFlanking is only supported in ref-mode" << endl;
+ return false;
+ }
+ }
+ else { // ref-mode
+ if (vcfRef.empty()) {
+ cerr << "ERROR: --vcfRef must be specified in reference-based phasing mode" << endl;
+ return false;
+ }
+ if (vcfTarget.empty()) {
+ cerr << "ERROR: --vcfTarget must be specified in reference-based phasing mode" << endl;
+ return false;
+ }
+ if (pbwtIters > 1 && noImpMissing) {
+ cerr << "ERROR: --pbwtIters cannot be greater than 1 if --noImpMissing is set" << endl;
+ return false;
+ }
+ }
+
+ // vcf input checks for both ref and non-ref mode
+ if (geneticMapFile == "USE_BIM") {
+ cerr << "ERROR: --geneticMapFile must be specified when using VCF/BCF input"
+ << endl;
+ return false;
+ }
+ if (!removeFileTemplates.empty() || !excludeFileTemplates.empty() ||
+ maxMissingPerSnp != 0.1 || maxMissingPerIndiv != 0.1) {
+ cerr << "ERROR: --remove, --exclude, --maxMissingPerSnp, --maxMissingPerIndiv"
+ << " are not supported for VCF/BCF input or in reference mode" << endl;
+ return false;
+ }
+ if (vcfOutFormat == "b") { vcfOutSuffix = "bcf"; vcfWriteMode = "wb"; }
+ else if (vcfOutFormat == "u") { vcfOutSuffix = "bcf"; vcfWriteMode = "wbu"; }
+ else if (vcfOutFormat == "z") { vcfOutSuffix = "vcf.gz"; vcfWriteMode = "wz"; }
+ else if (vcfOutFormat == "v") { vcfOutSuffix = "vcf"; vcfWriteMode = "w"; }
+ else {
+ cerr << "ERROR: --vcfOutFormat must be one of {b,u,z,v}" << endl;
+ return false;
+ }
+ if (bpFlanking < 0) {
+ cerr << "ERROR: --bpFlanking cannot be negative" << endl;
+ return false;
+ }
+ }
+ else { // non-ref mode
+ if (famFile.empty()) {
+ cerr << "ERROR: fam file must be specified either using --fam or --bfile"
+ << endl;
+ return false;
+ }
+ if (bimFile.empty()) {
+ cerr << "ERROR: bim file must be specified either using --bim or --bfile"
+ << endl;
+ return false;
+ }
+ if (bedFile.empty()) {
+ cerr << "ERROR: bed file must be specified either using --bed or --bfile"
+ << endl;
+ return false;
+ }
+ if (noImpMissing) {
+ cerr << "ERROR: --noImpMissing is only supported in ref-mode" << endl;
+ return false;
+ }
+ if (bpFlanking != 0) {
+ cerr << "ERROR: --bpFlanking is only supported in ref-mode" << endl;
+ return false;
+ }
+ }
+
+ removeFiles = StringUtils::expandRangeTemplates(removeFileTemplates);
+ excludeFiles = StringUtils::expandRangeTemplates(excludeFileTemplates);
+
+ if (!(0 <= maxMissingPerSnp && maxMissingPerSnp <= 1)) {
+ cerr << "ERROR: --maxMissingPerSnp must be between 0 and 1" << endl;
+ return false;
+ }
+ if (!(0 <= maxMissingPerIndiv && maxMissingPerIndiv <= 1)) {
+ cerr << "ERROR: --maxMissingPerIndiv must be between 0 and 1" << endl;
+ return false;
+ }
+ chrom = StringUtils::bcfNameToChrom(chromStr.c_str(), 0, chromX); // checks for range
+
+ if (pbwtIters < 0 || pbwtIters > 3) {
+ cerr << "ERROR: --pbwtIters must be either 0=auto, 1, 2, or 3" << endl;
+ return false;
+ }
+
+ // check advanced options
+ if (vm.count("v1fast")) {
+ cMmax = 0.3;
+ beamWidth3 = 100;
+ beamWidth4 = 100;
+ fracStep4 = 0.5;
+ }
+ if ((cMmax != 0 && cMmax < 0.1) || cMmax > 1.0) {
+ cerr << "ERROR: --maxBlockLen must be 0=auto or between 0.1 and 1 cM" << endl;
+ return false;
+ }
+ if (beamWidth3 < 10 || beamWidth3 > 1000) {
+ cerr << "ERROR: --maxStatePairsStep3 must be between 10 and 1000" << endl;
+ return false;
+ }
+ if (beamWidth4 < 10 || beamWidth4 > 1000) {
+ cerr << "ERROR: --maxStatePairsStep4 must be between 10 and 1000" << endl;
+ return false;
+ }
+ if (maxHapStates < 40 || maxHapStates > 1000) {
+ cerr << "ERROR: --maxHapStates must be between 40 and 1000" << endl;
+ return false;
+ }
+ if (fracStep4 < 0.0 || fracStep4 > 1.0) {
+ cerr << "ERROR: --fracStep4 must be between 0.0 and 1.0" << endl;
+ return false;
+ }
+ if (pErr < 1e-6 || pErr > 0.1) {
+ cerr << "ERROR: --genoErrProb must be between 0.000001 and 0.1" << endl;
+ return false;
+ }
+
+ // check that all files specified are readable/writeable
+ FileUtils::requireEmptyOrReadable(famFile);
+ FileUtils::requireEmptyOrReadable(bimFile);
+ FileUtils::requireEmptyOrReadable(bedFile);
+ FileUtils::requireEmptyOrReadable(vcfRef);
+ FileUtils::requireEmptyOrReadable(vcfTarget);
+ if (geneticMapFile != "USE_BIM") {
+ vector <string> reqHeader;
+ reqHeader.push_back("chr"); reqHeader.push_back("position");
+ reqHeader.push_back("COMBINED_rate(cM/Mb)"); reqHeader.push_back("Genetic_Map(cM)");
+ if (FileUtils::parseHeader(geneticMapFile, " \t") != reqHeader) {
+ cerr << "ERROR: --geneticMapFile must have four columns with names:" << endl
+ << " chr position COMBINED_rate(cM/Mb) Genetic_Map(cM)" << endl;
+ return false;
+ }
+ }
+ FileUtils::requireEachEmptyOrReadable(removeFiles);
+ FileUtils::requireEachEmptyOrReadable(excludeFiles);
+ }
+ catch (po::error &e) {
+ cerr << "ERROR: " << e.what() << endl << endl;
+ cerr << visible << endl;
+ return false;
+ }
+ return true;
+ }
+}
+
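
Note on --vcfOutFormat in the EagleParams.cpp hunk above: the one-letter code is mapped to an output suffix and a write-mode string (b -> bcf/wb, u -> bcf/wbu, z -> vcf.gz/wz, v -> vcf/w). The sketch below restates that table as a standalone function. VcfOutSpec, lookupVcfOut and outputFileName are hypothetical names, and joining the prefix and suffix with "." is an assumption for illustration; the hunk itself does not show how the output file name is built.

```cpp
#include <cassert>
#include <string>

// Hypothetical names; the table mirrors the b|u|z|v branches in
// EagleParams::processCommandLineArgs.
struct VcfOutSpec {
  std::string suffix;     // extension for the output file
  std::string writeMode;  // mode string later handed to the VCF/BCF writer
  bool valid;             // false for any code outside {b,u,z,v}
};

inline VcfOutSpec lookupVcfOut(const std::string &fmt) {
  if (fmt == "b") return {"bcf", "wb", true};     // compressed BCF
  if (fmt == "u") return {"bcf", "wbu", true};    // uncompressed BCF
  if (fmt == "z") return {"vcf.gz", "wz", true};  // compressed VCF
  if (fmt == "v") return {"vcf", "w", true};      // uncompressed VCF
  return {"", "", false};
}

// Assumption: the output name is prefix + "." + suffix.
inline std::string outputFileName(const std::string &prefix,
                                  const std::string &fmt) {
  const VcfOutSpec spec = lookupVcfOut(fmt);
  assert(spec.valid);  // callers are expected to reject bad codes first
  return prefix + "." + spec.suffix;
}
```
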
diff --git a/src/EagleParams.hpp b/src/EagleParams.hpp
new file mode 100644
index 0000000..a0b6eb2
--- /dev/null
+++ b/src/EagleParams.hpp
@@ -0,0 +1,83 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#ifndef EAGLEPARAMS_HPP
+#define EAGLEPARAMS_HPP
+
+#include <vector>
+#include <string>
+
+#include "Types.hpp"
+
+namespace EAGLE {
+
+ class EagleParams {
+ public:
+
+ // main input files
+ std::string famFile, bimFile, bedFile, vcfFile, vcfRef, vcfTarget;
+ int chrom, chromX;
+
+ // optional reference map file for filling in genpos
+ std::string geneticMapFile;
+
+ std::vector <std::string> removeFiles; // list(s) of indivs to remove
+ std::vector <std::string> excludeFiles; // list(s) of SNPs to exclude
+ double bpStart, bpEnd, bpFlanking;
+
+ std::string outPrefix; // .haps.gz .sample
+ std::string vcfOutFormat, vcfOutSuffix, vcfWriteMode;
+ // outFormat b|u|z|v -> outSuffix bcf|bcf|vcf.gz|vcf, writeMode wb|wbu|wz|w
+ bool noImpMissing;
+ int usePS; // use FORMAT:PS phase constraints: 1=soft, 2=harder
+
+ bool allowRefAltSwap; // in reference-based phasing mode
+ bool usePBWT;
+ bool pbwtOnly; // in non-ref mode, don't run Steps 1 or 2
+ int runStep2; // in non-ref mode, do/don't run Step 2
+ int pbwtIters;
+ double expectIBDcM; // expected length of an IBD segment (for transition probabilities)
+ double histFactor; // history length multiplier
+
+ // QC params
+ double maxMissingPerSnp, maxMissingPerIndiv;
+
+ int numThreads;
+
+ double cMmax;
+ int beamWidth3, beamWidth4;
+ int maxHapStates;
+ double fracStep4;
+ double pErr;
+ uint seed;
+ bool noMapCheck;
+
+ int Kpbwt;
+
+ // testing
+ int iter;
+ std::string tmpPhaseConfsPrefix;
+ std::string maskFile; // list of indivs to mask (e.g., relatives)
+ bool trioCheck;
+
+ // populates members; error-checks
+ bool processCommandLineArgs(int argc, char *argv[]);
+ };
+}
+
+#endif
diff --git a/src/FileUtils.cpp b/src/FileUtils.cpp
new file mode 100644
index 0000000..805a41e
--- /dev/null
+++ b/src/FileUtils.cpp
@@ -0,0 +1,215 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include <vector>
+#include <string>
+#include <iostream>
+#include <fstream>
+#include <cstdio>
+#include <cstdlib>
+
+#include "StringUtils.hpp"
+#include "FileUtils.hpp"
+#include "Types.hpp"
+
+#include <boost/iostreams/filtering_stream.hpp>
+#include <boost/iostreams/filter/gzip.hpp>
+
+namespace FileUtils {
+
+ using std::string;
+ using std::vector;
+ using std::cerr;
+ using std::endl;
+
+ void openOrExit(std::ifstream &stream, const string &file,
+ std::ios_base::openmode mode) {
+ stream.open(file.c_str(), mode);
+ if (!stream) {
+ cerr << "ERROR: Unable to open file: " << file << endl;
+ exit(1);
+ }
+ }
+ void openWritingOrExit(std::ofstream &stream, const string &file,
+ std::ios_base::openmode mode) {
+ stream.open(file.c_str(), mode);
+ if (!stream) {
+ cerr << "ERROR: Unable to open file for writing: " << file << endl;
+ exit(1);
+ }
+ }
+ void requireEmptyOrReadable(const std::string &file) {
+ if (file.empty()) return;
+ std::ifstream fin;
+ fin.open(file.c_str());
+ if (!fin) {
+ cerr << "ERROR: Unable to open file: " << file << endl;
+ exit(1);
+ }
+ fin.close();
+ }
+ void requireEachEmptyOrReadable(const std::vector <std::string> &fileList) {
+ for (uint i = 0; i < fileList.size(); i++)
+ requireEmptyOrReadable(fileList[i]);
+ }
+ void requireEmptyOrWriteable(const std::string &file) {
+ if (file.empty()) return;
+ std::ofstream fout;
+ fout.open(file.c_str(), std::ios::out|std::ios::app);
+ if (!fout) {
+ cerr << "ERROR: Output file is not writeable: " << file << endl;
+ exit(1);
+ }
+ fout.close();
+ }
+ vector <string> parseHeader(const string &fileName, const string &delimiters) {
+ AutoGzIfstream fin; fin.openOrExit(fileName);
+ string header;
+ getline(fin, header);
+ vector <string> split = StringUtils::tokenizeMultipleDelimiters(header, delimiters);
+ fin.close();
+ return split;
+ }
+ int lookupColumnInd(const string &fileName, const string &delimiters, const string &columnName) {
+ vector <string> headers = parseHeader(fileName, delimiters);
+ int columnInd = -1;
+ for (uint c = 0; c < headers.size(); c++)
+ if (headers[c] == columnName)
+ columnInd = c; // first column is snp ID, treated separately
+ if (columnInd == -1) {
+ cerr << "WARNING: Column " << columnName << " not found in headers of " << fileName << endl;
+ //exit(1);
+ }
+ return columnInd;
+ }
+ double readDoubleNanInf(std::istream &stream) {
+ string str;
+ stream >> str;
+ double x;
+ sscanf(str.c_str(), "%lf", &x);
+ return x;
+ }
+
+ vector < std::pair <string, string> > readFidIids(const string &file) {
+ vector < std::pair <string, string> > ret;
+ AutoGzIfstream fin;
+ fin.openOrExit(file);
+ string FID, IID, line;
+ while (fin >> FID >> IID) {
+ if (FID.empty() || IID.empty()) {
+ cerr << "ERROR: In file " << file << endl;
+ cerr << " unable to read FID and IID; check format" << endl;
+ exit(1);
+ }
+ ret.push_back(make_pair(FID, IID));
+ getline(fin, line);
+ }
+ fin.close();
+ return ret;
+ }
+
+
+ int AutoGzIfstream::lineCount(const std::string &file) {
+ AutoGzIfstream fin; fin.openOrExit(file);
+ int ctr = 0; string line;
+ while (getline(fin, line))
+ ctr++;
+ return ctr;
+ }
+
+ /***** AutoGzIfstream class implementation *****/
+
+ void AutoGzIfstream::openOrExit(const std::string &file, std::ios_base::openmode mode) {
+ fin.open(file.c_str(), mode);
+ if (!fin) {
+ cerr << "ERROR: Unable to open file: " << file << endl;
+ exit(1);
+ }
+ if ((int) file.length() > 3 && file.substr(file.length()-3) == ".gz")
+ boost_in.push(boost::iostreams::gzip_decompressor());
+ boost_in.push(fin);
+ }
+
+ void AutoGzIfstream::close() {
+ fin.close();
+ boost_in.reset();
+ }
+
+ AutoGzIfstream::operator bool() const {
+ return !boost_in.fail();
+ }
+
+ AutoGzIfstream& AutoGzIfstream::read(char *s, std::streamsize n) {
+ boost_in.read(s, n);
+ return *this;
+ }
+
+ int AutoGzIfstream::get() {
+ return boost_in.get();
+ }
+
+ double AutoGzIfstream::readDoubleNanInf() {
+ return FileUtils::readDoubleNanInf(boost_in);
+ }
+
+ void AutoGzIfstream::clear() {
+ boost_in.clear();
+ }
+
+ AutoGzIfstream& AutoGzIfstream::seekg(std::streamoff off, std::ios_base::seekdir way) {
+ boost_in.seekg(off, way);
+ return *this;
+ }
+
+ AutoGzIfstream& getline(AutoGzIfstream& in, std::string &s) {
+ std::getline(in.boost_in, s);
+ return in;
+ }
+
+
+ /***** AutoGzOfstream class implementation *****/
+
+ void AutoGzOfstream::openOrExit(const std::string &file, std::ios_base::openmode mode) {
+ fout.open(file.c_str(), mode);
+ if (!fout) {
+ cerr << "ERROR: Unable to open file: " << file << endl;
+ exit(1);
+ }
+ if ((int) file.length() > 3 && file.substr(file.length()-3) == ".gz")
+ boost_out.push(boost::iostreams::gzip_compressor());
+ boost_out.push(fout);
+ }
+
+ void AutoGzOfstream::close() {
+ boost_out.reset();
+ }
+
+ AutoGzOfstream& AutoGzOfstream::operator << (std::ostream&(*manip)(std::ostream&)) {
+ manip(boost_out);
+ return *this;
+ }
+
+ void AutoGzOfstream::unsetf(std::ios_base::fmtflags mask) {
+ boost_out.unsetf(mask);
+ }
+
+ AutoGzOfstream::operator bool() const {
+ return !boost_out.fail();
+ }
+
+}
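
Note on readDoubleNanInf in the FileUtils.cpp hunk above: it reads a whitespace-delimited token and converts it with sscanf("%lf"), presumably so that tokens such as "nan" and "inf" are accepted. As written it does not check the sscanf return value, so x is left uninitialized when the token is not numeric. The sketch below is a hypothetical token-based variant (parseDoubleNanInf is not an Eagle function) that starts from a defined fallback and reports whether the conversion matched.

```cpp
#include <cmath>
#include <cstdio>
#include <string>

// Hypothetical variant of readDoubleNanInf that takes an already-read token.
// Returns NaN and sets *ok to false when the token is not a number.
inline double parseDoubleNanInf(const std::string &token, bool *ok = nullptr) {
  double x = std::nan("");  // defined fallback if sscanf matches nothing
  const int matched = std::sscanf(token.c_str(), "%lf", &x);
  if (ok) *ok = (matched == 1);
  return x;
}
```

A caller can distinguish a literal "nan" in the input (ok is true) from an unparseable token (ok is false), which the return value alone cannot do.
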
diff --git a/src/FileUtils.hpp b/src/FileUtils.hpp
new file mode 100644
index 0000000..7f7f01c
--- /dev/null
+++ b/src/FileUtils.hpp
@@ -0,0 +1,94 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#ifndef FILEUTILS_HPP
+#define FILEUTILS_HPP
+
+#include <vector>
+#include <string>
+#include <fstream>
+
+#include "StringUtils.hpp"
+
+#include <boost/iostreams/filtering_stream.hpp>
+
+namespace FileUtils {
+
+ void openOrExit(std::ifstream &stream, const std::string &file,
+ std::ios_base::openmode mode=std::ios::in);
+
+ void openWritingOrExit(std::ofstream &stream, const std::string &file,
+ std::ios_base::openmode mode=std::ios::out);
+
+ void requireEmptyOrReadable(const std::string &file);
+
+ void requireEachEmptyOrReadable(const std::vector <std::string> &fileList);
+
+ void requireEmptyOrWriteable(const std::string &file);
+
+ std::vector <std::string> parseHeader(const std::string &fileName,
+ const std::string &delimiters);
+
+ int lookupColumnInd(const std::string &fileName, const std::string &delimiters,
+ const std::string &columnName);
+
+ double readDoubleNanInf(std::istream &stream);
+
+ std::vector < std::pair <std::string, std::string> > readFidIids(const std::string &file);
+
+ class AutoGzIfstream {
+ boost::iostreams::filtering_istream boost_in;
+ std::ifstream fin;
+ public:
+ static int lineCount(const std::string &file);
+
+ void openOrExit(const std::string &file, std::ios_base::openmode mode=std::ios::in);
+ void close();
+ template <class T> AutoGzIfstream& operator >> (T &x) {
+ boost_in >> x;
+ return *this;
+ }
+ operator bool() const;
+ AutoGzIfstream& read(char *s, std::streamsize n);
+ int get();
+ double readDoubleNanInf();
+ void clear();
+ AutoGzIfstream& seekg(std::streamoff off, std::ios_base::seekdir way);
+ friend AutoGzIfstream& getline(AutoGzIfstream& in, std::string &s);
+ };
+
+ AutoGzIfstream& getline(AutoGzIfstream& in, std::string &s);
+
+ class AutoGzOfstream {
+ boost::iostreams::filtering_ostream boost_out;
+ std::ofstream fout;
+ public:
+ void openOrExit(const std::string &file, std::ios_base::openmode mode=std::ios::out);
+ void close();
+ template <class T> AutoGzOfstream& operator << (const T &x) {
+ boost_out << x;
+ return *this;
+ }
+ AutoGzOfstream& operator << (std::ostream&(*manip)(std::ostream&));
+ void unsetf(std::ios_base::fmtflags);
+ operator bool() const;
+ };
+
+}
+
+#endif
diff --git a/src/GenoData.cpp b/src/GenoData.cpp
new file mode 100644
index 0000000..15f27ff
--- /dev/null
+++ b/src/GenoData.cpp
@@ -0,0 +1,845 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include <iostream>
+#include <fstream>
+#include <sstream>
+#include <map>
+#include <algorithm>
+#include <cstring>
+#include <cmath>
+
+#include <htslib/vcf.h>
+
+#include "Types.hpp"
+#include "FileUtils.hpp"
+#include "MemoryUtils.hpp"
+#include "MapInterpolater.hpp"
+#include "LapackConst.hpp"
+#include "GenoData.hpp"
+
+namespace EAGLE {
+
+ using std::vector;
+ using std::string;
+ using std::cout;
+ using std::cerr;
+ using std::endl;
+ using FileUtils::getline;
+
+ int GenoData::plinkChromCode(const string &chrom) {
+ if (isdigit(chrom[0])) return atoi(chrom.c_str());
+ if (chrom == "X") return 23;
+ if (chrom == "Y") return 24;
+ if (chrom == "XY") return 25;
+ if (chrom == "MT") return 26;
+ return -1;
+ }
+
+ // set indivsPreQC, NpreQC
+ // return bedIndivRemoved
+ vector <bool> GenoData::processIndivs(const string &famFile,
+ const vector <string> &removeFiles) {
+ std::map <string, uint64> FID_IID_to_ind;
+ string line;
+
+ vector <IndivInfoX> bedIndivs;
+ cout << "Reading fam file: " << famFile << endl;
+ FileUtils::AutoGzIfstream fin; fin.openOrExit(famFile);
+ while (getline(fin, line)) {
+ std::istringstream iss(line);
+ IndivInfoX indiv;
+ if (!(iss >> indiv.famID >> indiv.indivID >> indiv.paternalID >> indiv.maternalID
+ >> indiv.sex >> indiv.pheno)) {
+ cerr << "ERROR: Incorrectly formatted fam file: " << famFile << endl;
+ cerr << "Line " << bedIndivs.size()+1 << ":" << endl;
+ cerr << line << endl;
+ cerr << "Unable to input 6 values (4 string, 1 int, 1 double)" << endl;
+ exit(1);
+ }
+ string combined_ID = indiv.famID + " " + indiv.indivID;
+ if (FID_IID_to_ind.find(combined_ID) != FID_IID_to_ind.end()) {
+ cerr << "ERROR: Duplicate individual in fam file at line " << bedIndivs.size()+1 << endl;
+ exit(1);
+ }
+ FID_IID_to_ind[combined_ID] = bedIndivs.size();
+ bedIndivs.push_back(indiv);
+ }
+ fin.close();
+ uint64 Nbed = bedIndivs.size();
+
+ cout << "Total indivs in PLINK data: Nbed = " << Nbed << endl;
+
+ // process individuals to remove
+ vector <bool> bedIndivRemoved(Nbed);
+ for (uint f = 0; f < removeFiles.size(); f++) {
+ const string &removeFile = removeFiles[f];
+ cout << "Reading remove file (indivs to remove): " << removeFile << endl;
+ fin.openOrExit(removeFile);
+ int lineCtr = 0;
+ int numRemoved = 0;
+ int numAbsent = 0;
+ while (getline(fin, line)) {
+ lineCtr++;
+ std::istringstream iss(line);
+ string FID, IID;
+ if (!(iss >> FID >> IID)) {
+ cerr << "ERROR: Incorrectly formatted remove file: " << removeFile << endl;
+ cerr << "Line " << lineCtr << ":" << endl;
+ cerr << line << endl;
+ cerr << "Unable to input FID and IID" << endl;
+ exit(1);
+ }
+ string combined_ID = FID + " " + IID;
+ if (FID_IID_to_ind.find(combined_ID) == FID_IID_to_ind.end()) {
+ if (numAbsent < 5)
+ cerr << "WARNING: Unable to find individual to remove: " << combined_ID << endl;
+ numAbsent++;
+ }
+ else if (!bedIndivRemoved[FID_IID_to_ind[combined_ID]]) {
+ bedIndivRemoved[FID_IID_to_ind[combined_ID]] = true;
+ numRemoved++;
+ }
+ }
+ fin.close();
+ cout << "Removed " << numRemoved << " individual(s)" << endl;
+ if (numAbsent)
+ cerr << "WARNING: " << numAbsent << " individual(s) not found in data set" << endl;
+ }
+
+ for (uint64 nbed = 0; nbed < Nbed; nbed++)
+ if (!bedIndivRemoved[nbed])
+ indivsPreQC.push_back(bedIndivs[nbed]);
+ NpreQC = indivsPreQC.size();
+ cout << "Total indivs stored in memory: NpreQC = " << NpreQC << endl;
+
+ return bedIndivRemoved;
+ }
+
+ vector <SnpInfoX> GenoData::readBimFile(const string &bimFile) {
+ vector <SnpInfoX> ret;
+ string line;
+ FileUtils::AutoGzIfstream fin; fin.openOrExit(bimFile);
+ int numOutOfOrder = 0;
+ while (getline(fin, line)) {
+ std::istringstream iss(line);
+ SnpInfoX snp; string chrom_str;
+ if (!(iss >> chrom_str >> snp.ID >> snp.genpos >> snp.physpos >> snp.allele1 >> snp.allele2))
+ {
+ cerr << "ERROR: Incorrectly formatted bim file: " << bimFile << endl;
+ cerr << "Line " << ret.size()+1 << ":" << endl;
+ cerr << line << endl;
+ cerr << "Unable to input 6 values (2 string, 1 double, 1 int, 2 string)" << endl;
+ exit(1);
+ }
+ snp.chrom = plinkChromCode(chrom_str);
+ if (snp.chrom == -1) {
+ cerr << "ERROR: Unknown chromosome code in bim file: " << bimFile << endl;
+ cerr << "Line " << ret.size()+1 << ":" << endl;
+ cerr << line << endl;
+ exit(1);
+ }
+ if (!ret.empty() &&
+ (snp.chrom < ret.back().chrom ||
+ (snp.chrom == ret.back().chrom && (snp.physpos <= ret.back().physpos ||
+ snp.genpos < ret.back().genpos)))) {
+ if (numOutOfOrder < 5) {
+ cerr << "WARNING: Out-of-order snp in bim file: " << bimFile << endl;
+ cerr << "Line " << ret.size()+1 << ":" << endl;
+ cerr << line << endl;
+ }
+ numOutOfOrder++;
+ //exit(1);
+ }
+ ret.push_back(snp);
+ }
+ if (numOutOfOrder)
+ cerr << "WARNING: Total number of out-of-order snps in bim file: " << numOutOfOrder << endl;
+ fin.close();
+ // TODO: exit with error or sort SNPs?
+ return ret;
+ }
+
+ // set snpsPreQC, MpreQC
+ // return bedSnpExcluded
+ vector <bool> GenoData::processSnps(const string &bimFile, int chrom, double bpStart,
+ double bpEnd, const vector <string> &excludeFiles) {
+ FileUtils::AutoGzIfstream fin;
+ string line;
+
+ // read bim file
+ cout << "Reading bim file: " << bimFile << endl;
+ vector <SnpInfoX> bedSnps = readBimFile(bimFile);
+
+ uint64 Mbed = bedSnps.size();
+ cout << "Total snps in PLINK data: Mbed = " << Mbed << endl;
+
+ vector <bool> bedSnpExcluded(Mbed);
+
+ if (chrom == 0) {
+ if (bedSnps[0].chrom != bedSnps.back().chrom) {
+ cerr << "ERROR: Only one chromosome may be analyzed at a time; use --chrom" << endl;
+ exit(1);
+ }
+ else
+ chrom = bedSnps[0].chrom;
+ }
+
+ uint64 MbedOnChrom = 0;
+ for (uint64 mbed = 0; mbed < Mbed; mbed++) {
+ if (bedSnps[mbed].chrom != chrom || bedSnps[mbed].physpos < bpStart ||
+ bedSnps[mbed].physpos > bpEnd)
+ bedSnpExcluded[mbed] = true;
+ else
+ MbedOnChrom++;
+ }
+ if (MbedOnChrom < Mbed) {
+ cout << "Restricting to " << MbedOnChrom << " SNPs on chrom " << chrom
+ << " in region [bpStart,bpEnd] = [" << bpStart << "," << bpEnd << "]" << endl;
+ }
+
+ // create dictionary rsID -> index in full bed snp list
+ std::map <string, uint64> rsID_to_ind;
+ for (uint64 mbed = 0; mbed < Mbed; mbed++) {
+ if (rsID_to_ind.find(bedSnps[mbed].ID) != rsID_to_ind.end()) {
+ cerr << "WARNING: Duplicate snp ID " << bedSnps[mbed].ID
+ << " -- masking duplicate" << endl;
+ bedSnpExcluded[mbed] = true;
+ }
+ else
+ rsID_to_ind[bedSnps[mbed].ID] = mbed;
+ }
+
+ // TODO: also limit SNP density (e.g., Omni 2.5M) and/or MAF?
+ // process snps to exclude
+ for (uint f = 0; f < excludeFiles.size(); f++) {
+ const string &excludeFile = excludeFiles[f];
+ cout << "Reading exclude file (SNPs to exclude): " << excludeFile << endl;
+ fin.openOrExit(excludeFile);
+ int numExcluded = 0;
+ int numAbsent = 0;
+ while (getline(fin, line)) {
+ std::istringstream iss(line);
+ string rsID; iss >> rsID;
+ if (rsID_to_ind.find(rsID) == rsID_to_ind.end()) {
+ if (numAbsent < 5)
+ cerr << "WARNING: Unable to find SNP to exclude: " << rsID << endl;
+ numAbsent++;
+ }
+ else if (!bedSnpExcluded[rsID_to_ind[rsID]]) {
+ bedSnpExcluded[rsID_to_ind[rsID]] = true;
+ numExcluded++;
+ }
+ }
+ fin.close();
+ cout << "Excluded " << numExcluded << " SNP(s)" << endl;
+ if (numAbsent)
+ cerr << "WARNING: " << numAbsent << " SNP(s) not found in data set" << endl;
+ }
+
+ for (uint64 mbed = 0; mbed < Mbed; mbed++)
+ if (!bedSnpExcluded[mbed])
+ snpsPreQC.push_back(bedSnps[mbed]);
+ MpreQC = snpsPreQC.size();
+ cout << "Total SNPs stored in memory: MpreQC = " << MpreQC << endl;
+
+ return bedSnpExcluded;
+ }
+
+ void GenoData::processMap(vector <SnpInfoX> &snpsVec, const string &geneticMapFile,
+ bool noMapCheck) {
+ // fill in map if external file provided
+ if (geneticMapFile != "USE_BIM") {
+ cout << "Filling in genetic map coordinates using reference file:" << endl;
+ cout << " " << geneticMapFile << endl;
+ Genetics::MapInterpolater mapInterpolater(geneticMapFile);
+ for (uint64 m = 0; m < snpsVec.size(); m++)
+ snpsVec[m].genpos = mapInterpolater.interp(snpsVec[m].chrom, snpsVec[m].physpos);
+ }
+ else {
+ // check map and rescale if in cM units: calculate d(genpos)/d(physpos)
+ double scale = (snpsVec.back().genpos - snpsVec[0].genpos)
+ / (snpsVec.back().physpos - snpsVec[0].physpos);
+ if (0.5e-6 < scale && scale < 3e-6) {
+ cerr << "WARNING: Genetic map appears to be in cM units; rescaling by 0.01" << endl;
+ for (uint64 m = 0; m < snpsVec.size(); m++)
+ snpsVec[m].genpos *= 0.01;
+ }
+ else if (!(0.5e-8 < scale && scale < 3e-8)) {
+ if (noMapCheck) {
+ cerr << "WARNING: Genetic map appears wrong based on overall cM/Mb" << endl;
+ cerr << " Proceeding anyway because --noMapCheck is set" << endl;
+ }
+ else {
+ cerr << "ERROR: Genetic map appears wrong based on overall cM/Mb" << endl;
+ cerr << " To proceed anyway, set --noMapCheck" << endl;
+ exit(1);
+ }
+ }
+ }
+ }
+
+ inline double log10safe(double x) { return x > 0 ? log10(x) : -1000; }
+
+ void GenoData::buildGenoBits(uchar *genosPreQC, const vector <bool> &genos2bit, double cMmax) {
+ const uint segMin = 16;
+ vector <uint64> preQCsnpInds; vector <double> cMvec;
+ for (uint64 m = 0; m < MpreQC; m++)
+ if (snpsPreQC[m].passQC) {
+ if (preQCsnpInds.size() == 64 ||
+ (preQCsnpInds.size() >= segMin &&
+ snpsPreQC[m].genpos > snpsPreQC[preQCsnpInds[0]].genpos + cMmax/100)) {
+ seg64preQCsnpInds.push_back(preQCsnpInds); seg64cMvecs.push_back(cMvec);
+ preQCsnpInds.clear(); cMvec.clear();
+ }
+ preQCsnpInds.push_back(m); cMvec.push_back(100 * snpsPreQC[m].genpos);
+ }
+ seg64preQCsnpInds.push_back(preQCsnpInds); seg64cMvecs.push_back(cMvec);
+
+ Mseg64 = seg64preQCsnpInds.size();
+ cout << "Number of <=(64-SNP, " << cMmax << "cM) segments: " << Mseg64 << endl;
+ cout << "Average # SNPs per segment: " << M / Mseg64 << endl;
+
+ isFlipped64j = vector <bool> (Mseg64*64);
+ genoBits = ALIGNED_MALLOC_UINT64_MASKS(Mseg64 * N);
+ memset(genoBits, 0, Mseg64 * N * sizeof(genoBits[0]));
+ seg64logPs = (AlleleFreqs *) ALIGNED_MALLOC(Mseg64*64 * sizeof(AlleleFreqs));
+ for (uint64 m64j = 0; m64j < Mseg64*64; m64j++)
+ for (uint g1 = 0; g1 <= 2; g1++)
+ for (uint g0 = 0; g0 <= 5; g0++)
+ seg64logPs[m64j].cond[g1][g0] = NAN;
+
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) {
+ for (uint64 j = 0; j < seg64preQCsnpInds[m64].size(); j++) {
+ uint64 m = seg64preQCsnpInds[m64][j]; // m, n indices are preQC
+ int genoCounts[3]; genoCounts[0] = genoCounts[1] = genoCounts[2] = 0;
+ for (uint64 n = 0; n < NpreQC; n++)
+ if (indivsPreQC.empty() || indivsPreQC[n].passQC) {
+ uchar geno = genosPreQC != NULL ? genosPreQC[m * NpreQC + n] :
+ (genos2bit[2*(m * NpreQC + n)] + 2*genos2bit[2*(m * NpreQC + n)+1]);
+ if (geno <= 2) genoCounts[geno]++;
+ }
+ uchar is0geno = 0, is2geno = 2;
+ if (genoCounts[2] > genoCounts[0]) {
+ isFlipped64j[m64*64+j] = true;
+ is0geno = 2; is2geno = 0;
+ std::swap(genoCounts[0], genoCounts[2]);
+ }
+ uint64 nPostQC = 0;
+ for (uint64 n = 0; n < NpreQC; n++)
+ if (indivsPreQC.empty() || indivsPreQC[n].passQC) {
+ uchar geno = genosPreQC != NULL ? genosPreQC[m * NpreQC + n] :
+ (genos2bit[2*(m * NpreQC + n)] + 2*genos2bit[2*(m * NpreQC + n)+1]);
+ genoBits[m64 * N + nPostQC].is0 |= ((uint64) (geno == is0geno))<<j;
+ genoBits[m64 * N + nPostQC].is2 |= ((uint64) (geno == is2geno))<<j;
+ genoBits[m64 * N + nPostQC].is9 |= ((uint64) (geno > 2))<<j;
+ nPostQC++;
+ }
+ double tot = genoCounts[0] + genoCounts[1] + genoCounts[2];
+ double p0 = genoCounts[0]/tot, p1half = 0.5*genoCounts[1]/tot, p2 = genoCounts[2]/tot;
+ if (p1half == 0) p1half = 1e-9; // avoid division by 0
+ double p = (genoCounts[1] + 2*genoCounts[2]) / (2*tot);
+ AlleleFreqs &af = seg64logPs[m64*64+j];
+
+ for (uint g1 = 0; g1 <= 2; g1++)
+ af.cond[g1][3] = log10safe(genoCounts[g1] / tot); // [3]: unconditioned
+
+ af.cond[2][2] = log10safe(p2 / (p1half + p2));
+ af.cond[1][2] = log10safe(p1half / (p1half + p2));
+ af.cond[0][2] = log10safe(0);
+
+ af.cond[2][1] = log10safe(0.5 * p2 / (p1half + p2));
+ af.cond[1][1] = log10safe(0.5 * p1half / (p1half + p2) + 0.5 * p1half / (p0 + p1half));
+ af.cond[0][1] = log10safe(0.5 * p0 / (p0 + p1half));
+
+ af.cond[2][0] = log10safe(0);
+ af.cond[1][0] = log10safe(p1half / (p0 + p1half));
+ af.cond[0][0] = log10safe(p0 / (p0 + p1half));
+
+ // same orientation as het-het => p(hap=1) = 1-p
+ af.cond[2][4] = log10safe((1-p) * p2 / (p1half + p2));
+ af.cond[1][4] = log10safe((1-p) * p1half / (p1half + p2) + p * p1half / (p0 + p1half));
+ af.cond[0][4] = log10safe(p * p0 / (p0 + p1half));
+
+ // opp orientation to het-het => p(hap=1) = p
+ af.cond[2][5] = log10safe(p * p2 / (p1half + p2));
+ af.cond[1][5] = log10safe(p * p1half / (p1half + p2) + (1-p) * p1half / (p0 + p1half));
+ af.cond[0][5] = log10safe((1-p) * p0 / (p0 + p1half));
+
+ if (p > 0.55) {
+ cerr << "INTERNAL ERROR: Minor/major allele coding bug" << endl;
+ exit(1);
+ }
+ }
+ for (uint64 n = 0; n < N; n++)
+ for (uint64 j = seg64preQCsnpInds[m64].size(); j < 64; j++)
+ genoBits[m64*N+n].is9 |= 1ULL<<j;
+ }
+ }
+
+ /**
+ * fills x[] with indivInds.size() elements corresponding to chosen subset of indivInds
+ * replaces missing values with average; mean-centers and normalizes vector length to 1
+ * if monomorphic among non-missing, fills with all-0s
+ *
+ * return: true if snp is polymorphic in indivInds; false if not
+ */
+ bool GenoData::fillSnpSubRowNorm1(float x[], uint64 m64j, const vector <int> &indivInds) const {
+ uint64 m64 = m64j/64, j = m64j&63, jBit = 1ULL<<j;
+ float sumPresent = 0; int numPresent = 0;
+ for (uint64 i = 0; i < indivInds.size(); i++) {
+ uint64 n = indivInds[i];
+ if (genoBits[m64 * N + n].is0 & jBit) x[i] = 0.0f;
+ else if (genoBits[m64 * N + n].is2 & jBit) x[i] = 2.0f;
+ else if (genoBits[m64 * N + n].is9 & jBit) x[i] = 9.0f;
+ else x[i] = 1.0f;
+ if (x[i] != 9.0f) {
+ sumPresent += x[i];
+ numPresent++;
+ }
+ }
+ float avg = sumPresent / numPresent;
+ float sum2 = 0;
+ for (uint64 i = 0; i < indivInds.size(); i++) {
+ if (x[i] != 9.0f) { // non-missing; mean-center
+ x[i] -= avg;
+ sum2 += x[i]*x[i];
+ }
+ else // missing; replace with mean (centered to 0)
+ x[i] = 0;
+ }
+ if (sum2 < 0.001) { // monomorphic among non-missing
+ for (uint64 i = 0; i < indivInds.size(); i++) x[i] = 0; // set to 0
+ return false;
+ }
+ else { // polymorphic
+ float invNorm = 1.0f / sqrtf(sum2);
+ for (uint64 i = 0; i < indivInds.size(); i++) x[i] *= invNorm; // normalize to vector len 1
+ return true;
+ }
+ }
+
+ float dotProdToAdjR2(float dotProd, int n) {
+ float r2 = dotProd*dotProd;
+ return r2 - (1-r2)/(n-2);
+ }
+
+ vector <double> GenoData::computeInvLD64j(uint64 NsubMax) const {
+ uint64 Mchr = Mseg64*64;
+ vector <double> chipLDscores(Mchr, 1.0);
+
+ uint64 step = std::max(N / NsubMax, 1ULL);
+ vector <int> indivInds;
+ for (uint64 n = 0; n < N && indivInds.size() < NsubMax; n += step)
+ indivInds.push_back(n);
+ uint64 Nsub = indivInds.size();
+ cout << "Estimating LD scores using " << Nsub << " indivs" << endl;
+
+ // allocate memory
+ uchar *chrMaskSnps = ALIGNED_MALLOC_UCHARS(Mchr);
+ memset(chrMaskSnps, 0, Mchr * sizeof(chrMaskSnps[0]));
+ float *chrNormalizedGenos = ALIGNED_MALLOC_FLOATS(Mchr * Nsub);
+ memset(chrNormalizedGenos, 0, Mchr * Nsub * sizeof(chrNormalizedGenos[0]));
+ const int mBlock = 64;
+ float *dotProds = ALIGNED_MALLOC_FLOATS(Mchr * mBlock);
+
+ // fill and normalize genotypes
+ for (uint64 mchr = 0; mchr < Mchr; mchr++) {
+ if ((mchr&63) < seg64cMvecs[mchr/64].size())
+ chrMaskSnps[mchr] = fillSnpSubRowNorm1(chrNormalizedGenos + mchr*Nsub, mchr, indivInds);
+ else
+ chipLDscores[mchr] = 0;
+ }
+
+ uint64 mchrWindowStart = 0;
+ for (uint64 mchr0 = 0; mchr0 < Mchr; mchr0 += mBlock) { // sgemm to compute r2s
+ uint64 mBlockCrop = std::min(Mchr, mchr0+mBlock) - mchr0;
+ while ((mchrWindowStart&63) >= seg64cMvecs[mchrWindowStart/64].size() ||
+ seg64cMvecs[mchrWindowStart/64][mchrWindowStart&63] + 1 <
+ seg64cMvecs[mchr0/64][mchr0&63])
+ mchrWindowStart++;
+ uint64 prevWindowSize = mchr0+mBlockCrop-1 - mchrWindowStart;
+
+ // [mchrWindowStart..mchr0+mBlockCrop-1) x [mchr0..mchr0+mBlockCrop)
+ {
+ char TRANSA_ = 'T';
+ char TRANSB_ = 'N';
+ int M_ = prevWindowSize;
+ int N_ = mBlockCrop;
+ int K_ = Nsub;
+ float ALPHA_ = 1;
+ float *A_ = chrNormalizedGenos + mchrWindowStart*Nsub;
+ int LDA_ = Nsub;
+ float *B_ = chrNormalizedGenos + mchr0*Nsub;
+ int LDB_ = Nsub;
+ float BETA_ = 0;
+ float *C_ = dotProds;
+ int LDC_ = prevWindowSize;
+ SGEMM_MACRO(&TRANSA_, &TRANSB_, &M_, &N_, &K_, &ALPHA_, A_, &LDA_, B_, &LDB_,
+ &BETA_, C_, &LDC_);
+ }
+
+ for (uint64 mPlus = 0; mPlus < mBlockCrop; mPlus++) {
+ uint64 m = mchr0 + mPlus;
+ if (!chrMaskSnps[m]) continue;
+ for (uint64 mPlus2 = 0; mchrWindowStart+mPlus2 < mchr0+mPlus; mPlus2++) {
+ uint64 m2 = mchrWindowStart + mPlus2;
+ if (!chrMaskSnps[m2]) continue;
+ float adjR2 = dotProdToAdjR2(dotProds[mPlus2 + mPlus*prevWindowSize], Nsub);
+ chipLDscores[m] += adjR2;
+ chipLDscores[m2] += adjR2;
+ }
+ }
+ }
+
+ ALIGNED_FREE(dotProds);
+ ALIGNED_FREE(chrNormalizedGenos);
+ ALIGNED_FREE(chrMaskSnps);
+
+ for (uint64 mchr = 0; mchr < Mchr; mchr++) chipLDscores[mchr] = 1/chipLDscores[mchr];
+ return chipLDscores; // reciprocals taken above
+ }
+
+ void GenoData::printRange(void) const {
+
+ int physRange = snps.back().physpos - snps[0].physpos;
+ double cMrange = 100*(snps.back().genpos - snps[0].genpos);
+
+ cout << "Physical distance range: " << physRange << " base pairs" << endl;
+ cout << "Genetic distance range: " << cMrange << " cM" << endl;
+ cout << "Average # SNPs per cM: " << (int) (M/cMrange+0.5) << endl;
+
+ if (physRange <= 0 || cMrange <= 0) {
+ cerr << "ERROR: Physical and genetic distance ranges must be positive" << endl;
+ cerr << " First SNP: chr=" << snps[0].chrom << " pos=" << snps[0].physpos
+ << " cM=" << 100*snps[0].genpos << endl;
+ cerr << " Last SNP: chr=" << snps.back().chrom << " pos=" << snps.back().physpos
+ << " cM=" << 100*snps.back().genpos << endl;
+ exit(1);
+ }
+ }
+
+ double GenoData::computeSnpRate(void) const {
+ double cMrange = 100*(snps.back().genpos - snps[0].genpos);
+ return M/cMrange;
+ }
+
+ /**
+ * reads indiv info from fam file, snp info from bim file
+ * allocates memory, reads genotypes, and does QC
+ */
+ void GenoData::initBed(const string &famFile, const string &bimFile, const string &bedFile,
+ int chrom, double bpStart, double bpEnd, const string &geneticMapFile,
+ const vector <string> &excludeFiles, const vector <string> &removeFiles,
+ double maxMissingPerSnp, double maxMissingPerIndiv, bool noMapCheck,
+ double cMmax) {
+
+ // indivsPreQC (without --remove indivs)
+ vector <bool> bedIndivRemoved = processIndivs(famFile, removeFiles);
+ // snpsPreQC (restricted to chrom:bpStart-bpEnd and without --exclude snps)
+ vector <bool> bedSnpExcluded = processSnps(bimFile, chrom, bpStart, bpEnd, excludeFiles);
+ processMap(snpsPreQC, geneticMapFile, noMapCheck); // modify snpsPreQC
+
+ // allocate genotypes
+ cout << "Allocating " << MpreQC << " x " << NpreQC << " bytes to temporarily store genotypes"
+ << endl;
+ uchar *genosPreQC = ALIGNED_MALLOC_UCHARS(MpreQC * NpreQC); // temporary
+
+ cout << "Reading genotypes and performing QC filtering on snps and indivs..." << endl;
+
+ // open bed file
+ const uint64 Nbed = bedIndivRemoved.size(), Mbed = bedSnpExcluded.size();
+ cout << "Reading bed file: " << bedFile << endl;
+ cout << " Expecting " << Mbed * ((Nbed+3)>>2) << " (+3) bytes for "
+ << Nbed << " indivs, " << Mbed << " snps" << endl;
+ FileUtils::AutoGzIfstream fin;
+ fin.openOrExit(bedFile, std::ios::in | std::ios::binary);
+ uchar header[3];
+ fin.read((char *) header, 3);
+ if (!fin || header[0] != 0x6c || header[1] != 0x1b || header[2] != 0x01) {
+ cerr << "ERROR: Incorrect first three bytes of bed file: " << bedFile << endl;
+ exit(1);
+ }
+
+ // read genos + QC snps (and record indiv miss rates)
+ vector <int> numMissingPerIndiv(NpreQC);
+ uchar *bedLineIn = ALIGNED_MALLOC_UCHARS((Nbed+3)>>2);
+ int numSnpsFailedQC = 0;
+ uint64 m = 0;
+ for (uint64 mbed = 0; mbed < Mbed; mbed++) {
+ uchar *genoLine = genosPreQC + m*NpreQC;
+ readBedLine(fin, bedLineIn, genoLine, bedIndivRemoved, !bedSnpExcluded[mbed]);
+ if (!bedSnpExcluded[mbed]) {
+ snpsPreQC[m].MAF = computeMAF(genoLine, NpreQC);
+ snpsPreQC[m].miss = computeSnpMissing(genoLine, NpreQC);
+ snpsPreQC[m].passQC = snpsPreQC[m].miss <= maxMissingPerSnp;
+ if (snpsPreQC[m].passQC) {
+ for (uint64 n = 0; n < NpreQC; n++)
+ numMissingPerIndiv[n] += genoLine[n] == 9;
+ }
+ else {
+ if (numSnpsFailedQC < 5)
+ cout << "Filtering snp " << snpsPreQC[m].ID << ": "
+ << snpsPreQC[m].miss << " missing" << endl;
+ numSnpsFailedQC++;
+ }
+ m++;
+ }
+ }
+ ALIGNED_FREE(bedLineIn);
+
+ if (numSnpsFailedQC)
+ cout << "Filtered " << numSnpsFailedQC << " SNPs with > " << maxMissingPerSnp << " missing"
+ << endl;
+
+ if (!fin || fin.get() != EOF) {
+ cerr << "ERROR: Wrong file size or reading error for bed file: "
+ << bedFile << endl;
+ exit(1);
+ }
+ fin.close();
+
+ // select subset of snps passing QC
+ for (uint64 m = 0; m < MpreQC; m++)
+ if (snpsPreQC[m].passQC)
+ snps.push_back(snpsPreQC[m]);
+ M = snps.size();
+
+ // QC indivs for missingness
+ int numIndivsFailedQC = 0;
+ for (uint64 n = 0; n < NpreQC; n++) {
+ indivsPreQC[n].miss = numMissingPerIndiv[n] / (double) M;
+ indivsPreQC[n].passQC = indivsPreQC[n].miss <= maxMissingPerIndiv;
+ if (!indivsPreQC[n].passQC) {
+ if (numIndivsFailedQC < 5)
+ cout << "Filtering indiv " << indivsPreQC[n].famID << " " << indivsPreQC[n].indivID
+ << ": " << numMissingPerIndiv[n] << "/" << M << " missing" << endl;
+ numIndivsFailedQC++;
+ }
+ }
+ if (numIndivsFailedQC)
+ cout << "Filtered " << numIndivsFailedQC << " indivs with > " << maxMissingPerIndiv
+ << " missing" << endl;
+
+ // select subset of indivs passing QC
+ for (uint64 n = 0; n < NpreQC; n++)
+ if (indivsPreQC[n].passQC)
+ indivs.push_back(indivsPreQC[n]);
+ N = indivs.size();
+
+ cout << endl;
+ cout << "Total post-QC indivs: N = " << N << endl;
+ cout << "Total post-QC SNPs: M = " << M << endl;
+
+ cout << "MAF spectrum: " << endl;
+ const double mafBounds6[7] = {0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.500001};
+ vector <int> mafBinCounts(6);
+ for (uint64 m = 0; m < M; m++)
+ for (int b = 0; b < 6; b++)
+ if (mafBounds6[b] <= snps[m].MAF && snps[m].MAF < mafBounds6[b+1])
+ mafBinCounts[b]++;
+ for (int b = 0; b < 6; b++)
+ printf(" %2.0f-%2.0f%%: %7d\n", 100*mafBounds6[b], 100*mafBounds6[b+1], mafBinCounts[b]);
+
+ printRange();
+
+ if (cMmax == 0) {
+ cMmax = std::min(1.0, std::max(N / 1e5, 0.25));
+ cout << "Auto-selecting --maxBlockLen: " << cMmax << " cM" << endl;
+ }
+
+ vector <bool> nullVec;
+ buildGenoBits(genosPreQC, nullVec, cMmax);
+
+ ALIGNED_FREE(genosPreQC);
+ }
+
+ /**
+ * reads genotypes from VCF/BCF file
+ * does not save indiv info (will be reread from VCF during output)
+ * only saves chrom, physpos, genpos in snp info (rest will be reread from VCF during output)
+ * allocates memory, reads genotypes, and restricts to region if specified; does not do QC
+ */
+ void GenoData::initVcf(const string &vcfFile, const int inputChrom, const int chromX,
+ double bpStart, double bpEnd, const string &geneticMapFile,
+ bool noMapCheck, double cMmax) {
+
+ htsFile *fin = hts_open(vcfFile.c_str(), "r");
+ bcf_hdr_t *hdr = bcf_hdr_read(fin);
+ bcf1_t *rec = bcf_init1();
+ int mgt = 0, *gt = NULL;
+
+ NpreQC = bcf_hdr_nsamples(hdr);
+ cout << "Reading genotypes for N = " << NpreQC << " samples" << endl;
+ vector <bool> genos2bit;
+
+ int wantChrom = inputChrom; // might be 0; if so, update
+ // read genos; save chrom and physpos for each SNP
+ while (bcf_read(fin, hdr, rec) >= 0) {
+ // check CHROM
+ int chrom = StringUtils::bcfNameToChrom(bcf_hdr_id2name(hdr, rec->rid), 1, chromX);
+ if (wantChrom == 0) wantChrom = chrom; // if --chrom was not specified, set to first
+ if (chrom != wantChrom) { // only allow multi-chrom file if --chrom has been specified
+ if (inputChrom == 0) {
+ cerr << "ERROR: File contains data for >1 chromosome; specify one with --chrom" << endl;
+ exit(1);
+ }
+ else
+ continue;
+ }
+
+ // check if POS is within selected region
+ int bp = rec->pos+1;
+ if (!(bpStart <= bp && bp <= bpEnd)) continue;
+
+ // check for multi-allelics (TODO: ignore with warning and don't phase in output)
+ if (rec->n_allele > 2) {
+ cerr << "ERROR: Multi-allelic site found (i.e., ALT contains multiple alleles)" << endl;
+ cerr << " Either drop or split (bcftools norm -m) multi-allelic variants" << endl;
+ exit(1);
+ }
+
+ // add chrom and bp to SNP list
+ SnpInfoX snp; snp.chrom = chrom; snp.physpos = bp;
+ snpsPreQC.push_back(snp);
+
+ // read genotypes
+ int ngt = bcf_get_genotypes(hdr, rec, &gt, &mgt);
+ if (ngt != 2 * (int) NpreQC) {
+ cerr << "ERROR: Samples are not diploid" << endl;
+ exit(1);
+ }
+ for (int i = 0; i < (int) NpreQC; i++) {
+ int ploidy = 2;
+ int *ptr = gt + i*ploidy;
+
+ uchar geno = 0;
+ bool missing = false;
+ for (int j = 0; j < ploidy; j++) {
+ if ( ptr[j]==bcf_int32_vector_end ) {
+ if (j == 0) {
+ cerr << "ERROR: ptr[0]==bcf_int32_vector_end... zero ploidy?" << endl;
+ exit(1);
+ }
+ else { // 2nd of ploidy==2 genotypes is set to bcf_int32_vector_end => haploid
+ if ( missing ) continue; // missing diploid genotype can be written in VCF as "."
+ else if (wantChrom == chromX) // X chromosome => haploid ok
+ geno *= 2; // encode as diploid homozygote
+ else {
+ cerr << "ERROR: Haploid genotype found" << endl;
+ exit(1);
+ }
+ }
+ }
+ else {
+ if ( bcf_gt_is_missing(ptr[j]) ) // missing allele
+ missing = true;
+ else
+ geno += bcf_gt_allele(ptr[j]); // 0=REF, 1=ALT (multi-allelics prohibited)
+ }
+ }
+
+ if (missing) geno = 3;
+ genos2bit.push_back(geno&1);
+ genos2bit.push_back(geno>>1);
+ }
+ }
+
+ free(gt);
+ bcf_destroy(rec);
+ bcf_hdr_destroy(hdr);
+ hts_close(fin);
+
+ cout << "Read M = " << snpsPreQC.size() << " variants" << endl;
+ processMap(snpsPreQC, geneticMapFile, noMapCheck); // modify snpsPreQC
+
+ // don't perform QC; use all SNPs
+ MpreQC = snpsPreQC.size();
+ for (uint64 m = 0; m < MpreQC; m++)
+ snpsPreQC[m].passQC = true;
+ snps = snpsPreQC;
+ M = snps.size();
+
+ // don't perform QC; use all samples
+ N = NpreQC;
+
+ printRange();
+
+ if (cMmax == 0) {
+ cMmax = std::min(1.0, std::max(N / 1e5, 0.25));
+ cout << "Auto-selecting --maxBlockLen: " << cMmax << " cM" << endl;
+ }
+
+ buildGenoBits(NULL, genos2bit, cMmax);
+ }
+
+ GenoData::~GenoData() {
+ ALIGNED_FREE(seg64logPs);
+ ALIGNED_FREE(genoBits);
+ }
+
+ /**
+ * assumes Nbed = bedIndivRemoved.size()
+ * reads (Nbed+3)>>2 bytes into bedLineIn
+ * stores sum(!bedIndivRemoved) bytes into genoLine if storeGenoLine == true
+ */
+ void GenoData::readBedLine(FileUtils::AutoGzIfstream &fin, uchar bedLineIn[], uchar genoLine[],
+ vector <bool> &bedIndivRemoved, bool storeGenoLine) {
+ uint64 Nbed = bedIndivRemoved.size();
+ fin.read((char *) bedLineIn, (Nbed+3)>>2);
+ if (storeGenoLine) {
+ const uchar bedToGeno[4] = {2, 9, 1, 0};
+ uint64 n = 0;
+ for (uint64 nbed = 0; nbed < Nbed; nbed++)
+ if (!bedIndivRemoved[nbed])
+ genoLine[n++] = bedToGeno[(bedLineIn[nbed>>2]>>((nbed&3)<<1))&3];
+ }
+ }
+ double GenoData::computeAlleleFreq(const uchar genoLine[], uint64 genoN) {
+ double sum = 0; int num = 0;
+ for (uint64 n = 0; n < genoN; n++)
+ if (genoLine[n] != 9) {
+ sum += genoLine[n];
+ num++;
+ }
+ return 0.5 * sum / num;
+ }
+ double GenoData::computeMAF(const uchar genoLine[], uint64 genoN) {
+ double alleleFreq = computeAlleleFreq(genoLine, genoN);
+ return std::min(alleleFreq, 1.0-alleleFreq);
+ }
+ double GenoData::computeSnpMissing(const uchar genoLine[], uint64 genoN) {
+ double sum = 0; int num = 0;
+ for (uint64 n = 0; n < genoN; n++) {
+ sum += (genoLine[n] == 9);
+ num++;
+ }
+ return sum / num;
+ }
+
+ const vector <SnpInfoX> &GenoData::getSnps(void) const { return snps; }
+ uint64 GenoData::getN(void) const { return N; }
+ uint64 GenoData::getMseg64(void) const { return Mseg64; }
+ const uint64_masks *GenoData::getGenoBits(void) const { return genoBits; }
+ vector <vector <double> > GenoData::getSeg64cMvecs(void) const { return seg64cMvecs; }
+ const AlleleFreqs *GenoData::getSeg64logPs(void) const { return seg64logPs; }
+ IndivInfoX GenoData::getIndiv(uint64 n) const { return indivs[n]; }
+ const vector <IndivInfoX> &GenoData::getIndivs(void) const { return indivs; }
+ const vector <bool> &GenoData::getIsFlipped64j(void) const { return isFlipped64j; }
+
+};
diff --git a/src/GenoData.hpp b/src/GenoData.hpp
new file mode 100644
index 0000000..277e4b2
--- /dev/null
+++ b/src/GenoData.hpp
@@ -0,0 +1,140 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#ifndef GENODATA_HPP
+#define GENODATA_HPP
+
+#include <vector>
+#include <string>
+#include <map>
+#include <boost/utility.hpp>
+
+#include "Types.hpp"
+#include "FileUtils.hpp"
+
+namespace EAGLE {
+
+ struct SnpInfoX {
+ int chrom;
+ std::string ID;
+ double genpos; // Morgans
+ int physpos;
+ std::string allele1, allele2;
+ double MAF; // note: MAFs are computed on preQC indivs
+ double miss;
+ bool passQC;
+ };
+
+ struct IndivInfoX {
+ std::string famID;
+ std::string indivID;
+ std::string paternalID;
+ std::string maternalID;
+ int sex; // (1=male; 2=female; other=unknown)
+ double pheno;
+ double miss;
+ bool passQC;
+ };
+
+ struct AlleleFreqs {
+ double cond[3][6];
+ /*
+ double dip[3];
+ double hap[2];
+ */
+ };
+
+ class GenoData : boost::noncopyable {
+
+ private:
+ uint64 NpreQC, MpreQC; // # of indivs/snps in fam/bim file not in --remove/exclude file(s)
+ uint64 N, M; // post-QC
+
+ std::vector <IndivInfoX> indivsPreQC, indivs; // [VECTOR]: NpreQC, N
+ std::vector <SnpInfoX> snpsPreQC, snps; // [VECTOR]: MpreQC, M
+
+ uint64 Mseg64; // number of <=64-SNP chunks
+ uint64_masks *genoBits; // [[MATRIX]]: Mseg64 x N (is0, is2, is9 64-bit masks; 3 bits/base)
+ std::vector <std::vector <double> > seg64cMvecs;
+ std::vector <std::vector <uint64> > seg64preQCsnpInds;
+ AlleleFreqs *seg64logPs;
+ std::vector <bool> isFlipped64j;
+
+ std::vector <bool> processIndivs(const std::string &famFile,
+ const std::vector <std::string> &removeFiles);
+ std::vector <bool> processSnps(const std::string &bimFile, int chrom, double bpStart,
+ double bpEnd, const std::vector <std::string> &excludeFiles);
+ void processMap(std::vector <SnpInfoX> &snpsVec, const std::string &geneticMapFile,
+ bool noMapCheck);
+ void buildGenoBits(uchar *genosPreQC, const std::vector <bool> &genos2bit, double cMmax);
+ bool fillSnpSubRowNorm1(float x[], uint64 m64j, const std::vector <int> &indivInds) const;
+
+ public:
+ /**
+ * reads indiv info from fam file, snp info from bim file
+ * allocates memory, reads genotypes, and does QC
+ * assumes numbers of SNPs in bim and bed files match
+ */
+ void initBed(const std::string &famFile, const std::string &bimFile,
+ const std::string &bedFile, int chrom, double bpStart, double bpEnd,
+ const std::string &geneticMapFile, const std::vector <std::string> &excludeFiles,
+ const std::vector <std::string> &removeFiles, double maxMissingPerSnp,
+ double maxMissingPerIndiv, bool noMapCheck, double cMmax);
+ /**
+ * reads genotypes from VCF/BCF file
+ * does not save indiv info (will be reread from VCF during output)
+ * only saves chrom, physpos, genpos in snp info (rest will be reread from VCF during output)
+ * allocates memory, reads genotypes, and restricts to region if specified; does not do QC
+ */
+ void initVcf(const std::string &vcfFile, const int inputChrom, const int chromX,
+ double bpStart, double bpEnd, const std::string &geneticMapFile, bool noMapCheck,
+ double cMmax);
+
+ void printRange(void) const;
+
+ ~GenoData();
+
+ static int plinkChromCode(const std::string &chrom);
+ static std::vector <SnpInfoX> readBimFile(const std::string &bimFile);
+ /**
+ * assumes Nbed = bedIndivRemoved.size()
+ * reads (Nbed+3)>>2 bytes into bedLineIn
+ * stores sum(!bedIndivRemoved) bytes into genoLine if storeGenoLine == true
+ */
+ static void readBedLine(FileUtils::AutoGzIfstream &fin, uchar bedLineIn[], uchar genoLine[],
+ std::vector <bool> &bedIndivRemoved, bool storeGenoLine);
+ static double computeAlleleFreq(const uchar genoLine[], uint64 genoN);
+ static double computeMAF(const uchar genoLine[], uint64 genoN);
+ static double computeSnpMissing(const uchar genoLine[], uint64 genoN);
+
+ const std::vector <SnpInfoX> &getSnps(void) const;
+ uint64 getN(void) const;
+ uint64 getMseg64(void) const;
+ const uint64_masks *getGenoBits(void) const;
+ std::vector <std::vector <double> > getSeg64cMvecs(void) const;
+ const AlleleFreqs *getSeg64logPs(void) const;
+ IndivInfoX getIndiv(uint64 n) const;
+ std::vector <double> computeInvLD64j(uint64 NsubMax) const;
+ const std::vector <IndivInfoX> &getIndivs(void) const;
+ const std::vector <bool> &getIsFlipped64j(void) const;
+ double computeSnpRate(void) const;
+
+ };
+}
+
+#endif
diff --git a/src/HapHedge.cpp b/src/HapHedge.cpp
new file mode 100644
index 0000000..bbeb0ed
--- /dev/null
+++ b/src/HapHedge.cpp
@@ -0,0 +1,599 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include <vector>
+#include <iostream>
+#include <cstdlib>
+#include <cstring>
+#include <cassert>
+
+#include "MemoryUtils.hpp"
+#include "Types.hpp"
+#include "HapHedge.hpp"
+
+namespace EAGLE {
+
+ using std::vector;
+ using std::string;
+ using std::cout;
+ using std::endl;
+
+ void HapBitsT::setBit(uint64 n, uint64 m) { haploBitsT[n*M64 + (m>>6)] |= 1ULL<<(m&63); }
+
+#ifdef TEST_VS
+ HapBitsT::HapBitsT(const vector <string> &_hapBitsVS)
+ : hapBitsVS(_hapBitsVS), Nhaps(hapBitsVS.size()), M(hapBitsVS[0].length()) {}
+ int HapBitsT::getBit(uint64 n, uint64 m) const { return hapBitsVS[n][m]-'0'; }
+ HapBitsT::~HapBitsT(void) { }
+#else
+ HapBitsT::HapBitsT(const uint64 *_haploBitsT, uint64 _Nhaps, uint64 Mseg64,
+ const uchar maskSnps64j[]) {
+ Nhaps = _Nhaps;
+ M = 0;
+ for (uint64 m64j = 0; m64j < Mseg64*64; m64j++)
+ if (maskSnps64j[m64j])
+ M++;
+ M64 = (M+63)/64;
+ haploBitsT = ALIGNED_MALLOC_UINT64S(Nhaps * M64);
+ memset(haploBitsT, 0, Nhaps * M64 * sizeof(haploBitsT[0]));
+ for (uint64 n = 0; n < Nhaps; n++) {
+ uint64 mCur = 0;
+ for (uint64 m64j = 0; m64j < Mseg64*64; m64j++)
+ if (maskSnps64j[m64j]) {
+ if ((_haploBitsT[n*Mseg64 + (m64j>>6)]>>(m64j&63))&1)
+ setBit(n, mCur);
+ mCur++;
+ }
+ }
+ }
+ inline int popcount64_01(uint64 i) { // returns min(popcount(i), 1)
+ return i!=0;
+ }
+ inline int popcount64_012(uint64 i) { // returns min(popcount(i), 2)
+ if (i == 0) return 0;
+ else if ((i & (i-1ULL)) == 0) return 1;
+ else return 2;
+ }
+ HapBitsT::HapBitsT(const uint64 *inBitsT, uint64 Mseg64, const vector <uint64> &splits64j,
+ const vector <uchar> &splitGenos, const vector <uint64_masks> &tgtGenoBits,
+ const vector <uint> &bestHaps) {
+ Nhaps = bestHaps.size();
+ M = 2*(splits64j.size()+2);
+ M64 = (M+63)/64;
+ haploBitsT = ALIGNED_MALLOC_UINT64S(Nhaps * M64);
+ memset(haploBitsT, 0, Nhaps * M64 * sizeof(haploBitsT[0]));
+
+ for (uint k = 0; k < bestHaps.size(); k++) {
+ uint64 n = bestHaps[k];
+ if (k+1<bestHaps.size() && bestHaps[k+1]==n+1) {
+ // process both haplotypes from this individual
+ uint64 hLastDiff = 0; // for double-IBD
+ uint64 hLastErrA = 0, hLastErrB = 0;
+ for (uint64 h = 0; h <= splits64j.size(); h++) {
+ // set hom before splits64j[h]
+ int numErrsA = 0, numErrsB = 0;
+ uint64 homStart = (h == 0 ? 0 : splits64j[h-1]+1);
+ uint64 homStop = (h == splits64j.size() ? Mseg64*64 : std::max(splits64j[h], 1ULL)) - 1;
+ for (uint64 m64 = (homStart>>6); m64 <= (homStop>>6); m64++) {
+ uint64 mask = -1ULL;
+ if (m64 == (homStart>>6))
+ mask &= (-1ULL>>(homStart&63))<<(homStart&63);
+ if (m64 == (homStop>>6))
+ mask &= (-1ULL>>(63-(homStop&63)));
+
+ numErrsA += popcount64_01(mask&((tgtGenoBits[m64].is0 & inBitsT[n*Mseg64 + m64]) |
+ (tgtGenoBits[m64].is2 & ~inBitsT[n*Mseg64 + m64])));
+ numErrsB += popcount64_01(mask&((tgtGenoBits[m64].is0 & inBitsT[(n+1)*Mseg64 + m64]) |
+ (tgtGenoBits[m64].is2 & ~inBitsT[(n+1)*Mseg64 + m64])));
+ if (numErrsA && numErrsB)
+ break;
+ }
+
+ if (numErrsA)
+ setBit(k, 2*h+1);
+ if (numErrsB)
+ setBit(k+1, 2*h+1);
+
+ // set bits at split
+ int splitA = 0, splitB = 0, splitGeno = 0;
+ if (h < splits64j.size()) {
+ splitA = (inBitsT[n*Mseg64 + (splits64j[h]>>6)]>>(splits64j[h]&63))&1;
+ splitB = (inBitsT[(n+1)*Mseg64 + (splits64j[h]>>6)]>>(splits64j[h]&63))&1;
+ splitGeno = splitGenos[h];
+ if (splitA) setBit(k, 2*(h+1));
+ if (splitB) setBit(k+1, 2*(h+1));
+ }
+
+ // check for double-IBD
+ if (numErrsA || numErrsB || splitA+splitB != splitGeno || h==splits64j.size()) {
+ if (h > hLastDiff+20) {
+ for (uint h2 = hLastDiff+1; h2 < h; h2++) {
+ setBit(k, 2*h2+1);
+ setBit(k+1, 2*h2+1);
+ }
+ }
+ else if (h > hLastDiff+10) {
+ uint64 kMask = hLastErrA >= hLastErrB ? k : k+1;
+ for (uint h2 = hLastDiff+1; h2 < h; h2++)
+ setBit(kMask, 2*h2+1);
+ }
+ hLastDiff = h;
+ }
+ if (numErrsA) hLastErrA = h;
+ if (numErrsB) hLastErrB = h;
+ }
+ k++;
+ }
+ else { // process lone reference haplotype
+ for (uint64 h = 0; h <= splits64j.size(); h++) {
+ // set hom
+ uint64 homStart = (h == 0 ? 0 : splits64j[h-1]+1);
+ uint64 homStop = (h == splits64j.size() ? Mseg64*64 : std::max(splits64j[h], 1ULL)) - 1;
+ int numErrs = 0;
+ for (uint64 m64 = (homStart>>6); m64 <= (homStop>>6); m64++) {
+ uint64 mask = -1ULL;
+ if (m64 == (homStart>>6))
+ mask &= (-1ULL>>(homStart&63))<<(homStart&63);
+ if (m64 == (homStop>>6))
+ mask &= (-1ULL>>(63-(homStop&63)));
+ numErrs += popcount64_01(mask&((tgtGenoBits[m64].is0 & inBitsT[n*Mseg64 + m64]) |
+ (tgtGenoBits[m64].is2 & ~inBitsT[n*Mseg64 + m64])));
+ if (numErrs)
+ break;
+ }
+ if (numErrs)
+ setBit(k, 2*h+1);
+
+ // set bit at split
+ if (h < splits64j.size() && ((inBitsT[n*Mseg64+(splits64j[h]>>6)]>>(splits64j[h]&63))&1))
+ setBit(k, 2*(h+1));
+ }
+ }
+ }
+ }
+ const unsigned char flipByte[] = {
+ 0x00, 0x80, 0x40, 0xC0, 0x20, 0xA0, 0x60, 0xE0, 0x10, 0x90, 0x50, 0xD0, 0x30, 0xB0, 0x70, 0xF0,
+ 0x08, 0x88, 0x48, 0xC8, 0x28, 0xA8, 0x68, 0xE8, 0x18, 0x98, 0x58, 0xD8, 0x38, 0xB8, 0x78, 0xF8,
+ 0x04, 0x84, 0x44, 0xC4, 0x24, 0xA4, 0x64, 0xE4, 0x14, 0x94, 0x54, 0xD4, 0x34, 0xB4, 0x74, 0xF4,
+ 0x0C, 0x8C, 0x4C, 0xCC, 0x2C, 0xAC, 0x6C, 0xEC, 0x1C, 0x9C, 0x5C, 0xDC, 0x3C, 0xBC, 0x7C, 0xFC,
+ 0x02, 0x82, 0x42, 0xC2, 0x22, 0xA2, 0x62, 0xE2, 0x12, 0x92, 0x52, 0xD2, 0x32, 0xB2, 0x72, 0xF2,
+ 0x0A, 0x8A, 0x4A, 0xCA, 0x2A, 0xAA, 0x6A, 0xEA, 0x1A, 0x9A, 0x5A, 0xDA, 0x3A, 0xBA, 0x7A, 0xFA,
+ 0x06, 0x86, 0x46, 0xC6, 0x26, 0xA6, 0x66, 0xE6, 0x16, 0x96, 0x56, 0xD6, 0x36, 0xB6, 0x76, 0xF6,
+ 0x0E, 0x8E, 0x4E, 0xCE, 0x2E, 0xAE, 0x6E, 0xEE, 0x1E, 0x9E, 0x5E, 0xDE, 0x3E, 0xBE, 0x7E, 0xFE,
+ 0x01, 0x81, 0x41, 0xC1, 0x21, 0xA1, 0x61, 0xE1, 0x11, 0x91, 0x51, 0xD1, 0x31, 0xB1, 0x71, 0xF1,
+ 0x09, 0x89, 0x49, 0xC9, 0x29, 0xA9, 0x69, 0xE9, 0x19, 0x99, 0x59, 0xD9, 0x39, 0xB9, 0x79, 0xF9,
+ 0x05, 0x85, 0x45, 0xC5, 0x25, 0xA5, 0x65, 0xE5, 0x15, 0x95, 0x55, 0xD5, 0x35, 0xB5, 0x75, 0xF5,
+ 0x0D, 0x8D, 0x4D, 0xCD, 0x2D, 0xAD, 0x6D, 0xED, 0x1D, 0x9D, 0x5D, 0xDD, 0x3D, 0xBD, 0x7D, 0xFD,
+ 0x03, 0x83, 0x43, 0xC3, 0x23, 0xA3, 0x63, 0xE3, 0x13, 0x93, 0x53, 0xD3, 0x33, 0xB3, 0x73, 0xF3,
+ 0x0B, 0x8B, 0x4B, 0xCB, 0x2B, 0xAB, 0x6B, 0xEB, 0x1B, 0x9B, 0x5B, 0xDB, 0x3B, 0xBB, 0x7B, 0xFB,
+ 0x07, 0x87, 0x47, 0xC7, 0x27, 0xA7, 0x67, 0xE7, 0x17, 0x97, 0x57, 0xD7, 0x37, 0xB7, 0x77, 0xF7,
+ 0x0F, 0x8F, 0x4F, 0xCF, 0x2F, 0xAF, 0x6F, 0xEF, 0x1F, 0x9F, 0x5F, 0xDF, 0x3F, 0xBF, 0x7F, 0xFF
+ };
+ uint64 reverse64(uint64 v) {
+ return
+ ((uint64) flipByte[v & 0xff] << 56ULL) |
+ ((uint64) flipByte[(v >> 8) & 0xff] << 48ULL) |
+ ((uint64) flipByte[(v >> 16) & 0xff] << 40ULL) |
+ ((uint64) flipByte[(v >> 24) & 0xff] << 32ULL) |
+ ((uint64) flipByte[(v >> 32) & 0xff] << 24ULL) |
+ ((uint64) flipByte[(v >> 40) & 0xff] << 16ULL) |
+ ((uint64) flipByte[(v >> 48) & 0xff] << 8ULL) |
+ flipByte[v >> 56];
+ }
+ HapBitsT::HapBitsT(const HapBitsT &hapBitsFwdT, int dir)
+ : Nhaps(hapBitsFwdT.getNhaps()), M(hapBitsFwdT.getM()), M64((M+63)/64) {
+
+ assert(dir<0); // reverse existing HapBitsT
+
+ haploBitsT = ALIGNED_MALLOC_UINT64S(Nhaps * M64);
+ memset(haploBitsT, 0, Nhaps * M64 * sizeof(haploBitsT[0]));
+
+ if (dir == -1) { // simple reversal
+ for (uint64 n = 0; n < Nhaps; n++)
+ for (uint64 m = 0; m < M; m++)
+ if (hapBitsFwdT.getBit(n, m))
+ setBit(n, M-1-m);
+ }
+ else { // repad 0 xyz 0 0 -> 0 zyx 0 0
+ uint64 Mtrim = M-1, Mtrim64 = (Mtrim+63)/64, offset = Mtrim&63;
+ for (uint64 n = 0; n < Nhaps; n++) {
+ /*
+ for (uint64 m = 1; m < M-2; m++)
+ if (hapBitsFwdT.getBit(n, m))
+ setBit(n, M-2-m);
+ */
+ for (uint64 m64 = 0; m64 < Mtrim64; m64++) {
+ uint64 fwdBits = offset==0 ? hapBitsFwdT.getBits64(n, Mtrim64-1-m64) :
+ ((m64<Mtrim64-1 ? (hapBitsFwdT.getBits64(n, Mtrim64-2-m64)>>offset) : 0) |
+ (hapBitsFwdT.getBits64(n, Mtrim64-1-m64)<<(64-offset)));
+ //assert(haploBitsT[n * M64 + m64] == reverse64(fwdBits));
+ haploBitsT[n * M64 + m64] = reverse64(fwdBits);
+ }
+ }
+ }
+ }
+ int HapBitsT::getBit(uint64 n, uint64 m) const { return (haploBitsT[n*M64 + (m>>6)]>>(m&63))&1; }
+ uint64 HapBitsT::getBits64(uint64 n, uint64 m64) const { return haploBitsT[n*M64 + m64]; }
+ HapBitsT::~HapBitsT(void) { ALIGNED_FREE(haploBitsT); }
+#endif
+ int HapBitsT::getNhaps(void) const { return Nhaps; }
+ int HapBitsT::getM(void) const { return M; }
+
+
+
+ WorkTreeNode::WorkTreeNode(int _d, int _a, int _count, int _up, int _left, int _right) :
+ d(_d), a(_a), count(_count), up(_up), left(_left), right(_right) {}
+
+
+ void dfsPreOrder(WorkTreeNode workNodes[], HapTreeNode *nodes, int &pos, int cur) {
+ if (cur == -1) return;
+ nodes[pos].mSplit = workNodes[cur].d;
+ int left = workNodes[cur].left;
+ nodes[pos].count0 = 1 + (left==-1 ? 0 : workNodes[left].count);
+ nodes[pos].seq1 = workNodes[cur].a;
+ pos++;
+ dfsPreOrder(workNodes, nodes, pos, workNodes[cur].left);
+ dfsPreOrder(workNodes, nodes, pos, workNodes[cur].right);
+ }
+
+ void dfsPreOrderMulti(WorkTreeNode workNodes[], HapTreeMultiNode *nodes, int &pos, int cur) {
+ if (cur == -1) return;
+ int curPos = pos;
+ nodes[curPos].mSplit = workNodes[cur].d;
+ nodes[curPos].count0 = workNodes[cur].count;
+ nodes[curPos].node0 = curPos+1;
+ nodes[curPos].seq1 = workNodes[cur].a;
+ pos++;
+ dfsPreOrderMulti(workNodes, nodes, pos, workNodes[cur].left);
+ if (pos == curPos+1) { // no next node 0
+ nodes[curPos].node0 = -1;
+ /* MEMORY-SAVING ALTERNATIVE: use 1 bit (sign of state.count) to encode next node
+ nodes[curPos].count0 = -nodes[curPos].count0;
+ */
+ }
+ nodes[curPos].node1 = pos;
+ dfsPreOrderMulti(workNodes, nodes, pos, workNodes[cur].right);
+ if (pos == nodes[curPos].node1) // no next node 1
+ nodes[curPos].node1 = -1;
+ }
+
+ HapTree::HapTree(const HapBitsT &_hapBitsT, int a[], int d[])
+ : hapBitsT(_hapBitsT), Nhaps(hapBitsT.getNhaps()), invNhaps(1.0f/Nhaps) {
+
+ seq0 = a[0];
+ nodes = (HapTreeNode *) ALIGNED_MALLOC((Nhaps-1) * sizeof(nodes[0]));
+
+ WorkTreeNode *workNodes = (WorkTreeNode *) ALIGNED_MALLOC(Nhaps * sizeof(workNodes[0]));
+
+ // perform in-order traversal
+ workNodes[0] = WorkTreeNode(-1, a[0], 0, -1, -1, -1);
+ for (int n = 1; n < Nhaps; n++) {
+ if (d[n] >= d[n-1]) {
+ workNodes[n] = WorkTreeNode(d[n], a[n], 1, n-1, -1, -1);
+ workNodes[n-1].right = n;
+ }
+ else {
+ int cur = n-1, up = workNodes[cur].up;
+ while (d[n] < workNodes[up].d) {
+ workNodes[up].count += workNodes[cur].count;
+ cur = up; up = workNodes[cur].up;
+ }
+ workNodes[n] = WorkTreeNode(d[n], a[n], workNodes[cur].count+1, up, cur, -1);
+ workNodes[cur].up = n;
+ workNodes[up].right = n;
+ }
+ }
+ int cur = Nhaps-1, up = workNodes[cur].up;
+ while (up != -1) {
+ workNodes[up].count += workNodes[cur].count;
+ cur = up; up = workNodes[cur].up;
+ }
+
+ // perform pre-order traversal
+ int pos = 0;
+ dfsPreOrder(workNodes, nodes, pos, workNodes[0].right);
+
+ ALIGNED_FREE(workNodes);
+ }
+ HapTree::~HapTree(void) {
+ ALIGNED_FREE(nodes);
+ }
+ float HapTree::getInvNhaps(void) const {
+ return invNhaps;
+ }
+ HapTreeState HapTree::getRootState(void) const {
+ HapTreeState h;
+ h.seq = seq0; h.node = 0; h.count = Nhaps;
+ return h;
+ }
+ bool HapTree::next(int m, HapTreeState &state, int nextBit) const {
+ if (state.count == 1 || nodes[state.node].mSplit > m) {
+ return hapBitsT.getBit(state.seq, m) == nextBit;
+ }
+ else {
+ if (nextBit == 0) {
+ state.count = nodes[state.node++].count0;
+ }
+ else {
+ state.seq = nodes[state.node].seq1;
+ int c0 = nodes[state.node].count0;
+ state.node += c0;
+ state.count -= c0;
+ }
+ return true;
+ }
+ }
+
+ HapTreeMulti::HapTreeMulti(const HapBitsT &_hapBitsT, SortDiv ad[], int M,
+ WorkTreeNode workNodes[])
+ : hapBitsT(_hapBitsT), Nhaps(hapBitsT.getNhaps()), invNhaps(1.0f/Nhaps) {
+
+ // perform in-order traversal
+ workNodes[0] = WorkTreeNode(-1, ad[0].a, 0, -1, -1, -1);
+ int n = 0, nextMult = 1, uniqHaps = Nhaps;
+ for (int nHap = 1; nHap < Nhaps; nHap++) {
+ if (ad[nHap].d == M) { // sequence is identical to previous; merge
+ nextMult++;
+ uniqHaps--;
+ }
+ else {
+ n++;
+ if (ad[nHap].d >= workNodes[n-1].d) {
+ workNodes[n] = WorkTreeNode(ad[nHap].d, ad[nHap].a, nextMult, n-1, -1, -1);
+ workNodes[n-1].right = n;
+ }
+ else {
+ int cur = n-1, up = workNodes[cur].up;
+ int leftCount = nextMult;
+ while (ad[nHap].d < workNodes[up].d) {
+ leftCount += workNodes[cur].count;
+ cur = up; up = workNodes[cur].up;
+ }
+ leftCount += workNodes[cur].count;
+ workNodes[n] = WorkTreeNode(ad[nHap].d, ad[nHap].a, leftCount, up, cur, -1);
+ workNodes[cur].up = n;
+ workNodes[up].right = n;
+ }
+ nextMult = 1;
+ }
+ }
+
+ nodes = (HapTreeMultiNode *) ALIGNED_MALLOC((uniqHaps-1) * sizeof(nodes[0]));
+
+ rootState.seq = ad[0].a; rootState.node = workNodes[0].right==-1?-1:0; rootState.count = Nhaps;
+ int pos = 0;
+ dfsPreOrderMulti(workNodes, nodes, pos, workNodes[0].right);
+ }
+ HapTreeMulti::~HapTreeMulti(void) {
+ ALIGNED_FREE(nodes);
+ }
+ float HapTreeMulti::getInvNhaps(void) const {
+ return invNhaps;
+ }
+ HapTreeState HapTreeMulti::getRootState(void) const {
+ return rootState;
+ }
+ bool HapTreeMulti::next(int m, HapTreeState &state, int nextBit) const {
+ if (state.node == -1 || nodes[state.node].mSplit > m) {
+ return hapBitsT.getBit(state.seq, m) == nextBit;
+ }
+ else {
+ if (nextBit == 0) {
+ state.count = nodes[state.node].count0;
+ state.node = nodes[state.node].node0;
+ /* MEMORY-SAVING ALTERNATIVE: use 1 bit (sign of state.count) to encode next node
+ if (state.count > 0)
+ state.node++;
+ else {
+ state.node = -1;
+ state.count = -state.count;
+ }
+ */
+ }
+ else {
+ state.count -= nodes[state.node].count0;
+ /* MEMORY-SAVING ALTERNATIVE: use 1 bit (sign of state.count) to encode next node
+ state.count -= abs(nodes[state.node].count0);
+ */
+ state.seq = nodes[state.node].seq1;
+ state.node = nodes[state.node].node1; // update last: overwrites state.node!
+ }
+ return true;
+ }
+ }
+ void HapTreeMulti::nextAtFrac(int m, HapTreeState &state, double nextFrac) const {
+ // see above for MEMORY-SAVING ALTERNATIVES
+ if (state.node == -1 || nodes[state.node].mSplit > m) {
+ return;
+ }
+ else {
+ if (nodes[state.node].count0 >= nextFrac * state.count) { // nextBit = 0
+ state.count = nodes[state.node].count0;
+ state.node = nodes[state.node].node0;
+ }
+ else {
+ state.count -= nodes[state.node].count0;
+ state.seq = nodes[state.node].seq1;
+ state.node = nodes[state.node].node1; // update last: overwrites state.node!
+ }
+ }
+ }
+ // for debugging
+ void HapTreeMulti::dfsPrint(string curPrefix, int m, int M, const HapTreeState &state) const {
+ if (m > M) return;
+ cout << "m = " << m << ", prefix = " << curPrefix << ": count = " << state.count << endl;
+ for (int b = 0; b < 2; b++) {
+ HapTreeState nextState = state;
+ if (next(m, nextState, b))
+ dfsPrint(curPrefix + (char) ('0'+b), m+1, M, nextState);
+ }
+ }
+
+ HapHedge::HapHedge(const HapBitsT &_hapBitsT, int _skip/*, const vector <int> &treeStarts*/) :
+ hapBitsT(_hapBitsT), skip(_skip), T((hapBitsT.getM()+skip-1) / skip) {
+
+ treePtrs = new HapTree *[T];
+
+ int N = hapBitsT.getNhaps(), M = hapBitsT.getM();
+
+ // initialize work arrays
+ int *a1 = new int[N], *d1 = new int[N], *a = new int[N], *b = new int[N], *d = new int[N],
+ *e = new int[N];
+ for (int n = 0; n < N; n++) {
+ a1[n] = n;
+ d1[n] = M;
+ }
+
+ for (int m = M-1; m >= 0; m--) {
+ // compute sort order and divergence array
+ int u = 0, v = 0, p = m, q = m;
+ for (int n = 0; n < N; n++) {
+ if (d1[n] < p) p = d1[n];
+ if (d1[n] < q) q = d1[n];
+ if (hapBitsT.getBit(a1[n], m) == 0) {
+ a[u] = a1[n]; d[u] = p; u++; p = M;
+ }
+ else {
+ b[v] = a1[n]; e[v] = q; v++; q = M;
+ }
+ }
+ memcpy(a1, a, u * sizeof(a1[0])); memcpy(a1+u, b, v * sizeof(a1[0]));
+ memcpy(d1, d, u * sizeof(d1[0])); memcpy(d1+u, e, v * sizeof(d1[0]));
+
+ // perform pre-order traversal and store to HapTree
+ if (m % skip == 0) {
+ treePtrs[m/skip] = new HapTree(hapBitsT, a1, d1);
+ }
+ }
+ delete[] a1;
+ delete[] d1;
+ delete[] a;
+ delete[] b;
+ delete[] d;
+ delete[] e;
+ }
+ HapHedge::~HapHedge(void) {
+ for (int t = 0; t < T; t++)
+ delete treePtrs[t];
+ delete[] treePtrs;
+ }
+ const HapTree &HapHedge::getHapTree(int t) const {
+ return *treePtrs[t];
+ }
+ int HapHedge::getM(void) const {
+ return hapBitsT.getM();
+ }
+ int HapHedge::getSkip(void) const {
+ return skip;
+ }
+ int HapHedge::getNumTrees(void) const {
+ return T;
+ }
+ const HapBitsT &HapHedge::getHapBitsT(void) const {
+ return hapBitsT;
+ }
+
+ HapHedgeErr::HapHedgeErr(const HapBitsT &_hapBitsT) :
+ hapBitsT(_hapBitsT), T((hapBitsT.getM()+1) / 2) {
+
+ treePtrs = new HapTreeMulti *[T];
+
+ int N = hapBitsT.getNhaps(), M = hapBitsT.getM();
+
+ // initialize work arrays
+ SortDiv *ad = new SortDiv[N+1], *ad1 = new SortDiv[N+1]; // N+1 for convenience below
+ WorkTreeNode *workNodes = (WorkTreeNode *) ALIGNED_MALLOC(N * sizeof(workNodes[0]));
+ for (int n = 0; n <= N; n++) { // N+1 for convenience below
+ ad[n].a = n;
+ ad[n].d = M;
+ }
+
+ uint64 *curBits64 = new uint64[N];
+ uchar *curBits8 = new uchar[N]; // N-byte buffer hopefully fits in L1 cache for random access
+
+ for (int m = M-1; m >= 0; m--) {
+ if (m == M-1 || (m&63) == 63) { // move current 64-bit m-block for each sample to curBits64
+ for (int n = 0; n < N; n++)
+ curBits64[n] = hapBitsT.getBits64(n, m>>6);
+ }
+ //uint64 curBit64 = 1ULL<<(m&63);
+ if (m == M-1 || (m&7) == 7) { // move current 8-bit m-block for each sample to curBits8
+ uint64 shift = (m&63)&~7;
+ for (int n = 0; n < N; n++)
+ curBits8[n] = (uchar) (curBits64[n]>>shift);
+ }
+ uchar curBit8 = 1<<(m&7);
+
+ // compute sort order and divergence array
+ int u = 0, v = 0, p = m, q = m;
+ if (m % 2 != 1) {
+ for (int n = 0; n < N; n++) {
+ int a_n = ad[n].a;
+ if (/*curBits64[a_n] & curBit64*/curBits8[a_n] & curBit8) {
+ ad1[v].a = a_n; ad1[v].d = q; v++; q = ad[n+1].d; if (q < p) p = q;
+ }
+ else {
+ ad[u].a = a_n; ad[u].d = p; u++; p = ad[n+1].d; if (p < q) q = p;
+ }
+ }
+ }
+ else {
+ for (int n = 0; n < N; n++) {
+ int a_n = ad[n].a;
+ if (/*curBits64[a_n] & curBit64*/curBits8[a_n] & curBit8) {
+ ad1[v].a = a_n; ad1[v].d = M; v++; q = ad[n+1].d; if (q < p) p = q;
+ }
+ else {
+ ad[u].a = a_n; ad[u].d = p; u++; p = ad[n+1].d; if (p < q) q = p;
+ }
+ }
+ ad1[0].d = m;
+ }
+ memcpy(ad+u, ad1, v * sizeof(ad[0]));
+
+ // perform pre-order traversal and store to HapTreeMulti
+ if (m % 2 == 0)
+ treePtrs[m/2] = new HapTreeMulti(hapBitsT, ad, M, workNodes);
+ }
+
+ delete[] curBits8;
+ delete[] curBits64;
+ ALIGNED_FREE(workNodes);
+ delete[] ad1;
+ delete[] ad;
+ }
+ HapHedgeErr::~HapHedgeErr(void) {
+ for (int t = 0; t < T; t++)
+ delete treePtrs[t];
+ delete[] treePtrs;
+ }
+ const HapTreeMulti &HapHedgeErr::getHapTreeMulti(int t) const {
+ return *treePtrs[t];
+ }
+ int HapHedgeErr::getNumTrees(void) const {
+ return T;
+ }
+ const HapBitsT &HapHedgeErr::getHapBitsT() const {
+ return hapBitsT;
+ }
+ // for debugging
+ void HapHedgeErr::printTree(int t) const {
+ treePtrs[t]->dfsPrint("", 2*t, 2*T, treePtrs[t]->getRootState());
+ }
+
+}
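
Note on the HapHedge constructor above: the loop over m from M-1 down to 0 maintains a stable sort order (a1) and a divergence array (d1) in the style of the positional Burrows-Wheeler transform. The following is a minimal sketch of that update on plain '0'/'1' strings, assuming the same conventions as the code above: d[n] is the first position at or after m where haplotype a[n] differs from a[n-1], and M means the two are identical. The names SortDivResult and suffixSortDiv are hypothetical, and the sketch returns only the final state at m = 0 rather than building trees.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct SortDivResult { std::vector<int> a, d; };  // hypothetical result type

// Sort haplotypes by their suffix starting at position 0, one column at a time.
SortDivResult suffixSortDiv(const std::vector<std::string> &haps) {
  const int N = (int) haps.size();
  const int M = haps.empty() ? 0 : (int) haps[0].size();
  std::vector<int> a1(N), d1(N, M), a, b, d, e;
  for (int n = 0; n < N; n++) a1[n] = n;
  for (int m = M - 1; m >= 0; m--) {
    a.clear(); b.clear(); d.clear(); e.clear();
    int p = m, q = m;  // running minima of divergence for the 0-group and 1-group
    for (int n = 0; n < N; n++) {
      p = std::min(p, d1[n]);
      q = std::min(q, d1[n]);
      if (haps[a1[n]][m] == '0') { a.push_back(a1[n]); d.push_back(p); p = M; }
      else                       { b.push_back(a1[n]); e.push_back(q); q = M; }
    }
    // 0-group first, then 1-group: a stable radix pass on column m
    a.insert(a.end(), b.begin(), b.end());
    d.insert(d.end(), e.begin(), e.end());
    a1 = a; d1 = d;
  }
  SortDivResult r; r.a = a1; r.d = d1;
  return r;
}
```

For the inputs "110", "010", "011", "010" the order becomes 1, 3, 2, 0 with divergences 0, 3, 2, 0: the two copies of "010" are adjacent and marked identical (d = M = 3), which is the case HapTreeMulti merges into a single node.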
diff --git a/src/HapHedge.hpp b/src/HapHedge.hpp
new file mode 100644
index 0000000..c7212f3
--- /dev/null
+++ b/src/HapHedge.hpp
@@ -0,0 +1,133 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#ifndef HAPHEDGE_HPP
+#define HAPHEDGE_HPP
+
+#include <vector>
+
+#include "Types.hpp"
+
+using namespace std;
+
+namespace EAGLE {
+
+ struct HapTreeState {
+ int seq, node, count;
+ };
+
+ struct SortDiv {
+ int a, d;
+ };
+
+ class HapBitsT {
+#ifdef TEST_VS
+ const std::vector <std::string> &hapBitsVS;
+#endif
+ uint64 *haploBitsT;
+ uint64 Nhaps, M, M64;
+ void setBit(uint64 n, uint64 m);
+ public:
+#ifdef TEST_VS
+ HapBitsT(const std::vector <std::string> &_hapBitsVS);
+#else
+ HapBitsT(const uint64 *_haploBitsT, uint64 _Nhaps, uint64 Mseg64, const uchar maskSnps64j[]);
+ HapBitsT(const uint64 *inBitsT, uint64 Mseg64, const std::vector <uint64> &splits64j,
+ const std::vector <uchar> &splitGenos, const std::vector <uint64_masks> &tgtGenoBits,
+ const std::vector <uint> &bestHaps);
+ HapBitsT(const HapBitsT &hapBitsFwdT, int dir); // reverse constructor (dir must be negative; -1 = simple reversal, otherwise repad)
+ uint64 getBits64(uint64 n, uint64 m64) const;
+#endif
+ ~HapBitsT(void);
+ int getBit(uint64 n, uint64 m) const;
+ int getNhaps(void) const;
+ int getM(void) const;
+ };
+
+ struct HapTreeNode {
+ int mSplit, count0, seq1;
+ };
+
+ struct HapTreeMultiNode {
+ int mSplit, count0, node0, seq1, node1;
+ // MEMORY-SAVING ALTERNATIVE: use 1 bit (sign of state.count) vs. node0 to encode next node
+ };
+
+ struct WorkTreeNode {
+ int d, a, count, up, left, right;
+ WorkTreeNode(int _d, int _a, int _count, int _up, int _left, int _right);
+ };
+
+ class HapTree {
+ const HapBitsT &hapBitsT;
+ const int Nhaps; float invNhaps;
+ int seq0; // lexicographically first ref seq
+ HapTreeNode *nodes; // [Nhaps-1]
+ public:
+ HapTree(const HapBitsT &_hapBitsT, int a[], int d[]);
+ ~HapTree(void);
+ float getInvNhaps(void) const;
+ HapTreeState getRootState(void) const;
+ bool next(int m, HapTreeState &state, int nextBit) const;
+ };
+
+ class HapTreeMulti {
+ const HapBitsT &hapBitsT;
+ const int Nhaps; float invNhaps;
+ HapTreeState rootState;
+ HapTreeMultiNode *nodes; // [number of unique haplotypes - 1]
+ public:
+ HapTreeMulti(const HapBitsT &_hapBitsT, SortDiv ad[], int M, WorkTreeNode workNodes[]);
+ ~HapTreeMulti(void);
+ float getInvNhaps(void) const;
+ HapTreeState getRootState(void) const;
+ bool next(int m, HapTreeState &state, int nextBit) const;
+ void nextAtFrac(int m, HapTreeState &state, double nextFrac) const;
+ void dfsPrint(std::string curPrefix, int m, int M, const HapTreeState &state) const;
+ };
+
+ class HapHedge {
+ const HapBitsT &hapBitsT;
+ const int skip, T;
+ HapTree **treePtrs;
+ public:
+ HapHedge(const HapBitsT &_hapBitsT, int skip/*, const std::vector <int> &treeStarts*/);
+ ~HapHedge(void);
+ const HapTree &getHapTree(int t) const;
+ int getM(void) const;
+ int getSkip(void) const;
+ int getNumTrees(void) const;
+ const HapBitsT &getHapBitsT(void) const;
+ };
+
+ class HapHedgeErr {
+ const HapBitsT &hapBitsT;
+ const int T;
+ HapTreeMulti **treePtrs;
+ public:
+ HapHedgeErr(const HapBitsT &_hapBitsT);
+ ~HapHedgeErr(void);
+ const HapTreeMulti &getHapTreeMulti(int t) const;
+ int getNumTrees(void) const;
+ const HapBitsT &getHapBitsT() const;
+ void printTree(int t) const;
+ };
+
+}
+
+#endif
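
Note on HapBitsT above: haplotypes are stored as packed 64-bit words, with bit m of haplotype n at word n*M64 + (m>>6), bit position m&63. The following is a minimal sketch of that layout and of the simple reversal done by the dir == -1 constructor. PackedHaps, reversed and reverse64Slow are hypothetical names used for illustration only; the sketch uses std::vector instead of the aligned allocation in the real class, and a bit-at-a-time loop instead of the flipByte lookup table.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct PackedHaps {  // hypothetical stand-in for HapBitsT
  std::uint64_t nHaps, M, M64;
  std::vector<std::uint64_t> bits;
  PackedHaps(std::uint64_t n, std::uint64_t m)
    : nHaps(n), M(m), M64((m + 63) / 64), bits(n * ((m + 63) / 64), 0) {}
  void setBit(std::uint64_t n, std::uint64_t m) {
    bits[n * M64 + (m >> 6)] |= 1ULL << (m & 63);
  }
  int getBit(std::uint64_t n, std::uint64_t m) const {
    return (int) ((bits[n * M64 + (m >> 6)] >> (m & 63)) & 1);
  }
};

// Simple reversal: bit m of each haplotype moves to position M-1-m.
PackedHaps reversed(const PackedHaps &fwd) {
  PackedHaps rev(fwd.nHaps, fwd.M);
  for (std::uint64_t n = 0; n < fwd.nHaps; n++)
    for (std::uint64_t m = 0; m < fwd.M; m++)
      if (fwd.getBit(n, m))
        rev.setBit(n, fwd.M - 1 - m);
  return rev;
}

// Reverse the bit order of one 64-bit word, one bit per iteration.
std::uint64_t reverse64Slow(std::uint64_t v) {
  std::uint64_t r = 0;
  for (int i = 0; i < 64; i++) { r = (r << 1) | (v & 1); v >>= 1; }
  return r;
}
```

A table-driven version such as reverse64 trades the 64-iteration loop for eight byte lookups, which matters when whole blocks are reversed per haplotype.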
diff --git a/src/LapackConst.hpp b/src/LapackConst.hpp
new file mode 100644
index 0000000..8082e2d
--- /dev/null
+++ b/src/LapackConst.hpp
@@ -0,0 +1,76 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#ifndef LAPACKCONST_HPP
+#define LAPACKCONST_HPP
+
+#ifdef USE_MKL
+
+#include "mkl.h"
+#define DGER_MACRO dger
+#define DGEMV_MACRO dgemv
+#define DGEMM_MACRO dgemm
+#define SGEMM_MACRO sgemm
+#define DGELS_MACRO dgels
+#define DGESVD_MACRO dgesvd
+
+#else
+
+ extern "C" int dgesvd_(char *jobu, char *jobvt, int *m, int *n, double *a, int *lda, double *s,
+ double *u, int *ldu, double *vt, int *ldvt, double *work, int *lwork,
+ int *info);
+ extern "C" int dgemv_(char *TRANS, int *M, int *N, double *ALPHA, double *A, int *LDA,
+ const double *X, int *INCX, double *BETA, double *Y, int *INCY);
+ extern "C" int dger_(int *M, int *N, double *ALPHA, double *X, int *INCX, const double *Y,
+ int *INCY, double *A, int *LDA);
+ extern "C" int dgemm_(char *TRANSA, char *TRANSB, int *M, int *N, int *K, double *ALPHA,
+ const double *A, int *LDA, const double *B, int *LDB, double *BETA,
+ double *C, int *LDC);
+ extern "C" int sgemm_(char *TRANSA, char *TRANSB, int *M, int *N, int *K, float *ALPHA,
+ const float *A, int *LDA, const float *B, int *LDB, float *BETA,
+ float *C, int *LDC);
+ extern "C" int dgels_(char *TRANS, int *M, int *N, int *NRHS, double *A, int *LDA, double *B,
+ int *LDB, double *WORK, int *LWORK, int *INFO);
+
+#define DGER_MACRO dger_
+#define DGEMV_MACRO dgemv_
+#define DGEMM_MACRO dgemm_
+#define SGEMM_MACRO sgemm_
+#define DGELS_MACRO dgels_
+#define DGESVD_MACRO dgesvd_
+
+#endif
+
+
+
+/*
+namespace LapackConst {
+
+#ifndef USE_MKL
+#ifdef USE_MKL
+ inline CBLAS_TRANSPOSE lapackTransToMKL(char trans) {
+ return (trans=='N'||trans=='n') ? CblasNoTrans : CblasTrans;
+ }
+#endif
+
+ void dgemm_wrap(char TRANSA, char TRANSB, int M, int N, int K, double ALPHA,
+ const double *A, int LDA, const double *B, int LDB, double BETA,
+ double *C, int LDC);
+}
+*/
+#endif
diff --git a/src/Makefile b/src/Makefile
new file mode 100644
index 0000000..e17235d
--- /dev/null
+++ b/src/Makefile
@@ -0,0 +1,107 @@
+# build options:
+# 1. cmd line: linking = dynamic (default), static-except-glibc (recommended release), static
+# 2. cmd line: debug = false (default => -O2), true (-g)
+
+
+### modify these paths to local BLAS, Boost and htslib install directories
+LBLAS = -lopenblas # alternatively -llapack (just need sgemm_)
+BLAS_DIR = /opt/openblas/0.2.14/lib
+BOOST_INSTALL_DIR = /home/pl88/boost_1_58_0/install
+HTSLIB_DIR = /groups/price/poru/external_software/htslib/htslib-1.3
+
+
+### these paths are used only for static linking
+ZLIB_STATIC_DIR = /opt/zlib-1.2.8/lib # probably unnecessary on most systems
+GLIBC_STATIC_DIR = /home/pl88/glibc-static/usr/lib64
+
+
+ifeq ($(strip ${linking}),)
+ linking = dynamic
+endif
+
+CC = g++
+
+ifeq (${debug},true)
+ CFLAGS += -g
+else
+ CFLAGS += -O2
+endif
+ifeq (${prof},true)
+ CFLAGS += -g -pg
+ LFLAGS += -pg
+endif
+
+CFLAGS += -std=c++0x -msse -msse2 -fopenmp -Wall
+LFLAGS += -fopenmp
+
+
+# add BLAS lib path
+ifneq ($(strip ${BLAS_DIR}),)
+ LPATHS += -L${BLAS_DIR}
+ ifeq (${linking},dynamic)
+ LPATHS += -Wl,-rpath,${BLAS_DIR}
+ endif
+endif
+
+# add Boost include and lib paths
+ifneq ($(strip ${BOOST_INSTALL_DIR}),)
+ CPATHS += -I${BOOST_INSTALL_DIR}/include
+ LPATHS += -L${BOOST_INSTALL_DIR}/lib
+ ifeq (${linking},dynamic)
+ LPATHS += -Wl,-rpath,${BOOST_INSTALL_DIR}/lib
+ endif
+endif
+
+# add htslib include and lib paths
+ifneq ($(strip ${HTSLIB_DIR}),)
+ CPATHS += -I${HTSLIB_DIR}
+ LPATHS += -L${HTSLIB_DIR}
+ ifeq (${linking},dynamic)
+ LPATHS += -Wl,-rpath,${HTSLIB_DIR}
+ endif
+endif
+
+# add zlib.a path for static linking on Orchestra
+ifneq ($(strip ${ZLIB_STATIC_DIR}),)
+ ifneq (${linking},dynamic)
+ LPATHS += -L${ZLIB_STATIC_DIR}
+ endif
+endif
+
+# add flags for static linking; build LAPACK/MKL component of link line
+ifeq (${linking},static)
+ LFLAGS += -static
+ LPATHS += -L${GLIBC_STATIC_DIR} -L${ZLIB_STATIC_DIR}
+else ifeq (${linking},static-except-glibc)
+ LFLAGS += -static-libgcc -static-libstdc++
+ LPATHS += -L${ZLIB_STATIC_DIR}
+endif
+
+# build link line (minus flags)
+LLIBS = -lhts -lboost_program_options -lboost_iostreams -lz ${LBLAS}
+ifeq (${linking},static-except-glibc)
+ L = ${LPATHS} -Wl,-Bstatic ${LLIBS} -Wl,-Bdynamic -lpthread -lm
+else
+ L = ${LPATHS} ${LLIBS} -lpthread -lm
+endif
+
+
+T = eagle
+O = DipTreePBWT.o Eagle.o EagleImpMiss.o EagleParams.o EaglePBWT.o FileUtils.o GenoData.o HapHedge.o MapInterpolater.o MemoryUtils.o NumericUtils.o StaticMultimap.o StringUtils.o SyncedVcfData.o Timer.o
+OMAIN = EagleMain.o $O
+
+.PHONY: clean
+
+$T: ${OMAIN}
+	${CC} ${LFLAGS} -o $T ${OMAIN} $L
+
+%.o: %.cpp
+	${CC} ${CFLAGS} ${CPATHS} -o $@ -c $<
+EagleMain.o: Version.hpp
+Eagle.o: Version.hpp
+
+all: $T
+
+clean:
+	rm -f *.o
+	rm -f $T
diff --git a/src/MapInterpolater.cpp b/src/MapInterpolater.cpp
new file mode 100644
index 0000000..e7c35a5
--- /dev/null
+++ b/src/MapInterpolater.cpp
@@ -0,0 +1,83 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include <cstdlib>
+#include <string>
+#include <map>
+#include <utility>
+#include <iostream>
+#include <fstream>
+
+#include "MapInterpolater.hpp"
+#include "FileUtils.hpp"
+
+namespace Genetics {
+
+ using std::vector;
+ using std::string;
+ using std::pair;
+ using std::make_pair;
+ using std::map;
+ using std::cout;
+ using std::cerr;
+ using std::endl;
+ using FileUtils::getline;
+
+ const string MapInterpolater::MAP_FILE_HEADER =
+ "chr position COMBINED_rate(cM/Mb) Genetic_Map(cM)";
+
+ MapInterpolater::MapInterpolater(const string &geneticMapFile) {
+ if (geneticMapFile.empty()) return;
+
+ chrBpToRateGen[make_pair(0, 0)] = make_pair(0.0, 0.0); // sentinel at beginning
+ string line;
+ FileUtils::AutoGzIfstream fin; fin.openOrExit(geneticMapFile);
+ getline(fin, line);
+ if (line != MAP_FILE_HEADER) {
+ cerr << "ERROR: Wrong format of reference map " << geneticMapFile << endl;
+ cerr << " Expecting header: " << MAP_FILE_HEADER << endl;
+ exit(1);
+ }
+ int chr0 = 0, bp0 = 0; double gen0 = 0;
+ int chr = 0, bp = 0; double rate = 0, gen = 0; // initialized in case the map has no data lines
+ while (fin >> chr >> bp >> rate >> gen) {
+ if (chr == chr0)
+ chrBpToRateGen[make_pair(chr, bp)] = make_pair((gen-gen0)/(1e-6*(bp-bp0)), gen);
+ chr0 = chr; bp0 = bp; gen0 = gen;
+ }
+ chrBpToRateGen[make_pair(chr, bp+1e9)] = make_pair(1.0, gen+1e3); // sentinel at end
+ }
+
+ // returns interpolated genetic position in Morgans
+ double MapInterpolater::interp(int chr, int bp) const {
+ if (chrBpToRateGen.empty()) return 0;
+ map < pair <int, int>, pair <double, double> >::const_iterator ubIter =
+ chrBpToRateGen.upper_bound(make_pair(chr, bp)); // map record > (chr, bp)
+ if (ubIter == chrBpToRateGen.end()) {
+ cerr << "ERROR: Chromosome " << chr << " is not in genetic map" << endl;
+ exit(1);
+ }
+ int ubChr = ubIter->first.first;
+ int ubBp = ubIter->first.second;
+ double ubRate = ubIter->second.first;
+ double ubGen = ubIter->second.second;
+
+ if (chr == ubChr) return 0.01 * (ubGen + 1e-6 * ubRate * (bp-ubBp)); // interpolate interval
+ else return 0.01 * (--ubIter)->second.second; // end of previous chromosome
+ }
+}
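
Note on MapInterpolater::interp above: each map entry stores the rate (cM/Mb) of the interval that ends at its position, so a query looks up the first entry strictly after (chr, bp) and steps backward from that entry's genetic position. The following is a minimal sketch of the same lookup on in-memory records instead of a map file. MapRecord, RateGenMap, buildRateGenMap and interpMorgans are hypothetical names, and the sketch throws where the original prints an error and exits.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <utility>
#include <vector>

struct MapRecord { int chr, bp; double genCM; };  // hypothetical in-memory record

typedef std::map<std::pair<int, int>, std::pair<double, double> > RateGenMap;

RateGenMap buildRateGenMap(const std::vector<MapRecord> &recs) {
  RateGenMap m;
  if (recs.empty()) return m;
  m[std::make_pair(0, 0)] = std::make_pair(0.0, 0.0);  // sentinel at beginning
  int chr0 = 0, bp0 = 0; double gen0 = 0;
  for (std::size_t i = 0; i < recs.size(); i++) {
    const MapRecord &r = recs[i];
    if (r.chr == chr0)  // rate of the interval ending at r.bp, in cM/Mb
      m[std::make_pair(r.chr, r.bp)] =
        std::make_pair((r.genCM - gen0) / (1e-6 * (r.bp - bp0)), r.genCM);
    chr0 = r.chr; bp0 = r.bp; gen0 = r.genCM;
  }
  const MapRecord &last = recs.back();  // sentinel at end: 1 cM/Mb for 1000 Mb
  m[std::make_pair(last.chr, last.bp + 1000000000)] =
    std::make_pair(1.0, last.genCM + 1e3);
  return m;
}

// Interpolated genetic position in Morgans (0.01 * cM).
double interpMorgans(const RateGenMap &m, int chr, int bp) {
  if (m.empty()) return 0;
  RateGenMap::const_iterator ub = m.upper_bound(std::make_pair(chr, bp));
  if (ub == m.end()) throw std::out_of_range("chromosome not in genetic map");
  if (ub->first.first == chr)  // step back from the interval end
    return 0.01 * (ub->second.second
                   + 1e-6 * ub->second.first * (bp - ub->first.second));
  --ub;
  return 0.01 * ub->second.second;  // end of previous chromosome
}
```

Because upper_bound is strict, a query exactly at the last map position lands on the end sentinel, whose rate of 1 cM/Mb over 1000 Mb reproduces the last genetic position exactly.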
diff --git a/src/MapInterpolater.hpp b/src/MapInterpolater.hpp
new file mode 100644
index 0000000..15577dd
--- /dev/null
+++ b/src/MapInterpolater.hpp
@@ -0,0 +1,40 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#ifndef MAPINTERPOLATER_HPP
+#define MAPINTERPOLATER_HPP
+
+#include <string>
+#include <map>
+#include <utility>
+
+namespace Genetics {
+
+ class MapInterpolater {
+ std::map < std::pair <int, int>, std::pair <double, double> > chrBpToRateGen;
+ static const std::string MAP_FILE_HEADER;
+ public:
+ // input file format: chr position COMBINED_rate(cM/Mb) Genetic_Map(cM)
+ // (Oxford map format preceded by chr column)
+ MapInterpolater(const std::string &geneticMapFile);
+ // returns interpolated genetic position in Morgans
+ double interp(int chr, int bp) const;
+ };
+
+}
+#endif
diff --git a/src/MemoryUtils.cpp b/src/MemoryUtils.cpp
new file mode 100644
index 0000000..cf204d4
--- /dev/null
+++ b/src/MemoryUtils.cpp
@@ -0,0 +1,40 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include <cstdlib>
+#include <iostream>
+
+#include "MemoryUtils.hpp"
+#include "Types.hpp"
+
+void *ALIGNED_MALLOC(uint64 size) {
+#ifdef USE_MKL_MALLOC
+ void *p = mkl_malloc(size, MEM_ALIGNMENT);
+#else
+ void *p = _mm_malloc(size, MEM_ALIGNMENT);
+#endif
+ // TODO: change to assert() or dispense with altogether and change ALIGNED_MALLOC to macro?
+ if (p == NULL) {
+ std::cerr << "ERROR: Failed to allocate " << size << " bytes" << std::endl;
+ exit(1);
+ } else if ((uint64) p & (MEM_ALIGNMENT-1)) {
+ std::cerr << "ERROR: Memory alignment of " << size << " bytes failed" << std::endl;
+ exit(1);
+ }
+ return p;
+}
diff --git a/src/MemoryUtils.hpp b/src/MemoryUtils.hpp
new file mode 100644
index 0000000..e6dd802
--- /dev/null
+++ b/src/MemoryUtils.hpp
@@ -0,0 +1,45 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#ifndef MEMORYUTILS_HPP
+#define MEMORYUTILS_HPP
+
+#include "Types.hpp"
+
+#define MEM_ALIGNMENT 64
+
+//#define ALIGNED_MALLOC(size) mkl_malloc(size, MEM_ALIGNMENT)
+//#define ALIGNED_MALLOC(size) _mm_malloc(size, MEM_ALIGNMENT)
+void *ALIGNED_MALLOC(uint64 size);
+
+#ifdef USE_MKL_MALLOC
+#include <mkl.h>
+#define ALIGNED_FREE mkl_free
+#else
+#include <xmmintrin.h>
+#define ALIGNED_FREE _mm_free
+#endif
+
+#define ALIGNED_MALLOC_DOUBLES(numDoubles) (double *) ALIGNED_MALLOC((numDoubles)*sizeof(double))
+#define ALIGNED_MALLOC_FLOATS(numFloats) (float *) ALIGNED_MALLOC((numFloats)*sizeof(float))
+#define ALIGNED_MALLOC_UCHARS(numUchars) (uchar *) ALIGNED_MALLOC((numUchars)*sizeof(uchar))
+#define ALIGNED_MALLOC_UINTS(numUints) (uint *) ALIGNED_MALLOC((numUints)*sizeof(uint))
+#define ALIGNED_MALLOC_UINT64S(numUint64s) (uint64 *) ALIGNED_MALLOC((numUint64s)*sizeof(uint64))
+#define ALIGNED_MALLOC_UINT64_MASKS(numUint64_masks) (uint64_masks *) ALIGNED_MALLOC((numUint64_masks)*sizeof(uint64_masks))
+
+#endif
diff --git a/src/NumericUtils.cpp b/src/NumericUtils.cpp
new file mode 100644
index 0000000..449c994
--- /dev/null
+++ b/src/NumericUtils.cpp
@@ -0,0 +1,108 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include <cmath>
+#include <cstdlib>
+#include <utility>
+
+#include "NumericUtils.hpp"
+
+namespace NumericUtils {
+ double sum(const double x[], uint64 N) {
+ double ans = 0;
+ for (uint64 n = 0; n < N; n++)
+ ans += x[n];
+ return ans;
+ }
+ double mean(const std::vector <double> &x) {
+ uint64 N = x.size(); return sum(&x[0], N) / N;
+ }
+ // takes into account that some 0 values may indicate missing/ignored: divide out by Nused, not N
+ double mean(const double x[], uint64 N, uint64 Nused) {
+ return sum(x, N) / Nused;
+ }
+ // regress y on x, assuming both have been 0-centered (so 0-filled missing values ok)
+ double regCoeff(const double y[], const double x[], uint64 N) {
+ /* WRONG! if not mean-centered already, need to mask missing indivs in loop
+ double xbar = mean(x, N, Nused);
+ double ybar = mean(y, N, Nused);
+ cout << "xbar: " << xbar << " ybar: " << ybar << endl;
+ double numer = 0, denom = 0;
+ for (uint64 n = 0; n < N; n++) {
+ numer += (x[n]-xbar) * (y[n]-ybar);
+ denom += sq(x[n]-xbar);
+ }
+ */
+ double numer = 0, denom = 0;
+ for (uint64 n = 0; n < N; n++) {
+ numer += x[n] * y[n];
+ denom += sq(x[n]);
+ }
+ return numer / denom;
+ }
+ double dot(const double x[], const double y[], uint64 N) {
+ double ans = 0;
+ for (uint64 n = 0; n < N; n++)
+ ans += x[n] * y[n];
+ return ans;
+ }
+ double norm2(const double x[], uint64 N) {
+ double ans = 0;
+ for (uint64 n = 0; n < N; n++)
+ ans += sq(x[n]);
+ return ans;
+ }
+ void normalize(double x[], uint64 N) {
+ double scale = 1.0 / sqrt(norm2(x, N));
+ for (uint64 n = 0; n < N; n++)
+ x[n] *= scale;
+ }
+
+ std::pair <double, double> meanStdDev(const double x[], uint64 N) {
+ double mu = 0, s2 = 0;
+ for (uint64 n = 0; n < N; n++) mu += x[n];
+ mu /= N;
+ for (uint64 n = 0; n < N; n++) s2 += sq(x[n]-mu);
+ s2 /= (N-1);
+ return std::make_pair(mu, sqrt(s2));
+ }
+ std::pair <double, double> meanStdErr(const double x[], uint64 N) {
+ std::pair <double, double> ret = meanStdDev(x, N);
+ ret.second /= sqrt((double) N);
+ return ret;
+ }
+ std::pair <double, double> meanStdDev(const std::vector <double> &x) {
+ return meanStdDev(&x[0], x.size());
+ }
+ std::pair <double, double> meanStdErr(const std::vector <double> &x) {
+ return meanStdErr(&x[0], x.size());
+ }
+ void logSumExp(float &x, float y) {
+ float big, diff;
+ if (x > y) {
+ big = x; diff = y-x;
+ }
+ else {
+ big = y;
+ diff = x-y;
+ }
+ if (diff < -10) x = big; // a < 1e-4 * b => ignore a
+ else if (diff < -5) x = big + expf(diff); // a < 1e-2 * b => use 1st-order Taylor expansion
+ else x = big + logf(1.0f + expf(diff));
+ }
+}
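
For reference, the identity behind `logSumExp` is log(e^a + e^b) = max(a, b) + log(1 + e^(-|a-b|)). The sketch below is an assumption-labelled illustration and not part of the patch: `logAddExp` is a hypothetical name, it returns the result instead of updating its first argument, and it always evaluates the exact expression. The thresholds at -10 and -5 in the patch appear to be speed shortcuts for this same quantity (dropping the small term, or approximating log(1 + e) by e).

```cpp
#include <cassert>
#include <cmath>

// Returns log(exp(a) + exp(b)) without overflowing for large |a| or |b|.
float logAddExp(float a, float b) {
  float big = a > b ? a : b;
  float diff = a > b ? b - a : a - b;  // smaller minus larger, so never positive
  assert(!(diff > 0.0f));
  return big + std::log(1.0f + std::exp(diff));
}
```

Because only `exp(diff)` with `diff <= 0` is evaluated, inputs such as 1000 and 1000 give roughly 1000.693 rather than overflowing to infinity.
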
diff --git a/src/NumericUtils.hpp b/src/NumericUtils.hpp
new file mode 100644
index 0000000..1e64273
--- /dev/null
+++ b/src/NumericUtils.hpp
@@ -0,0 +1,52 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#ifndef NUMERICUTILS_HPP
+#define NUMERICUTILS_HPP
+
+#include <cstdlib>
+#include <vector>
+#include <utility>
+
+#include "Types.hpp"
+
+namespace NumericUtils {
+
+ inline double sq(double x) { return x*x; }
+ double sum(const double x[], uint64 N);
+ double mean(const std::vector <double> &x);
+
+ // takes into account that some 0 values may indicate missing/ignored: divide out by Nused, not N
+ double mean(const double x[], uint64 N, uint64 Nused);
+
+ // regress y on x, assuming both have been 0-centered (so 0-filled missing values ok)
+ double regCoeff(const double y[], const double x[], uint64 N);
+
+ double dot(const double x[], const double y[], uint64 N);
+ double norm2(const double x[], uint64 N);
+ void normalize(double x[], uint64 N);
+
+ void logSumExp(float &x, float y);
+
+ std::pair <double, double> meanStdDev(const double x[], uint64 N);
+ std::pair <double, double> meanStdErr(const double x[], uint64 N);
+ std::pair <double, double> meanStdDev(const std::vector <double> &x);
+ std::pair <double, double> meanStdErr(const std::vector <double> &x);
+}
+
+#endif
diff --git a/src/StaticMultimap.cpp b/src/StaticMultimap.cpp
new file mode 100644
index 0000000..7347751
--- /dev/null
+++ b/src/StaticMultimap.cpp
@@ -0,0 +1,114 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include <vector>
+#include <map>
+#include <algorithm>
+
+#include "MemoryUtils.hpp"
+#include "Types.hpp"
+#include "StaticMultimap.hpp"
+
+namespace EAGLE {
+
+ using std::vector;
+ using std::map;
+ using std::min;
+ using std::swap;
+
+ uint randMWC(uint &z, uint &w, uint mod) {
+ z=36969*(z&65535)+(z>>16);
+ w=18000*(w&65535)+(w>>16);
+ return ((z<<16)+w) % mod;
+ }
+
+ StaticMultimap::StaticMultimap() : initialized(false) {}
+
+ // for value=[0..len(keyVec)), keyVec[value] = key
+ // ignore if key == -1
+ // only store up to maxValuesPerKey (if more values, choose randomly)
+ StaticMultimap::StaticMultimap(const vector <uint> &keyVec, uint maxValuesPerKey) {
+ init(keyVec, maxValuesPerKey);
+ }
+
+ void StaticMultimap::init(const vector <uint> &keyVec, uint maxValuesPerKey) {
+ // populate tmp map: non-ignored key -> indices i with keyVec[i]=key
+ map < uint, vector <uint> > tmpMap;
+ for (uint i = 0; i < keyVec.size(); i++)
+ if (keyVec[i] != -1U)
+ tmpMap[keyVec[i]].push_back(i);
+
+ // iterate through tmp map to determine total number of values to keep (chop at max)
+ nKeys = tmpMap.size();
+ nValues = 0;
+ for (map < uint, vector <uint> >::iterator it = tmpMap.begin(); it != tmpMap.end(); it++)
+ nValues += min((uint) it->second.size(), maxValuesPerKey);
+
+ // allocate arrays
+ keys = ALIGNED_MALLOC_UINTS(nKeys);
+ startInds = ALIGNED_MALLOC_UINTS(nKeys);
+ lenValues = ALIGNED_MALLOC_UINTS(nKeys + nValues);
+
+ // initialize Marsaglia's MWC
+ uint z = 362436069, w = 521288629;
+
+ // iterate through tmp map to randomly select and store up to maxValuesPerKey
+ uint keysPos = 0, lenValuesPos = 0;
+ for (map < uint, vector <uint> >::iterator it = tmpMap.begin(); it != tmpMap.end(); it++) {
+ keys[keysPos] = it->first; // store key
+ startInds[keysPos] = lenValuesPos; // store address of record in storage array
+ keysPos++;
+ vector <uint> &values = it->second;
+ if (values.size() > maxValuesPerKey) // randomly move maxValuesPerKey elements to front
+ for (uint j = 0; j < maxValuesPerKey; j++)
+ swap(values[j], values[j + randMWC(z, w, values.size() - j)]);
+ uint nValuesCurKey = min((uint) values.size(), maxValuesPerKey);
+ lenValues[lenValuesPos++] = nValuesCurKey; // store length as first entry of record
+ for (uint j = 0; j < nValuesCurKey; j++) // store list of values
+ lenValues[lenValuesPos++] = values[j];
+ }
+
+ initialized = true;
+ }
+
+ StaticMultimap::~StaticMultimap() {
+ if (initialized) {
+ ALIGNED_FREE(lenValues);
+ ALIGNED_FREE(startInds);
+ ALIGNED_FREE(keys);
+ }
+ }
+
+ // returns pointer to record: list size followed by list
+ // returns NULL if key not found
+ const uint *StaticMultimap::query(uint key) const {
+ uint lo = 0, hi = nKeys; // condition: keys[lo] <= key < keys[hi] (hi = nKeys ok)
+ while (lo+1<hi) {
+ uint mid = (lo+hi)/2;
+ if (keys[mid] <= key)
+ lo = mid;
+ else
+ hi = mid;
+ }
+ if (nKeys > 0 && keys[lo] == key)
+ return lenValues + startInds[lo];
+ else
+ return NULL;
+ }
+
+}
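
The record layout used by `StaticMultimap` (sorted keys, a start index per key, and one flat array holding each list's length followed by its values) can be sketched with standard containers. This is an illustrative simplification, not the Eagle implementation: `FlatMultimap` is a hypothetical name, it uses `std::vector` rather than aligned allocations, keeps every value instead of randomly capping at `maxValuesPerKey`, and searches with `std::lower_bound` rather than the hand-written loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

struct FlatMultimap {
  std::vector<unsigned> keys;       // sorted distinct keys
  std::vector<unsigned> startInds;  // offset of each key's record in lenValues
  std::vector<unsigned> lenValues;  // per record: list size, then the list

  // keyVec[value] = key; entries equal to -1U are ignored
  explicit FlatMultimap(const std::vector<unsigned> &keyVec) {
    std::map<unsigned, std::vector<unsigned> > tmp;
    for (unsigned i = 0; i < keyVec.size(); i++)
      if (keyVec[i] != -1U) tmp[keyVec[i]].push_back(i);
    std::map<unsigned, std::vector<unsigned> >::const_iterator it;
    for (it = tmp.begin(); it != tmp.end(); ++it) {
      keys.push_back(it->first);
      startInds.push_back((unsigned) lenValues.size());
      lenValues.push_back((unsigned) it->second.size());
      lenValues.insert(lenValues.end(), it->second.begin(), it->second.end());
    }
  }

  // Returns a pointer to the record (size followed by values), or NULL.
  const unsigned *query(unsigned key) const {
    assert(startInds.size() == keys.size());
    std::vector<unsigned>::const_iterator it =
        std::lower_bound(keys.begin(), keys.end(), key);
    if (it == keys.end() || *it != key) return NULL;
    return &lenValues[startInds[it - keys.begin()]];
  }
};
```

For `keyVec = {5, 7, 5, -1U, 7, 5}`, querying key 5 yields the record `3, 0, 2, 5`: three values, namely the indices at which 5 occurs.
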
diff --git a/src/StaticMultimap.hpp b/src/StaticMultimap.hpp
new file mode 100644
index 0000000..8916f65
--- /dev/null
+++ b/src/StaticMultimap.hpp
@@ -0,0 +1,56 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#ifndef STATICMULTIMAP_HPP
+#define STATICMULTIMAP_HPP
+
+#include <vector>
+
+#include "Types.hpp"
+
+namespace EAGLE {
+
+ class StaticMultimap {
+
+ private:
+
+ uint nKeys, nValues;
+ uint *keys; // [VECTOR]: nKeys
+ uint *startInds; // [VECTOR]: nKeys
+ uint *lenValues; // [VECTOR]: nKeys + nValues (each record: list size followed by list)
+ bool initialized;
+
+ public:
+
+ StaticMultimap();
+ // for value=[0..len(keyVec)), keyVec[value] = key
+ // ignore if key == -1
+ // only store up to maxValuesPerKey (if more values, choose randomly)
+ StaticMultimap(const std::vector <uint> &keyVec, uint maxValuesPerKey);
+ void init(const std::vector <uint> &keyVec, uint maxValuesPerKey);
+ ~StaticMultimap();
+
+ // returns pointer to record: list size followed by list
+ // returns NULL if key not found
+ const uint *query(uint key) const;
+
+ };
+
+}
+
+#endif
diff --git a/src/StringUtils.cpp b/src/StringUtils.cpp
new file mode 100644
index 0000000..4366ccb
--- /dev/null
+++ b/src/StringUtils.cpp
@@ -0,0 +1,151 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include <vector>
+#include <string>
+#include <cstdlib>
+#include <cstdio>
+#include <cstring>
+#include <cctype>
+#include <iostream>
+#include <sstream>
+
+#include "StringUtils.hpp"
+
+namespace StringUtils {
+ using std::vector;
+ using std::string;
+ using std::cout;
+ using std::cerr;
+ using std::endl;
+
+ int stoi(const string &s) {
+ int i;
+ if (sscanf(s.c_str(), "%d", &i) != 1) { // sscanf returns EOF (not 0) on empty input
+ cerr << "ERROR: Could not parse integer from string: " << s << endl;
+ exit(1);
+ }
+ return i;
+ }
+ double stod(const string &s) {
+ double d;
+ sscanf(s.c_str(), "%lf", &d);
+ return d;
+ }
+ string itos(int i) {
+ std::ostringstream oss;
+ oss << i;
+ return oss.str();
+ }
+ string findDelimiters(const string &s, const string &c) {
+ string delims;
+ for (uint p = 0; p < s.length(); p++)
+ if (c.find(s[p], 0) != string::npos)
+ delims += s[p];
+ return delims;
+ }
+ // will not return blanks
+ vector <string> tokenizeMultipleDelimiters(const string &s, const string &c)
+ {
+ uint p = 0;
+ vector <string> ans;
+ string tmp;
+ while (p < s.length()) {
+ tmp = "";
+ while (p < s.length() && c.find(s[p], 0) != string::npos)
+ p++;
+ while (p < s.length() && c.find(s[p], 0) == string::npos) {
+ tmp += s[p];
+ p++;
+ }
+ if (tmp != "")
+ ans.push_back(tmp);
+ }
+ return ans;
+ }
+
+ void rangeErrorExit(const string &str, const string &delims) {
+ cerr << "ERROR: Invalid delimiter sequence for specifying range: " << endl;
+ cerr << " Template string: " << str << endl;
+ cerr << " Delimiter sequence found: " << delims << endl;
+ cerr << "Range must have format {start:end} with no other " << RANGE_DELIMS
+ << " chars" << endl;
+ exit(1);
+ }
+
+ // basic range template: expand "{start:end}" to vector <string> with one entry per range element
+ // if end==start-1, will return empty
+ vector <string> expandRangeTemplate(const string &str) {
+ vector <string> ret;
+ string delims = findDelimiters(str, RANGE_DELIMS);
+ if (delims.empty())
+ ret.push_back(str);
+ else if (delims == RANGE_DELIMS) {
+ vector <string> tokens = tokenizeMultipleDelimiters(str, RANGE_DELIMS);
+ for (int i = 0; i < (int) str.size(); i++)
+ if (str[i] == ':' && (str[i-1] == '{' || str[i+1] == '}'))
+ rangeErrorExit(str, delims);
+ int startInd = (str[0] != RANGE_DELIMS[0]), endInd = startInd+1;
+ string prefix, suffix;
+ if (str[0] != RANGE_DELIMS[0]) prefix = tokens[0];
+ if (str[str.length()-1] != RANGE_DELIMS[2]) suffix = tokens.back();
+ int start = StringUtils::stoi(tokens[startInd]), end = StringUtils::stoi(tokens[endInd]);
+ if (start > end+1 || end > start+1000000) {
+ cerr << "ERROR: Invalid range in template string: " << str << endl;
+ cerr << " Start: " << start << endl;
+ cerr << " End: " << end << endl;
+ exit(1);
+ }
+ for (int i = start; i <= end; i++)
+ ret.push_back(prefix + itos(i) + suffix);
+ }
+ else
+ rangeErrorExit(str, delims);
+ return ret;
+ }
+
+ vector <string> expandRangeTemplates(const vector <string> &rangeTemplates) {
+ vector <string> expanded;
+ for (uint i = 0; i < rangeTemplates.size(); i++) {
+ vector <string> range = expandRangeTemplate(rangeTemplates[i]);
+ expanded.insert(expanded.end(), range.begin(), range.end());
+ }
+ return expanded;
+ }
+
+ int bcfNameToChrom(const char *nameBuf, int chromMin, int chromX) {
+ int chrom;
+ int startPos = 0;
+ if (strlen(nameBuf)>3 &&
+ tolower(nameBuf[0])=='c' && tolower(nameBuf[1])=='h' && tolower(nameBuf[2])=='r')
+ startPos = 3; // allow prefix "chr"
+ if ((int) strlen(nameBuf) == startPos + 1 && toupper(nameBuf[startPos])=='X')
+ chrom = chromX;
+ else {
+ sscanf(nameBuf + startPos, "%d", &chrom);
+ if (!isdigit(nameBuf[startPos]) || !(chrom >= chromMin && chrom <= chromX)) {
+ cerr << "ERROR: Invalid chromosome: " << nameBuf << endl;
+ cerr << " Chromosome number must be between " << chromMin
+ << " and --chromX (= " << chromX << ")" << endl;
+ exit(1);
+ }
+ }
+ return chrom;
+ }
+
+}
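
The `{start:end}` template handled by `expandRangeTemplate` is easiest to see on a concrete input. The sketch below is a simplified stand-in, not the patched function: `expandRange` is a hypothetical name, it accepts at most one range per string, and it asserts on malformed input where the original prints an error and exits.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Expands "prefix{start:end}suffix" into one string per integer in the range.
// A string without '{' is returned unchanged as a single element.
std::vector<std::string> expandRange(const std::string &tmpl) {
  std::vector<std::string> out;
  std::string::size_type open = tmpl.find('{');
  if (open == std::string::npos) {
    out.push_back(tmpl);
    return out;
  }
  std::string::size_type colon = tmpl.find(':', open);
  std::string::size_type close = tmpl.find('}', open);
  assert(colon != std::string::npos && close != std::string::npos && colon < close);
  int start = std::atoi(tmpl.substr(open + 1, colon - open - 1).c_str());
  int end = std::atoi(tmpl.substr(colon + 1, close - colon - 1).c_str());
  for (int i = start; i <= end; i++) {  // empty when end == start-1
    std::ostringstream oss;
    oss << tmpl.substr(0, open) << i << tmpl.substr(close + 1);
    out.push_back(oss.str());
  }
  return out;
}
```

For example, `expandRange("chr{1:3}.bed")` produces `chr1.bed`, `chr2.bed` and `chr3.bed`, which is the typical use for per-chromosome file names.
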
diff --git a/src/StringUtils.hpp b/src/StringUtils.hpp
new file mode 100644
index 0000000..c02d2ce
--- /dev/null
+++ b/src/StringUtils.hpp
@@ -0,0 +1,45 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#ifndef STRINGUTILS_HPP
+#define STRINGUTILS_HPP
+
+#include <vector>
+#include <string>
+
+namespace StringUtils {
+
+ const std::string RANGE_DELIMS = "{:}"; // must have 3 chars
+
+ int stoi(const std::string &s);
+ double stod(const std::string &s);
+ std::string itos(int i);
+ std::string findDelimiters(const std::string &s, const std::string &c);
+
+ // will not return blanks
+ std::vector <std::string> tokenizeMultipleDelimiters(const std::string &s, const std::string &c);
+
+ // basic range template: expand "{start:end}" to vector <string> with one entry per range element
+ // if end==start-1, will return empty
+ std::vector <std::string> expandRangeTemplate(const std::string &str);
+ std::vector <std::string> expandRangeTemplates(const std::vector <std::string> &rangeTemplates);
+
+ int bcfNameToChrom(const char *nameBuf, int chromMin, int chromX);
+}
+
+#endif
diff --git a/src/SyncedVcfData.cpp b/src/SyncedVcfData.cpp
new file mode 100644
index 0000000..765a37e
--- /dev/null
+++ b/src/SyncedVcfData.cpp
@@ -0,0 +1,520 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include <iostream>
+#include <vector>
+#include <string>
+#include <map>
+#include <algorithm>
+#include <cstring>
+
+#include <htslib/vcf.h>
+#include <htslib/synced_bcf_reader.h>
+
+#include "Types.hpp"
+#include "MemoryUtils.hpp"
+#include "MapInterpolater.hpp"
+#include "StringUtils.hpp"
+#include "SyncedVcfData.hpp"
+
+namespace EAGLE {
+
+ using std::vector;
+ using std::string;
+ using std::pair;
+ using std::make_pair;
+ using std::cout;
+ using std::cerr;
+ using std::endl;
+
+ void process_ref_genotypes(int nsmpl, int ngt, int32_t *gt, bool allowHaploid, bool refAltSwap,
+ vector <bool> &hapsRef, int &numMissing, int &numUnphased, uint &w) {
+ numMissing = numUnphased = 0;
+ if (ngt != 2*nsmpl) {
+ cerr << "ERROR: ref ploidy != 2 (ngt != 2*nsmpl): ngt="
+ << ngt << ", nsmpl=" << nsmpl << endl;
+ exit(1);
+ }
+ int ploidy = ngt/nsmpl;
+ for (int i=0; i<nsmpl; i++)
+ {
+ int32_t *ptr = gt + i*ploidy;
+ bool haps[2]; bool missing = false, unphased = false;
+ for (int j=0; j<ploidy; j++)
+ {
+ if ( ptr[j]==bcf_int32_vector_end ) {
+ if (j == 0) {
+ cerr << "ERROR: ptr[0]==bcf_int32_vector_end... zero ploidy?" << endl;
+ exit(1);
+ }
+ else { // 2nd of ploidy==2 genotypes is set to bcf_int32_vector_end => haploid
+ if ( missing ) continue; // missing diploid genotype can be written in VCF as "."
+ else if (allowHaploid) { // X chromosome => haploid ok
+ haps[j] = haps[j-1]; // encode as diploid homozygote
+ unphased = false;
+ }
+ else {
+ cerr << "ERROR: ref genotypes contain haploid sample" << endl;
+ exit(1);
+ }
+ }
+ }
+ else {
+ if ( bcf_gt_is_missing(ptr[j]) ) { // missing allele
+ missing = true;
+ }
+ else {
+ int idx = bcf_gt_allele(ptr[j]); // allele index
+ haps[j] = (idx >= 1); // encode REF allele -> 0, ALT allele(s) -> 1
+ if ( j==1 && !bcf_gt_is_phased(ptr[j]) ) unphased = true;
+ }
+ }
+ }
+ if (missing) {
+ haps[0] = haps[1] = 0; // set both alleles to REF allele
+ numMissing++;
+ }
+ else if (unphased) {
+ if (haps[0] != haps[1] && ((w=18000*(w&65535)+(w>>16))&1))
+ std::swap(haps[0], haps[1]); // randomize phasing
+ numUnphased++;
+ }
+ if (refAltSwap) { // target REF/ALT are swapped relative to reference REF/ALT
+ haps[0] = !haps[0];
+ haps[1] = !haps[1];
+ }
+ hapsRef.push_back(haps[0]);
+ hapsRef.push_back(haps[1]);
+ }
+ }
+
+ void process_target_genotypes(int nsmpl, int ngt, int32_t *gt, bool allowHaploid,
+ vector <uchar> &genosTarget, int &numMissing) {
+ numMissing = 0;
+ if (ngt != 2*nsmpl) {
+ cerr << "ERROR: target ploidy != 2 (ngt != 2*nsmpl): ngt="
+ << ngt << ", nsmpl=" << nsmpl << endl;
+ exit(1);
+ }
+ int ploidy = ngt/nsmpl;
+ for (int i=0; i<nsmpl; i++)
+ {
+ int32_t *ptr = gt + i*ploidy;
+ bool missing = false;
+ uchar g = 0;
+ for (int j=0; j<ploidy; j++)
+ {
+ if ( ptr[j]==bcf_int32_vector_end ) {
+ if (j == 0) {
+ cerr << "ERROR: ptr[0]==bcf_int32_vector_end... zero ploidy?" << endl;
+ exit(1);
+ }
+ else { // 2nd of ploidy==2 genotypes is set to bcf_int32_vector_end => haploid
+ if ( missing ) continue; // missing diploid genotype can be written in VCF as "."
+ else if (allowHaploid) // X chromosome => haploid ok
+ g *= 2; // encode as diploid homozygote
+ else {
+ cerr << "ERROR: target genotypes contain haploid sample" << endl;
+ exit(1);
+ }
+ }
+ }
+ else {
+ if ( bcf_gt_is_missing(ptr[j]) ) { // missing allele
+ missing = true;
+ }
+ else {
+ int idx = bcf_gt_allele(ptr[j]); // allele index
+ if (idx > 1) {
+ cerr << "ERROR: multi-allelic site found in target; should have been filtered"
+ << endl;
+ exit(1);
+ }
+ g += idx;
+ }
+ }
+ }
+ if (missing) {
+ g = 9;
+ numMissing++;
+ }
+ genosTarget.push_back(g);
+ }
+ }
+
+ vector < pair <int, int> > SyncedVcfData::processVcfs
+ (const string &vcfRef, const string &vcfTarget, bool allowRefAltSwap, int chrom, int chromX,
+ double bpStart, double bpEnd, vector <bool> &hapsRef, vector <uchar> &genosTarget,
+ const string &tmpFile, const string &writeMode, int usePS,
+ vector < vector < pair <int, int> > > &conPSall) {
+
+ vector < pair <int, int> > chrBps;
+
+ bcf_srs_t *sr = bcf_sr_init();
+ sr->require_index = 1;
+
+ if ( chrom!=0 )
+ {
+ kstring_t str = {0,0,0};
+ ksprintf(&str,"%d:%d-%d",chrom,(uint32_t)bpStart,(uint32_t)bpEnd);
+ if ( bcf_sr_set_regions(sr, str.s, 0)!=0 )
+ {
+ cerr << "ERROR: failed to initialize the region: " << str.s << endl;
+ exit(1);
+ }
+ free(str.s);
+ }
+
+ // By default, the synced reader requires that CHR, POS and ALT are the same
+ // in both files. If this is too strict and SNP/indel/all lines with the same
+ // position should be considered as matching, uncomment:
+ //
+ // sr->collapse = COLLAPSE_SNPS|COLLAPSE_INDELS;
+ //
+ // See also examples in bcftools/vcfisec etc.
+
+ if (allowRefAltSwap)
+ sr->collapse = COLLAPSE_SNPS;
+
+ if (!bcf_sr_add_reader(sr, vcfRef.c_str())) {
+ cerr << "ERROR: Could not open " << vcfRef << " for reading: " << bcf_sr_strerror(sr->errnum)
+ << endl;
+ exit(1);
+ }
+ if (!bcf_sr_add_reader(sr, vcfTarget.c_str())) {
+ cerr << "ERROR: Could not open " << vcfTarget << " for reading: "
+ << bcf_sr_strerror(sr->errnum) << endl;
+ exit(1);
+ }
+
+ bcf_hdr_t *ref_hdr = bcf_sr_get_header(sr, 0);
+ bcf_hdr_t *tgt_hdr = bcf_sr_get_header(sr, 1);
+
+ // Open VCF for writing, "-" stands for standard output
+ // wbu .. uncompressed BCF
+ // wb .. compressed BCF
+ // wz .. compressed VCF
+ // w .. uncompressed VCF
+ htsFile *out = hts_open(tmpFile.c_str(), writeMode.c_str());
+
+ // Print the VCF header
+ bcf_hdr_write(out, tgt_hdr);
+
+ Nref = bcf_hdr_nsamples(ref_hdr);
+ Ntarget = bcf_hdr_nsamples(tgt_hdr);
+ conPSall.resize(Ntarget);
+ std::map <int, int> bpToSyncedIndex1; // bp -> 1-based index (m+1)
+
+ // Read target sample IDs
+ targetIDs.resize(Ntarget);
+ for (uint i = 0; i < Ntarget; i++)
+ targetIDs[i] = tgt_hdr->samples[i];
+
+ cout << endl;
+ cout << "Reference samples: Nref = " << Nref << endl;
+ cout << "Target samples: Ntarget = " << Ntarget << endl;
+
+ M = 0;
+ uint MtargetOnly = 0, MrefOnly = 0, MmultiAllelic = 0, Mmonomorphic = 0;
+ uint MwithMissingRef = 0, MwithUnphasedRef = 0, MnotInRegion = 0, MnotOnChrom = 0;
+ uint MrefAltError = 0, numRefAltSwaps = 0;
+ uint64 GmissingRef = 0, GunphasedRef = 0, GmissingTarget = 0;
+ uint w = 521288629; // fast rng: Marsaglia's MWC
+
+ int mref_gt = 0, *ref_gt = NULL;
+ int mtgt_gt = 0, *tgt_gt = NULL;
+ int mtgt_ps = 0, *tgt_ps = NULL; int Mps = 0; uint64 err_ps = 0, good_ps = 0;
+ int prev_rid = -1; // BCF id (rid) of the first chromosome processed; -1 until set
+ while ( bcf_sr_next_line(sr) )
+ {
+ bcf1_t *ref = bcf_sr_get_line(sr, 0);
+ bcf1_t *tgt = bcf_sr_get_line(sr, 1);
+ if ( !ref ) {
+ //fprintf(stderr, "onlyT .. %s:%d\n", bcf_seqname(tgt_hdr, tgt), tgt->pos+1);
+ MtargetOnly++;
+ continue;
+ }
+ if ( !tgt ) {
+ //fprintf(stderr, "onlyR .. %s:%d\n", bcf_seqname(ref_hdr, ref), ref->pos+1);
+ MrefOnly++;
+ continue;
+ }
+ //fprintf(stderr, "match .. %s:%d\n", bcf_seqname(ref_hdr, ref), ref->pos+1);
+
+ // filter out multi-allelic and monomorphic markers
+ int ntgt_gt = bcf_get_genotypes(tgt_hdr, tgt, &tgt_gt, &mtgt_gt);
+ if (tgt->n_allele > 2) {
+ MmultiAllelic++;
+ continue;
+ }
+ if (tgt->n_allele < 2) {
+ Mmonomorphic++;
+ continue;
+ }
+
+ bool refAltSwap = false;
+
+ if (allowRefAltSwap) { // perform further error-checking
+ if (tgt->n_allele != 2 || ref->n_allele != 2) {
+ MrefAltError++;
+ continue;
+ }
+ bcf_unpack(tgt, BCF_UN_STR); // unpack thru ALT
+ bcf_unpack(ref, BCF_UN_STR); // unpack thru ALT
+ /*
+ printf("tgt REF=%s, ALT=%s ref REF=%s, ALT=%s\n", tgt->d.allele[0], tgt->d.allele[1],
+ ref->d.allele[0], ref->d.allele[1]);
+ */
+ if (strcmp(tgt->d.allele[0], ref->d.allele[0]) == 0 &&
+ strcmp(tgt->d.allele[1], ref->d.allele[1]) == 0) {
+ refAltSwap = false;
+ }
+ else if (strcmp(tgt->d.allele[0], ref->d.allele[1]) == 0 &&
+ strcmp(tgt->d.allele[1], ref->d.allele[0]) == 0) {
+ refAltSwap = true;
+ numRefAltSwaps++;
+ }
+ else {
+ MrefAltError++;
+ continue;
+ }
+ }
+
+ // Check the chromosome: if region was requested (chrom is set), synced
+ // reader already positioned us in the right region. Otherwise, we process
+ // only the first chromosome in the file and quit
+ if ( prev_rid<0 )
+ {
+ prev_rid = tgt->rid;
+ if ( !chrom ) // learn the human-readable id
+ {
+ chrom = StringUtils::bcfNameToChrom(bcf_hdr_id2name(tgt_hdr, tgt->rid), 1, chromX);
+ }
+ }
+ if ( prev_rid!=tgt->rid ) break;
+
+ M++; // SNP passes checks
+ bpToSyncedIndex1[tgt->pos+1] = M; // TODO: be careful about duplicate bp (multiallelics?)
+
+ // append chromosome number and base pair coordinate to chrBps
+ chrBps.push_back(make_pair(chrom, tgt->pos+1));
+
+ // process reference haplotypes: append 2*Nref entries (0/1 pairs) to hapsRef[]
+ // check for missing/unphased ref genos (missing -> REF allele; unphased -> random phase)
+ int nref_gt = bcf_get_genotypes(ref_hdr, ref, &ref_gt, &mref_gt);
+ int numMissing, numUnphased;
+ process_ref_genotypes(Nref, nref_gt, ref_gt, chrom==chromX, refAltSwap, hapsRef,
+ numMissing, numUnphased, w);
+ if (numMissing) MwithMissingRef++;
+ if (numUnphased) MwithUnphasedRef++;
+ GmissingRef += numMissing;
+ GunphasedRef += numUnphased;
+
+ // process target genotypes: append Ntarget entries (0/1/2/9) to genosTarget[]
+ process_target_genotypes(Ntarget, ntgt_gt, tgt_gt, chrom==chromX, genosTarget, numMissing);
+ GmissingTarget += numMissing;
+
+ // process target PS field
+ if (usePS && bcf_get_format_int32(tgt_hdr, tgt, "PS", &tgt_ps, &mtgt_ps) >= 0) {
+ Mps++;
+ for (uint i = 0; i < Ntarget; i++)
+ if (tgt_ps[i] != bcf_int32_missing) {
+ std::map <int, int>::iterator it = bpToSyncedIndex1.find(abs(tgt_ps[i]));
+ if (it == bpToSyncedIndex1.end() ||
+ genosTarget[(M-1)*Ntarget + i] != 1 ||
+ genosTarget[(it->second-1)*Ntarget + i] != 1)
+ err_ps++;
+ else {
+ conPSall[i].push_back(make_pair((int) M, it->second * (tgt_ps[i]>0 ? 1 : -1)));
+ good_ps++;
+ }
+ }
+ }
+
+ // print the record
+ bcf_write(out, tgt_hdr, tgt);
+ }
+
+ bcf_sr_destroy(sr);
+ hts_close(out);
+ free(ref_gt);
+ free(tgt_gt); free(tgt_ps); // tgt_ps is allocated by bcf_get_format_int32 when usePS is set
+
+ cout << "SNPs to analyze: M = " << M << " SNPs in both target and reference" << endl;
+ if (Mps) {
+ cout << " " << Mps << " SNPs with FORMAT:PS field" << endl;
+ cout << good_ps << " usable FORMAT:PS constraints" << endl;
+ cout << err_ps << " unusable FORMAT:PS constraints" << endl;
+ }
+ if (numRefAltSwaps)
+ cerr << "--> WARNING: REF/ALT were swapped in " << numRefAltSwaps << " of these SNPs <--"
+ << endl;
+ cout << endl;
+ cout << "SNPs ignored: " << MtargetOnly << " SNPs in target but not reference" << endl;
+ if (MtargetOnly > M/10U)
+ cerr << " --> WARNING: Check REF/ALT agreement between target and ref <--"
+ << endl;
+ cout << " " << MrefOnly << " SNPs in reference but not target" << endl;
+ if (MnotOnChrom)
+ cout << " " << MnotOnChrom << " SNPs not in specified chrom" << endl;
+ if (MnotInRegion)
+ cout << " " << MnotInRegion << " SNPs not in selected region (+ flanks)"
+ << endl;
+ cout << " " << MmultiAllelic << " multi-allelic SNPs" << endl;
+ cout << " " << Mmonomorphic << " monomorphic SNPs" << endl;
+ if (MrefAltError)
+ cout << " " << MrefAltError << " SNPs with REF/ALT matching errors" << endl;
+ cout << endl;
+
+ if (MwithMissingRef) {
+ cerr << "WARNING: Reference contains missing genotypes (set to reference allele)" << endl;
+ cerr << " Fraction of sites with missing data: "
+ << MwithMissingRef / (double) M << endl;
+ cerr << " Fraction of ref genotypes missing: "
+ << GmissingRef / (double) M / Nref << endl;
+ }
+ if (MwithUnphasedRef) {
+ cerr << "WARNING: Reference contains unphased genotypes (set to random phase)" << endl;
+ cerr << " Fraction of sites with unphased data: "
+ << MwithUnphasedRef / (double) M << endl;
+ cerr << " Fraction of ref genotypes unphased: "
+ << GunphasedRef / (double) M / Nref << endl;
+ }
+ cout << "Missing rate in target genotypes: " << GmissingTarget / (double) M / Ntarget << endl;
+ cout << endl;
+
+ if (M <= 1U) {
+ cerr << endl << "ERROR: Target and ref have too few matching SNPs (M = " << M << ")" << endl;
+ exit(1);
+ }
+
+ return chrBps;
+ }
+
+ vector <double> SyncedVcfData::processMap(vector < pair <int, int> > &chrBps,
+ const string &geneticMapFile) {
+ cout << "Filling in genetic map coordinates using reference file:" << endl;
+ cout << " " << geneticMapFile << endl;
+ Genetics::MapInterpolater mapInterpolater(geneticMapFile);
+ vector <double> cMs(chrBps.size());
+ for (uint64 m = 0; m < chrBps.size(); m++)
+ cMs[m] = 100 * mapInterpolater.interp(chrBps[m].first, chrBps[m].second);
+ return cMs;
+ }
+
+ void SyncedVcfData::buildGenoBits(const vector <bool> &hapsRef,
+ const vector <uchar> &genosTarget, const vector <double> &cMs,
+ double cMmax) {
+ const uint segMin = 16;
+ vector <uint64> snpInds; vector <double> cMvec;
+ vector < vector <uint64> > seg64snpInds;
+ for (uint64 m = 0; m < M; m++) {
+ if (cMvec.size() == 64 || (cMvec.size() >= segMin && cMs[m] > cMvec[0] + cMmax)) {
+ seg64snpInds.push_back(snpInds); seg64cMvecs.push_back(cMvec);
+ snpInds.clear(); cMvec.clear();
+ }
+ snpInds.push_back(m); cMvec.push_back(cMs[m]);
+ }
+ seg64snpInds.push_back(snpInds); seg64cMvecs.push_back(cMvec);
+
+ Mseg64 = seg64snpInds.size();
+ cout << "Number of <=(64-SNP, " << cMmax << "cM) segments: " << Mseg64 << endl;
+ cout << "Average # SNPs per segment: " << M / Mseg64 << endl;
+
+ uint64 N = Nref + Ntarget;
+ genoBits = ALIGNED_MALLOC_UINT64_MASKS(Mseg64 * N);
+ memset(genoBits, 0, Mseg64 * N * sizeof(genoBits[0]));
+
+ for (uint64 m64 = 0; m64 < Mseg64; m64++) {
+ for (uint64 j = 0; j < seg64snpInds[m64].size(); j++) {
+ uint64 m = seg64snpInds[m64][j];
+ for (uint64 n = 0; n < Nref; n++) { // store haploBits for ref haplotypes in genoBits
+ bool haps0 = hapsRef[m * 2*Nref + 2*n];
+ bool haps1 = hapsRef[m * 2*Nref + 2*n+1];
+ genoBits[m64 * N + n].is0 |= ((uint64) haps0)<<j;
+ genoBits[m64 * N + n].is2 |= ((uint64) haps1)<<j;
+ }
+ for (uint64 n = Nref; n < N; n++) { // set genoBits for target genotypes
+ uchar geno = genosTarget[m * Ntarget + n-Nref];
+ genoBits[m64 * N + n].is0 |= ((uint64) (geno == 0))<<j;
+ genoBits[m64 * N + n].is2 |= ((uint64) (geno == 2))<<j;
+ genoBits[m64 * N + n].is9 |= ((uint64) (geno == 9))<<j;
+ }
+ }
+ for (uint64 n = 0; n < N; n++)
+ for (uint64 j = seg64snpInds[m64].size(); j < 64; j++)
+ genoBits[m64*N+n].is9 |= 1ULL<<j;
+ }
+ }
+
+ /**
+ * reads ref+target vcf data
+ * writes target[isec] to tmpFile
+ * fills in cM coordinates and seg64cMvecs, genoBits
+ */
+ SyncedVcfData::SyncedVcfData(const string &vcfRef, const string &vcfTarget, bool allowRefAltSwap,
+ int chrom, int chromX, double bpStart, double bpEnd,
+ const string &geneticMapFile, double cMmax, const string &tmpFile,
+ const string &writeMode, int usePS,
+ vector < vector < pair <int, int> > > &conPSall, double &snpRate) {
+
+ // perform synced read
+ vector <bool> hapsRef; // M*2*Nref
+ vector <uchar> genosTarget; // M*Ntarget
+ vector < pair <int, int> > chrBps =
+ processVcfs(vcfRef, vcfTarget, allowRefAltSwap, chrom, chromX, bpStart, bpEnd, hapsRef,
+ genosTarget, tmpFile, writeMode, usePS, conPSall);
+
+ // interpolate genetic coordinates
+ vector <double> cMs = processMap(chrBps, geneticMapFile);
+
+ uint64 physRange = 0; double cMrange = 0;
+ for (uint64 m = 0; m+1 < chrBps.size(); m++)
+ if (chrBps[m+1].first == chrBps[m].first) {
+ physRange += chrBps[m+1].second - chrBps[m].second;
+ cMrange += cMs[m+1] - cMs[m];
+ }
+ cout << "Physical distance range: " << physRange << " base pairs" << endl;
+ cout << "Genetic distance range: " << cMrange << " cM" << endl;
+
+ if (physRange == 0 || cMrange == 0) { // check before dividing by cMrange below
+ cerr << "ERROR: Physical and genetic distance ranges must be positive" << endl;
+ cerr << " First SNP: chr=" << chrBps[0].first << " pos=" << chrBps[0].second
+ << " cM=" << cMs[0] << endl;
+ cerr << " Last SNP: chr=" << chrBps.back().first << " pos=" << chrBps.back().second
+ << " cM=" << cMs.back() << endl;
+ exit(1);
+ }
+ cout << "Average # SNPs per cM: " << (int) (M/cMrange + 0.5) << endl;
+ snpRate = M/cMrange;
+
+ buildGenoBits(hapsRef, genosTarget, cMs, cMmax);
+ }
+
+ SyncedVcfData::~SyncedVcfData() {
+ ALIGNED_FREE(genoBits);
+ }
+
+ uint64 SyncedVcfData::getNref(void) const { return Nref; }
+ uint64 SyncedVcfData::getNtarget(void) const { return Ntarget; }
+ uint64 SyncedVcfData::getMseg64(void) const { return Mseg64; }
+ const uint64_masks *SyncedVcfData::getGenoBits(void) const { return genoBits; }
+ vector <vector <double> > SyncedVcfData::getSeg64cMvecs(void) const { return seg64cMvecs; }
+ const string &SyncedVcfData::getTargetID(int n) const { return targetIDs[n]; }
+
+}
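
Note on buildGenoBits() in the file above: the first loop closes a segment once it holds 64 SNPs, or once it holds at least segMin SNPs and the next SNP lies more than cMmax beyond the segment's first SNP. The following is a minimal standalone sketch of that rule only. The function name splitIntoSegments and its signature are hypothetical and not part of Eagle; unlike the original, it skips an empty trailing segment.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Eagle): group SNP indices into segments.
// A segment is closed when it already holds 64 SNPs, or when it holds at
// least segMin SNPs and the next SNP is more than cMmax past its first SNP.
std::vector<std::vector<std::size_t> > splitIntoSegments(
    const std::vector<double> &cMs, double cMmax, std::size_t segMin = 16) {
  std::vector<std::vector<std::size_t> > segments;
  std::vector<std::size_t> cur;  // SNP indices of the segment being built
  for (std::size_t m = 0; m < cMs.size(); m++) {
    bool full = cur.size() == 64;
    bool tooWide = !cur.empty() && cur.size() >= segMin &&
                   cMs[m] > cMs[cur[0]] + cMmax;
    if (full || tooWide) {
      segments.push_back(cur);
      cur.clear();
    }
    cur.push_back(m);
  }
  if (!cur.empty()) segments.push_back(cur);  // final, possibly short, segment
  return segments;
}
```

With a flat map every segment fills to 64 SNPs; with sparse SNPs the cMmax rule takes over once segMin is reached.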
diff --git a/src/SyncedVcfData.hpp b/src/SyncedVcfData.hpp
new file mode 100644
index 0000000..99dcd73
--- /dev/null
+++ b/src/SyncedVcfData.hpp
@@ -0,0 +1,76 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#ifndef SYNCEDVCFDATA_HPP
+#define SYNCEDVCFDATA_HPP
+
+#include <vector>
+#include <string>
+#include <utility>
+#include <boost/utility.hpp>
+
+#include "Types.hpp"
+
+namespace EAGLE {
+
+ class SyncedVcfData : boost::noncopyable {
+
+ private:
+ uint64 Nref, Ntarget, M; // N = Nref + Ntarget
+ uint64 Mseg64; // number of <=64-SNP chunks
+ uint64_masks *genoBits; // [[MATRIX]]: Mseg64 x N (is0, is2, is9 64-bit masks; 3 bits/SNP)
+ // n = [0..Nref) contain ref haploBits in is0, is2
+ // n = [Nref..Nref+Ntarget) contain target genoBits
+ std::vector <std::vector <double> > seg64cMvecs;
+ std::vector <std::string> targetIDs;
+
+ std::vector < std::pair <int, int> > processVcfs
+ (const std::string &vcfRef, const std::string &vcfTarget, bool allowRefAltSwap, int chrom,
+ int chromX, double bpStart, double bpEnd, std::vector <bool> &hapsRef,
+ std::vector <uchar> &genosTarget, const std::string &tmpFile, const std::string &writeMode,
+ int usePS, std::vector < std::vector < std::pair <int, int> > > &conPSall);
+ std::vector <double> processMap(std::vector < std::pair <int, int> > &chrBps,
+ const std::string &geneticMapFile);
+ void buildGenoBits(const std::vector <bool> &hapsRef, const std::vector <uchar> &genosTarget,
+ const std::vector <double> &cMs, double cMmax);
+
+ public:
+ /**
+ * reads ref+target vcf data
+ * writes target[isec] to tmpFile
+ * fills in cM coordinates and seg64cMvecs, genoBits
+ */
+ SyncedVcfData(const std::string &vcfRef, const std::string &vcfTarget, bool allowRefAltSwap,
+ int chrom, int chromX, double bpStart, double bpEnd,
+ const std::string &geneticMapFile, double cMmax, const std::string &tmpFile,
+ const std::string &writeMode, int usePS,
+ std::vector < std::vector < std::pair <int, int> > > &conPSall, double &snpRate);
+
+ ~SyncedVcfData();
+
+ uint64 getNref(void) const;
+ uint64 getNtarget(void) const;
+ uint64 getMseg64(void) const;
+ const uint64_masks *getGenoBits(void) const;
+ std::vector <std::vector <double> > getSeg64cMvecs(void) const;
+ const std::string &getTargetID(int n) const;
+
+ };
+}
+
+#endif
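
Note on the genoBits layout declared above: for target samples, each segment stores one bit per SNP in each of is0, is2 and is9, and buildGenoBits() marks unused positions of a short segment as missing. The sketch below packs and decodes one such row. packGenos and genoAt are hypothetical names, and the decoding rule (no bit set means heterozygous) is inferred from the packing code rather than taken from an Eagle function. Reference rows use is0/is2 differently (as the two haplotype bits), so the decoder applies to target rows only.

```cpp
#include <cstddef>
#include <vector>

typedef unsigned char uchar;
typedef unsigned long long uint64;
struct uint64_masks { uint64 is0, is2, is9; };  // mirrors Types.hpp

// Hypothetical helper: pack up to 64 target genotypes (0/1/2, 9 = missing).
uint64_masks packGenos(const std::vector<uchar> &genos) {
  uint64_masks bits = {0, 0, 0};
  std::size_t n = genos.size() < 64 ? genos.size() : 64;
  for (std::size_t j = 0; j < n; j++) {
    bits.is0 |= ((uint64) (genos[j] == 0)) << j;
    bits.is2 |= ((uint64) (genos[j] == 2)) << j;
    bits.is9 |= ((uint64) (genos[j] == 9)) << j;
  }
  for (std::size_t j = n; j < 64; j++)
    bits.is9 |= 1ULL << j;  // pad unused positions as missing
  return bits;
}

// Hypothetical decoder: recover the genotype stored at bit j (0 <= j < 64).
uchar genoAt(const uint64_masks &bits, unsigned j) {
  if ((bits.is9 >> j) & 1) return 9;
  if ((bits.is0 >> j) & 1) return 0;
  if ((bits.is2 >> j) & 1) return 2;
  return 1;  // heterozygous: none of the three masks is set
}
```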
diff --git a/src/Timer.cpp b/src/Timer.cpp
new file mode 100644
index 0000000..c2ce08e
--- /dev/null
+++ b/src/Timer.cpp
@@ -0,0 +1,46 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include <cstdlib>
+#include <sys/time.h>
+
+#include "Timer.hpp"
+
+Timer::Timer(void) {
+ prevtime = curtime = get_time(); // avoid reading curtime before it is set
+}
+
+double Timer::update_time(void) {
+ struct timeval tv;
+ gettimeofday(&tv, NULL);
+ prevtime = curtime;
+ curtime = tv.tv_sec + 1e-6 * tv.tv_usec;
+ return curtime - prevtime;
+}
+
+unsigned long long Timer::rdtsc(void) {
+ unsigned int hi, lo;
+ __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
+ return ((unsigned long long) lo) | (((unsigned long long) hi)<<32);
+}
+
+double Timer::get_time(void) {
+ struct timeval tv;
+ gettimeofday(&tv, NULL);
+ return tv.tv_sec + 1e-6 * tv.tv_usec;
+}
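
Note on Timer above: update_time() returns the wall-clock seconds elapsed since the previous call (or since construction). A minimal sketch of the same lap-timing pattern follows. LapTimer is a hypothetical stand-in, not the Eagle class, and it sets both fields in the constructor so the first lap never reads an unset value.

```cpp
#include <cstddef>
#include <sys/time.h>

// Hypothetical stand-in for Timer: each lap() returns the seconds elapsed
// since the previous lap() call, or since construction for the first call.
class LapTimer {
  double prevtime, curtime;

  // Current wall-clock time in seconds, with microsecond resolution.
  static double now() {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + 1e-6 * tv.tv_usec;
  }

public:
  LapTimer() : prevtime(now()), curtime(prevtime) {}

  double lap() {
    prevtime = curtime;
    curtime = now();
    return curtime - prevtime;
  }
};
```

Because gettimeofday() reports wall-clock time, a lap can be distorted if the system clock is adjusted while the program runs.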
diff --git a/src/Timer.hpp b/src/Timer.hpp
new file mode 100644
index 0000000..292f067
--- /dev/null
+++ b/src/Timer.hpp
@@ -0,0 +1,34 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#ifndef TIMER_HPP
+#define TIMER_HPP
+
+class Timer {
+private:
+ double prevtime, curtime;
+
+public:
+ static unsigned long long rdtsc(void);
+
+ Timer(void);
+ double update_time(void);
+ double get_time(void);
+};
+
+#endif
diff --git a/src/Types.hpp b/src/Types.hpp
new file mode 100644
index 0000000..eeb648b
--- /dev/null
+++ b/src/Types.hpp
@@ -0,0 +1,31 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#ifndef TYPES_HPP
+#define TYPES_HPP
+
+typedef unsigned char uchar;
+typedef unsigned int uint;
+typedef unsigned long long uint64;
+typedef long long int64;
+
+struct uint64_masks {
+ uint64 is0, is2, is9;
+};
+
+#endif
diff --git a/src/Version.hpp b/src/Version.hpp
new file mode 100644
index 0000000..d527e6f
--- /dev/null
+++ b/src/Version.hpp
@@ -0,0 +1,25 @@
+/*
+ This file is part of the Eagle haplotype phasing software package
+ developed by Po-Ru Loh. Copyright (C) 2015-2016 Harvard University.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#ifndef VERSION_HPP
+#define VERSION_HPP
+
+const char EAGLE_VERSION[] = "2.3";
+const char EAGLE_VERSION_DATE[] = "July 22, 2016";
+
+#endif
diff --git a/tables/README.txt b/tables/README.txt
new file mode 100644
index 0000000..7ee5712
--- /dev/null
+++ b/tables/README.txt
@@ -0,0 +1,3 @@
+NOTE: This directory contains only a small piece of a genetic map to run the provided example. Full genetic maps can be downloaded at:
+
+http://data.broadinstitute.org/alkesgroup/Eagle/downloads/tables/
diff --git a/tables/genetic_map_hg19_example.txt.gz b/tables/genetic_map_hg19_example.txt.gz
new file mode 100644
index 0000000..2616832
Binary files /dev/null and b/tables/genetic_map_hg19_example.txt.gz differ
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/eagle.git