[med-svn] [hisat2] 03/07: New upstream version 2.0.5

Andreas Tille tille at debian.org
Fri Dec 9 13:02:02 UTC 2016


This is an automated email from the git hooks/post-receive script.

tille pushed a commit to branch master
in repository hisat2.

commit b84b3bcdd39d0bfd4df24436237ca5a0e800f1e8
Author: Andreas Tille <tille at debian.org>
Date:   Fri Dec 9 13:44:23 2016 +0100

    New upstream version 2.0.5
---
 MANUAL                                             |   86 +-
 MANUAL.markdown                                    |  129 +-
 VERSION                                            |    2 +-
 aligner_seed_policy.cpp                            |    2 +-
 aln_sink.h                                         |    8 +-
 doc/manual.inc.html                                |   94 +-
 gfm.h                                              |   12 +-
 gp.h                                               |   65 +
 hi_aligner.h                                       |  483 ++-
 hisat2.cpp                                         |  146 +-
 hisat2_build_genotype_genome.py                    |   89 +-
 hisat2_inspect.cpp                                 |    3 +-
 hisat2_test_HLA_genotyping.py                      | 3547 ++++++++++++++------
 hisat_bp.cpp                                       |    4 +-
 ...ct_HLA_vars.py => hisatgenotype_extract_vars.py |  587 +++-
 hisatgenotype_typing.py                            | 1688 ++++++++++
 ...otyping.py => old_hisat2_test_HLA_genotyping.py |   83 +-
 opts.h                                             |    3 +
 pat.cpp                                            |   16 +
 ref_read.cpp                                       |   14 +
 sam.h                                              |    2 +-
 spliced_aligner.h                                  |  324 +-
 tp.h                                               |   69 +-
 23 files changed, 5519 insertions(+), 1937 deletions(-)

diff --git a/MANUAL b/MANUAL
index 9ca02fe..ba1e8d6 100644
--- a/MANUAL
+++ b/MANUAL
@@ -60,7 +60,7 @@ build since it has some clear advantages in real life research problems. In orde
 to simplify the MinGW setup it might be worth investigating popular MinGW personal 
 builds since these are coming already prepared with most of the toolchains needed.
 
-First, download the [source package] from the Releases secion on the right side.
+First, download the [source package] from the Releases section on the right side.
 Unzip the file, change to the unzipped directory, and build the
 HISAT2 tools by running GNU `make` (usually with the command `make`, but
 sometimes with `gmake`) with no arguments.  If building with MinGW, run `make`
@@ -122,6 +122,8 @@ In general, when we say that a read has an alignment, we mean that it has a
 [valid alignment].  When we say that a read has multiple alignments, we mean
 that it has multiple alignments that are valid and distinct from one another. 
 
+By default, HISAT2 may soft-clip reads near their 5' and 3' ends.  Users can control this behavior by setting different penalties for soft-clipping (`--sp`) or by disallowing soft-clipping (`--no-softclip`).
+
 ### Distinct alignments map a read to different places
 
 Two alignments for the same individual read are "distinct" if they map the same
@@ -139,18 +141,6 @@ Two alignments for the same pair are distinct if either the mate 1s in the two
 paired-end alignments are distinct or the mate 2s in the two alignments are
 distinct or both.
 
-### Default mode: search for multiple alignments, report the best one
-
-By default, HISAT2 searches for distinct, valid alignments for each read. When
-it finds a valid alignment, it generally will continue to look for alignments
-that are nearly as good or better.  It will eventually stop looking, either
-because it exceeded a limit placed on search effort (see `-D` and `-R`) or
-because it already knows all it needs to know to report an alignment.
-Information from the best alignments are used to estimate mapping quality (the
-`MAPQ` [SAM] field) and to set SAM optional fields, such as `AS:i` and
-`XS:i`.  HISAT2 does not gaurantee that the alignment reported is the best
-possible in terms of alignment score.
-
 ### Default mode: search for one or more alignments, report each
 
 HISAT2 searches for up to N distinct, primary alignments for
@@ -164,7 +154,7 @@ beyond the first has the SAM 'secondary' bit (which equals 256) set in its FLAGS
 field.  See the [SAM specification] for details.
 
 HISAT2 does not "find" alignments in any specific order, so for reads that
-have more than N distinct, valid alignments, HISAT2 does not gaurantee that
+have more than N distinct, valid alignments, HISAT2 does not guarantee that
 the N alignments reported are the best possible in terms of alignment score.
 Still, this mode can be effective and fast in situations where the user cares
 more about whether a read aligns (or aligns a certain number of times) than
@@ -172,7 +162,7 @@ where exactly it originated.
 
 [SAM specification]: http://samtools.sourceforge.net/SAM1.pdf
 
-Alignment summmary
+Alignment summary
 ------------------
 
 When HISAT2 finishes running, it prints messages summarizing what happened. 
@@ -214,7 +204,7 @@ wrapper scripts that call binary programs as appropriate.  The wrappers shield
 users from having to distinguish between "small" and "large" index formats,
 discussed briefly in the following section.  Also, the `hisat2` wrapper
 provides some key functionality, like the ability to handle compressed inputs,
-and the fucntionality for `--un`, `--al` and related options.
+and the functionality for `--un`, `--al` and related options.
 
 It is recommended that you always run the hisat2 wrappers and not run the
 binaries directly.
@@ -446,6 +436,10 @@ subtracted from the alignment score for each position.
 The number subtracted is `MN + floor( (MX-MN)(MIN(Q, 40.0)/40.0) )`
 where Q is the Phred quality value.  Default: `MX` = 2, `MN` = 1.
 
+    --no-softclip
+
+Disallow soft-clipping.
+
     --np <int>
 
 Sets penalty for positions where the read, reference, or both, contain an
@@ -548,7 +542,7 @@ Report only those alignments within known transcripts.
 Report alignments tailored for transcript assemblers including StringTie.
 With this option, HISAT2 requires longer anchor lengths for de novo discovery of splice sites.
 This leads to fewer alignments with short-anchors, 
-which helps transcript assemblers improve significantly in computationa and memory usage.
+which helps transcript assemblers improve significantly in computation and memory usage.
 
     --dta-cufflinks
 
@@ -560,18 +554,23 @@ HISAT2 produces an optional field, XS:A:[+-], for every spliced alignment.
 
     -k <int>
 
-It searches for at most `<int>` distinct, primary alignments for each read.  Primary alignments mean alignments whose alignment score is equal or higher than any other alignments.
+It searches for at most `<int>` distinct, primary alignments for each read.
+Primary alignments mean alignments whose alignment score is equal or higher than any other alignments.
 The search terminates when it can't find more distinct valid alignments, or when it
 finds `<int>`, whichever happens first. The alignment score for a paired-end
 alignment equals the sum of the alignment scores of the individual mates. Each
 reported read or pair alignment beyond the first has the SAM 'secondary' bit
 (which equals 256) set in its FLAGS field.  For reads that have more than
-`<int>` distinct, valid alignments, `hisat2` does not gaurantee that the
+`<int>` distinct, valid alignments, `hisat2` does not guarantee that the
 `<int>` alignments reported are the best possible in terms of alignment score. Default: 5 (HFM) or 10 (HGFM)
 
 Note: HISAT2 is not designed with large values for `-k` in mind, and when
 aligning reads to long, repetitive genomes large `-k` can be very, very slow.
 
+    --max-seeds <int>
+
+HISAT2, like other aligners, uses a seed-and-extend approach: it tries to extend seeds to full-length alignments. `--max-seeds` controls the maximum number of seeds that will be extended; HISAT2 extends up to this many seeds and skips the rest. Large values for `--max-seeds` may improve alignment sensitivity, but HISAT2 is not designed with large values for `--max-seeds` in mind, and when aligning reads to long, repetitive genomes large `--max-seeds` [...]
+
     --secondary
 
 Report secondary alignments.
@@ -580,15 +579,15 @@ Report secondary alignments.
 
     -I/--minins <int>
 
-The minimum fragment length for valid paired-end alignments.  E.g. if `-I 60` is
-specified and a paired-end alignment consists of two 20-bp alignments in the
+The minimum fragment length for valid paired-end alignments.  This option is valid only with --no-spliced-alignment.
+E.g. if `-I 60` is specified and a paired-end alignment consists of two 20-bp alignments in the
 appropriate orientation with a 20-bp gap between them, that alignment is
 considered valid (as long as `-X` is also satisfied).  A 19-bp gap would not
 be valid in that case.  If trimming options `-3` or `-5` are also used, the
 `-I` constraint is applied with respect to the untrimmed mates.
 
 The larger the difference between `-I` and `-X`, the slower HISAT2 will
-run.  This is because larger differences bewteen `-I` and `-X` require that
+run.  This is because larger differences between `-I` and `-X` require that
 HISAT2 scan a larger window to determine if a concordant alignment exists.
 For typical fragment length ranges (200 to 400 nucleotides), HISAT2 is very
 efficient.
@@ -597,8 +596,8 @@ Default: 0 (essentially imposing no minimum)
 
     -X/--maxins <int>
 
-The maximum fragment length for valid paired-end alignments.  E.g. if `-X 100`
-is specified and a paired-end alignment consists of two 20-bp alignments in the
+The maximum fragment length for valid paired-end alignments.  This option is valid only with --no-spliced-alignment.
+E.g. if `-X 100` is specified and a paired-end alignment consists of two 20-bp alignments in the
 proper orientation with a 60-bp gap between them, that alignment is considered
 valid (as long as `-I` is also satisfied).  A 61-bp gap would not be valid in
 that case.  If trimming options `-3` or `-5` are also used, the `-X`
@@ -606,7 +605,7 @@ constraint is applied with respect to the untrimmed mates, not the trimmed
 mates.
 
 The larger the difference between `-I` and `-X`, the slower HISAT2 will
-run.  This is because larger differences bewteen `-I` and `-X` require that
+run.  This is because larger differences between `-I` and `-X` require that
 HISAT2 scan a larger window to determine if a concordant alignment exists.
 For typical fragment length ranges (200 to 400 nucleotides), HISAT2 is very
 efficient.
@@ -639,25 +638,6 @@ concordant alignments.  A discordant alignment is an alignment where both mates
 align uniquely, but that does not satisfy the paired-end constraints
 (`--fr`/`--rf`/`--ff`, `-I`, `-X`).  This option disables that behavior.
 
-    --dovetail
-
-If the mates "dovetail", that is if one mate alignment extends past the
-beginning of the other such that the wrong mate begins upstream, consider that
-to be concordant.  See also: [Mates can overlap, contain or dovetail each
-other].  Default: mates cannot dovetail in a concordant alignment.
-
-    --no-contain
-
-If one mate alignment contains the other, consider that to be non-concordant.
-See also: [Mates can overlap, contain or dovetail each other].  Default: a mate
-can contain the other in a concordant alignment.
-
-    --no-overlap
-
-If one mate alignment overlaps the other at all, consider that to be
-non-concordant.  See also: [Mates can overlap, contain or dovetail each other]. 
-Default: mates can overlap in a concordant alignment.
-
 #### Output options
 
     -t/--time
@@ -784,7 +764,7 @@ Add 'chr' to reference names in alignment (e.g., 18 to chr18)
     --omit-sec-seq
 
 When printing secondary alignments, HISAT2 by default will write out the `SEQ`
-and `QUAL` strings.  Specifying this option causes HISAT2 to print an asterix
+and `QUAL` strings.  Specifying this option causes HISAT2 to print an asterisk
 in those fields instead.
 
 #### Performance options
@@ -870,7 +850,7 @@ When one or more `--rg` arguments are specified, `hisat2` will also print
 an `@RG` line that includes all user-specified `--rg` tokens separated by
 tabs.
 
-Each subsequnt line describes an alignment or, if the read failed to align, a
+Each subsequent line describes an alignment or, if the read failed to align, a
 read.  Each line is a collection of at least 12 fields separated by tabs; from
 left to right, the fields are:
 
@@ -1116,7 +1096,7 @@ a comma-separated list of sequences rather than a list of FASTA files.
     --large-index
 
 Force `hisat2-build` to build a [large index], even if the reference is less
-than ~ 4 billion nucleotides inlong.
+than ~ 4 billion nucleotides long.
 
     -a/--noauto
 
@@ -1206,7 +1186,7 @@ Launch `NTHREADS` parallel build threads (default: 1).
 
 Provide a list of SNPs (in the HISAT2's own format) as follows (five columns).
    
-   SNP ID `<tab>` chromosome name `<tab>` snp type (single, deletion, or insertion) `<tab>` zero-offset based genomic position of a SNP `<tab>` alternative base (single), the length of SNP (deletion), or insertion sequence (insertion)
+   SNP ID `<tab>` snp type (single, deletion, or insertion) `<tab>` chromosome name `<tab>` zero-offset based genomic position of a SNP `<tab>` alternative base (single), the length of SNP (deletion), or insertion sequence (insertion)
    
    For example,
        rs58784443      single  13      18447947        T
@@ -1227,7 +1207,7 @@ See the above option, --snp, about how to extract haplotypes.  This option is no
 
     --ss <path>
 
-Note this option should be used with the followig --exon option.
+Note this option should be used with the following --exon option.
 Provide a list of splice sites (in the HISAT2's own format) as follows (four columns).
    
    chromosome name `<tab>` zero-offset based genomic position of the flanking base on the left side of an intron `<tab>` zero-offset based genomic position of the flanking base on the right `<tab>` strand
@@ -1390,7 +1370,7 @@ Stay in the directory created in the previous step, which now contains the
     $HISAT2_HOME/hisat2 -f -x $HISAT2_HOME/example/index/22_20-21M_snp -U $HISAT2_HOME/example/reads/reads_1.fa -S eg1.sam
 
 This runs the HISAT2 aligner, which aligns a set of unpaired reads to the
-the genome region using the index generated in the previous step.
+genome region using the index generated in the previous step.
 The alignment results in SAM format are written to the file `eg1.sam`, and a
 short alignment summary is written to the console.  (Actually, the summary is
 written to the "standard error" or "stderr" filehandle, which is typically
@@ -1452,13 +1432,13 @@ binary format corresponding to the SAM text format.  Run:
 
     samtools view -bS eg2.sam > eg2.bam
 
-Use `samtools sort` to convert the BAM file to a sorted BAM file.
+Use `samtools sort` to convert the BAM file to a sorted BAM file. The following command requires samtools version 1.2 or higher.
 
-    samtools sort eg2.bam eg2.sorted
+    samtools sort eg2.bam -o eg2.sorted.bam
 
 We now have a sorted BAM file called `eg2.sorted.bam`. Sorted BAM is a useful
 format because the alignments are (a) compressed, which is convenient for
-long-term storage, and (b) sorted, which is conveneint for variant discovery.
+long-term storage, and (b) sorted, which is convenient for variant discovery.
 To generate variant calls in VCF format, run:
 
     samtools mpileup -uf $HISAT2_HOME/example/reference/22_20-21M.fa eg2.sorted.bam | bcftools view -bvcg - > eg2.raw.bcf
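
As a quick illustration of the newly documented `--no-softclip` and `--max-seeds` options, here is a minimal sketch that reuses the example index and reads from the manual's Getting Started section (paths assume the `$HISAT2_HOME/example` layout shown above; the option values are arbitrary):

    # Align the example reads with soft-clipping disallowed (instead of tuning it via --sp MX,MN)
    $HISAT2_HOME/hisat2 -f --no-softclip -x $HISAT2_HOME/example/index/22_20-21M_snp -U $HISAT2_HOME/example/reads/reads_1.fa -S eg1.sam

    # Allow up to 20 seed extensions and up to 10 reported alignments per read;
    # larger values trade speed for sensitivity
    $HISAT2_HOME/hisat2 -f --max-seeds 20 -k 10 -x $HISAT2_HOME/example/index/22_20-21M_snp -U $HISAT2_HOME/example/reads/reads_1.fa -S eg1.sam
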
diff --git a/MANUAL.markdown b/MANUAL.markdown
index d45ed2f..e5f657e 100644
--- a/MANUAL.markdown
+++ b/MANUAL.markdown
@@ -66,7 +66,7 @@ build since it has some clear advantages in real life research problems. In orde
 to simplify the MinGW setup it might be worth investigating popular MinGW personal 
 builds since these are coming already prepared with most of the toolchains needed.
 
-First, download the [source package] from the Releases secion on the right side.
+First, download the [source package] from the Releases section on the right side.
 Unzip the file, change to the unzipped directory, and build the
 HISAT2 tools by running GNU `make` (usually with the command `make`, but
 sometimes with `gmake`) with no arguments.  If building with MinGW, run `make`
@@ -130,6 +130,8 @@ that it has multiple alignments that are valid and distinct from one another.
 
 [valid alignment]: #valid-alignments-meet-or-exceed-the-minimum-score-threshold
 
+By default, HISAT2 may soft-clip reads near their 5' and 3' ends.  Users can control this behavior by setting different penalties for soft-clipping ([`--sp`]) or by disallowing soft-clipping ([`--no-softclip`]).
+
 ### Distinct alignments map a read to different places
 
 Two alignments for the same individual read are "distinct" if they map the same
@@ -147,20 +149,6 @@ Two alignments for the same pair are distinct if either the mate 1s in the two
 paired-end alignments are distinct or the mate 2s in the two alignments are
 distinct or both.
 
-<!--
-### Default mode: search for multiple alignments, report the best one
-
-By default, HISAT2 searches for distinct, valid alignments for each read. When
-it finds a valid alignment, it generally will continue to look for alignments
-that are nearly as good or better.  It will eventually stop looking, either
-because it exceeded a limit placed on search effort (see [`-D`] and [`-R`]) or
-because it already knows all it needs to know to report an alignment.
-Information from the best alignments are used to estimate mapping quality (the
-`MAPQ` [SAM] field) and to set SAM optional fields, such as [`AS:i`] and
-[`XS:i`].  HISAT2 does not gaurantee that the alignment reported is the best
-possible in terms of alignment score.
--->
-
 ### Default mode: search for one or more alignments, report each
 
 HISAT2 searches for up to N distinct, primary alignments for
@@ -174,7 +162,7 @@ beyond the first has the SAM 'secondary' bit (which equals 256) set in its FLAGS
 field.  See the [SAM specification] for details.
 
 HISAT2 does not "find" alignments in any specific order, so for reads that
-have more than N distinct, valid alignments, HISAT2 does not gaurantee that
+have more than N distinct, valid alignments, HISAT2 does not guarantee that
 the N alignments reported are the best possible in terms of alignment score.
 Still, this mode can be effective and fast in situations where the user cares
 more about whether a read aligns (or aligns a certain number of times) than
@@ -183,7 +171,7 @@ where exactly it originated.
 
 [SAM specification]: http://samtools.sourceforge.net/SAM1.pdf
 
-Alignment summmary
+Alignment summary
 ------------------
 
 When HISAT2 finishes running, it prints messages summarizing what happened. 
@@ -225,7 +213,7 @@ wrapper scripts that call binary programs as appropriate.  The wrappers shield
 users from having to distinguish between "small" and "large" index formats,
 discussed briefly in the following section.  Also, the `hisat2` wrapper
 provides some key functionality, like the ability to handle compressed inputs,
-and the fucntionality for [`--un`], [`--al`] and related options.
+and the functionality for [`--un`], [`--al`] and related options.
 
 It is recommended that you always run the hisat2 wrappers and not run the
 binaries directly.
@@ -636,6 +624,18 @@ The number subtracted is `MN + floor( (MX-MN)(MIN(Q, 40.0)/40.0) )`
 where Q is the Phred quality value.  Default: `MX` = 2, `MN` = 1.
 
 </td></tr>
+<tr><td id="hisat2-options-no-softclip">
+
+[`--sp`]: #hisat2-options-no-softclip
+
+    --no-softclip
+
+</td><td>
+
+Disallow soft-clipping.
+
+</td></tr>
+
 <tr><td id="hisat2-options-np">
 
 [`--np`]: #hisat2-options-np
@@ -876,7 +876,7 @@ Report only those alignments within known transcripts.
 Report alignments tailored for transcript assemblers including StringTie.
 With this option, HISAT2 requires longer anchor lengths for de novo discovery of splice sites.
 This leads to fewer alignments with short-anchors, 
-which helps transcript assemblers improve significantly in computationa and memory usage.
+which helps transcript assemblers improve significantly in computation and memory usage.
 
 </td></tr>
 
@@ -907,20 +907,33 @@ HISAT2 produces an optional field, XS:A:[+-], for every spliced alignment.
 
 </td><td>
 
-It searches for at most `<int>` distinct, primary alignments for each read.  Primary alignments mean alignments whose alignment score is equal or higher than any other alignments.
+It searches for at most `<int>` distinct, primary alignments for each read.
+Primary alignments mean alignments whose alignment score is equal or higher than any other alignments.
 The search terminates when it can't find more distinct valid alignments, or when it
 finds `<int>`, whichever happens first. The alignment score for a paired-end
 alignment equals the sum of the alignment scores of the individual mates. Each
 reported read or pair alignment beyond the first has the SAM 'secondary' bit
 (which equals 256) set in its FLAGS field.  For reads that have more than
-`<int>` distinct, valid alignments, `hisat2` does not gaurantee that the
+`<int>` distinct, valid alignments, `hisat2` does not guarantee that the
 `<int>` alignments reported are the best possible in terms of alignment score. Default: 5 (HFM) or 10 (HGFM)
 
 Note: HISAT2 is not designed with large values for `-k` in mind, and when
 aligning reads to long, repetitive genomes large `-k` can be very, very slow.
 
 </td></tr>
-<tr><td id="hisat2-options-k">
+<tr><td id="hisat2-options-max-seeds">
+
+[`--max-seeds`]: #hisat2-options-max-seeds
+
+    --max-seeds <int>
+
+</td><td>
+
+HISAT2, like other aligners, uses a seed-and-extend approach: it tries to extend seeds to full-length alignments. `--max-seeds` controls the maximum number of seeds that will be extended; HISAT2 extends up to this many seeds and skips the rest. Large values for `--max-seeds` may improve alignment sensitivity, but HISAT2 is not designed with large values for `--max-seeds` in mind, and when aligning reads to long, repetitive genomes large `--max-seeds` [...]
+
+</td></tr>
+
+<tr><td id="hisat2-options-secondary">
 
 [`--secondary`]: #hisat2-options-secondary
 
@@ -947,15 +960,15 @@ Report secondary alignments.
 
 </td><td>
 
-The minimum fragment length for valid paired-end alignments.  E.g. if `-I 60` is
-specified and a paired-end alignment consists of two 20-bp alignments in the
+The minimum fragment length for valid paired-end alignments.  This option is valid only with --no-spliced-alignment.
+E.g. if `-I 60` is specified and a paired-end alignment consists of two 20-bp alignments in the
 appropriate orientation with a 20-bp gap between them, that alignment is
 considered valid (as long as [`-X`] is also satisfied).  A 19-bp gap would not
 be valid in that case.  If trimming options [`-3`] or [`-5`] are also used, the
 [`-I`] constraint is applied with respect to the untrimmed mates.
 
 The larger the difference between [`-I`] and [`-X`], the slower HISAT2 will
-run.  This is because larger differences bewteen [`-I`] and [`-X`] require that
+run.  This is because larger differences between [`-I`] and [`-X`] require that
 HISAT2 scan a larger window to determine if a concordant alignment exists.
 For typical fragment length ranges (200 to 400 nucleotides), HISAT2 is very
 efficient.
@@ -972,8 +985,8 @@ Default: 0 (essentially imposing no minimum)
 
 </td><td>
 
-The maximum fragment length for valid paired-end alignments.  E.g. if `-X 100`
-is specified and a paired-end alignment consists of two 20-bp alignments in the
+The maximum fragment length for valid paired-end alignments.  This option is valid only with --no-spliced-alignment.
+E.g. if `-X 100` is specified and a paired-end alignment consists of two 20-bp alignments in the
 proper orientation with a 60-bp gap between them, that alignment is considered
 valid (as long as [`-I`] is also satisfied).  A 61-bp gap would not be valid in
 that case.  If trimming options [`-3`] or [`-5`] are also used, the `-X`
@@ -981,7 +994,7 @@ constraint is applied with respect to the untrimmed mates, not the trimmed
 mates.
 
 The larger the difference between [`-I`] and [`-X`], the slower HISAT2 will
-run.  This is because larger differences bewteen [`-I`] and [`-X`] require that
+run.  This is because larger differences between [`-I`] and [`-X`] require that
 HISAT2 scan a larger window to determine if a concordant alignment exists.
 For typical fragment length ranges (200 to 400 nucleotides), HISAT2 is very
 efficient.
@@ -1038,48 +1051,6 @@ concordant alignments.  A discordant alignment is an alignment where both mates
 align uniquely, but that does not satisfy the paired-end constraints
 ([`--fr`/`--rf`/`--ff`], [`-I`], [`-X`]).  This option disables that behavior.
 
-</td></tr>
-<tr><td id="hisat2-options-dovetail">
-
-[`--dovetail`]: #hisat2-options-dovetail
-
-    --dovetail
-
-</td><td>
-
-If the mates "dovetail", that is if one mate alignment extends past the
-beginning of the other such that the wrong mate begins upstream, consider that
-to be concordant.  See also: [Mates can overlap, contain or dovetail each
-other].  Default: mates cannot dovetail in a concordant alignment.
-
-[Mates can overlap, contain or dovetail each other]: #mates-can-overlap-contain-or-dovetail-each-other
-
-</td></tr>
-<tr><td id="hisat2-options-no-contain">
-
-[`--no-contain`]: #hisat2-options-no-contain
-
-    --no-contain
-
-</td><td>
-
-If one mate alignment contains the other, consider that to be non-concordant.
-See also: [Mates can overlap, contain or dovetail each other].  Default: a mate
-can contain the other in a concordant alignment.
-
-</td></tr>
-<tr><td id="hisat2-options-no-overlap">
-
-[`--no-overlap`]: #hisat2-options-no-overlap
-
-    --no-overlap
-
-</td><td>
-
-If one mate alignment overlaps the other at all, consider that to be
-non-concordant.  See also: [Mates can overlap, contain or dovetail each other]. 
-Default: mates can overlap in a concordant alignment.
-
 </td></tr></table>
 
 #### Output options
@@ -1342,7 +1313,7 @@ Add 'chr' to reference names in alignment (e.g., 18 to chr18)
 </td><td>
 
 When printing secondary alignments, HISAT2 by default will write out the `SEQ`
-and `QUAL` strings.  Specifying this option causes HISAT2 to print an asterix
+and `QUAL` strings.  Specifying this option causes HISAT2 to print an asterisk
 in those fields instead.
 
 </td></tr>
@@ -1502,7 +1473,7 @@ When one or more [`--rg`] arguments are specified, `hisat2` will also print
 an `@RG` line that includes all user-specified [`--rg`] tokens separated by
 tabs.
 
-Each subsequnt line describes an alignment or, if the read failed to align, a
+Each subsequent line describes an alignment or, if the read failed to align, a
 read.  Each line is a collection of at least 12 fields separated by tabs; from
 left to right, the fields are:
 
@@ -1894,7 +1865,7 @@ a comma-separated list of sequences rather than a list of FASTA files.
 </td><td>
 
 Force `hisat2-build` to build a [large index](#small-and-large-indexes), even if the reference is less
-than ~ 4 billion nucleotides inlong.
+than ~ 4 billion nucleotides long.
 
 </td></tr>
 <tr><td id="hisat2-build-options-a">
@@ -2048,7 +2019,7 @@ Launch `NTHREADS` parallel build threads (default: 1).
 
 Provide a list of SNPs (in the HISAT2's own format) as follows (five columns).
    
-   SNP ID `<tab>` chromosome name `<tab>` snp type (single, deletion, or insertion) `<tab>` zero-offset based genomic position of a SNP `<tab>` alternative base (single), the length of SNP (deletion), or insertion sequence (insertion)
+   SNP ID `<tab>` snp type (single, deletion, or insertion) `<tab>` chromosome name `<tab>` zero-offset based genomic position of a SNP `<tab>` alternative base (single), the length of SNP (deletion), or insertion sequence (insertion)
    
    For example,
        rs58784443      single  13      18447947        T
@@ -2077,7 +2048,7 @@ See the above option, --snp, about how to extract haplotypes.  This option is no
 
 </td><td>
 
-Note this option should be used with the followig --exon option.
+Note this option should be used with the following --exon option.
 Provide a list of splice sites (in the HISAT2's own format) as follows (four columns).
    
    chromosome name `<tab>` zero-offset based genomic position of the flanking base on the left side of an intron `<tab>` zero-offset based genomic position of the flanking base on the right `<tab>` strand
@@ -2332,7 +2303,7 @@ Stay in the directory created in the previous step, which now contains the
     $HISAT2_HOME/hisat2 -f -x $HISAT2_HOME/example/index/22_20-21M_snp -U $HISAT2_HOME/example/reads/reads_1.fa -S eg1.sam
 
 This runs the HISAT2 aligner, which aligns a set of unpaired reads to the
-the genome region using the index generated in the previous step.
+genome region using the index generated in the previous step.
 The alignment results in SAM format are written to the file `eg1.sam`, and a
 short alignment summary is written to the console.  (Actually, the summary is
 written to the "standard error" or "stderr" filehandle, which is typically
@@ -2396,13 +2367,13 @@ binary format corresponding to the SAM text format.  Run:
 
     samtools view -bS eg2.sam > eg2.bam
 
-Use `samtools sort` to convert the BAM file to a sorted BAM file.
+Use `samtools sort` to convert the BAM file to a sorted BAM file. The following command requires samtools version 1.2 or higher.
 
-    samtools sort eg2.bam eg2.sorted
+    samtools sort eg2.bam -o eg2.sorted.bam
 
 We now have a sorted BAM file called `eg2.sorted.bam`. Sorted BAM is a useful
 format because the alignments are (a) compressed, which is convenient for
-long-term storage, and (b) sorted, which is conveneint for variant discovery.
+long-term storage, and (b) sorted, which is convenient for variant discovery.
 To generate variant calls in VCF format, run:
 
     samtools mpileup -uf $HISAT2_HOME/example/reference/22_20-21M.fa eg2.sorted.bam | bcftools view -bvcg - > eg2.raw.bcf
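
As a worked example of the soft-clipping penalty formula quoted above (`--sp MX,MN`, defaults `MX` = 2, `MN` = 1), the per-base cost subtracted for a clipped base of Phred quality Q works out to:

    MN + floor( (MX-MN) * (MIN(Q, 40.0)/40.0) )
    Q = 10:  1 + floor( 1 * 10/40 ) = 1 + 0 = 1
    Q = 40:  1 + floor( 1 * 40/40 ) = 1 + 1 = 2
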
diff --git a/VERSION b/VERSION
index 26e3379..b9d2bdf 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-2.0.4
\ No newline at end of file
+2.0.5
\ No newline at end of file
diff --git a/aligner_seed_policy.cpp b/aligner_seed_policy.cpp
index a68291c..204e66e 100644
--- a/aligner_seed_policy.cpp
+++ b/aligner_seed_policy.cpp
@@ -284,7 +284,7 @@ void SeedAlignmentPolicy::parseString(
 	penN              = DEFAULT_N_PENALTY;
     
     penScMax          = DEFAULT_SC_PENALTY_MAX;
-    penScMax          = DEFAULT_SC_PENALTY_MIN;
+    penScMin          = DEFAULT_SC_PENALTY_MIN;
 	
 	const double DMAX = std::numeric_limits<double>::max();
     costMin.init(
diff --git a/aln_sink.h b/aln_sink.h
index 7136fd2..6239277 100644
--- a/aln_sink.h
+++ b/aln_sink.h
@@ -218,17 +218,19 @@ struct ReportingParams {
 
 	explicit ReportingParams(
                              THitInt khits_,
+                             THitInt kseeds_,
                              THitInt mhits_,
                              THitInt pengap_,
                              bool msample_,
                              bool discord_,
                              bool mixed_)
 	{
-		init(khits_, mhits_, pengap_, msample_, discord_, mixed_);
+		init(khits_, kseeds_, mhits_, pengap_, msample_, discord_, mixed_);
 	}
 
 	void init(
               THitInt khits_,
+              THitInt kseeds_,
               THitInt mhits_,
               THitInt pengap_,
               bool msample_,
@@ -236,6 +238,7 @@ struct ReportingParams {
               bool mixed_)
 	{
 		khits   = khits_;     // -k (or high if -a)
+        kseeds  = kseeds_;
 		mhits   = ((mhits_ == 0) ? std::numeric_limits<THitInt>::max() : mhits_);
 		pengap  = pengap_;
 		msample = msample_;
@@ -295,6 +298,9 @@ struct ReportingParams {
 
 	// Number of alignments to report
 	THitInt khits;
+    
+    // Number of seeds allowed to extend
+    THitInt kseeds;
 	
 	// Read is non-unique if mhits-1 next-best alignments are within
 	// pengap of the best alignment
diff --git a/doc/manual.inc.html b/doc/manual.inc.html
index cdfc6e3..59daf71 100644
--- a/doc/manual.inc.html
+++ b/doc/manual.inc.html
@@ -12,7 +12,7 @@
 <li><a href="#distinct-alignments-map-a-read-to-different-places">Distinct alignments map a read to different places</a></li>
 <li><a href="#default-mode-search-for-one-or-more-alignments-report-each">Default mode: search for one or more alignments, report each</a></li>
 </ul></li>
-<li><a href="#alignment-summmary">Alignment summmary</a></li>
+<li><a href="#alignment-summary">Alignment summary</a></li>
 <li><a href="#wrapper">Wrapper</a></li>
 <li><a href="#small-and-large-indexes">Small and large indexes</a></li>
 <li><a href="#performance-tuning">Performance tuning</a></li>
@@ -58,7 +58,7 @@
 <p>Download HISAT2 sources and binaries from the Releases sections on the right side. Binaries are available for Intel architectures (<code>x86_64</code>) running Linux, and Mac OS X.</p>
 <h2 id="building-from-source">Building from source</h2>
 <p>Building HISAT2 from source requires a GNU-like environment with GCC, GNU Make and other basics. It should be possible to build HISAT2 on most vanilla Linux installations or on a Mac installation with <a href="http://developer.apple.com/xcode/">Xcode</a> installed. HISAT2 can also be built on Windows using <a href="http://www.cygwin.com/">Cygwin</a> or <a href="http://www.mingw.org/">MinGW</a> (MinGW recommended). For a MinGW build the choice of what compiler is to be used is importan [...]
-<p>First, download the <a href="http://ccb.jhu.edu/software/hisat2/downloads/hisat2-2.0.0-beta-source.zip">source package</a> from the Releases secion on the right side. Unzip the file, change to the unzipped directory, and build the HISAT2 tools by running GNU <code>make</code> (usually with the command <code>make</code>, but sometimes with <code>gmake</code>) with no arguments. If building with MinGW, run <code>make</code> from the MSYS environment.</p>
+<p>First, download the <a href="http://ccb.jhu.edu/software/hisat2/downloads/hisat2-2.0.0-beta-source.zip">source package</a> from the Releases section on the right side. Unzip the file, change to the unzipped directory, and build the HISAT2 tools by running GNU <code>make</code> (usually with the command <code>make</code>, but sometimes with <code>gmake</code>) with no arguments. If building with MinGW, run <code>make</code> from the MSYS environment.</p>
 <p>HISAT2 is using the multithreading software model in order to speed up execution times on SMP architectures where this is possible. On POSIX platforms (like linux, Mac OS, etc) it needs the pthread library. Although it is possible to use pthread library on non-POSIX platform like Windows, due to performance reasons HISAT2 will try to use Windows native multithreading if possible.</p>
 <p>For the support of SRA data access in HISAT2, please download and install the <a href="https://github.com/ncbi/ngs/wiki/Downloads">NCBI-NGS</a> toolkit. When running <code>make</code>, specify additional variables as follow. <code>make USE_SRA=1 NCBI_NGS_DIR=/path/to/NCBI-NGS-directory NCBI_VDB_DIR=/path/to/NCBI-NGS-directory</code>, where <code>NCBI_NGS_DIR</code> and <code>NCBI_VDB_DIR</code> will be used in Makefile for -I and -L compilation options. For example, $(NCBI_NGS_DIR)/in [...]
 <h1 id="running-hisat2">Running HISAT2</h1>
@@ -68,27 +68,14 @@
 <h2 id="reporting">Reporting</h2>
 <p>The reporting mode governs how many alignments HISAT2 looks for, and how to report them.</p>
 <p>In general, when we say that a read has an alignment, we mean that it has a <a href="#valid-alignments-meet-or-exceed-the-minimum-score-threshold">valid alignment</a>. When we say that a read has multiple alignments, we mean that it has multiple alignments that are valid and distinct from one another.</p>
+<p>By default, HISAT2 may soft-clip reads near their 5' and 3' ends. Users can control this behavior by setting different penalties for soft-clipping (<a href="#hisat2-options-no-softclip"><code>--sp</code></a>) or by disallowing soft-clipping ([<code>--no-softclip</code>]).</p>
 <h3 id="distinct-alignments-map-a-read-to-different-places">Distinct alignments map a read to different places</h3>
 <p>Two alignments for the same individual read are "distinct" if they map the same read to different places. Specifically, we say that two alignments are distinct if there are no alignment positions where a particular read offset is aligned opposite a particular reference offset in both alignments with the same orientation. E.g. if the first alignment is in the forward orientation and aligns the read character at read offset 10 to the reference character at chromosome 3, offset [...]
 <p>Two alignments for the same pair are distinct if either the mate 1s in the two paired-end alignments are distinct or the mate 2s in the two alignments are distinct or both.</p>
-<!--
-### Default mode: search for multiple alignments, report the best one
-
-By default, HISAT2 searches for distinct, valid alignments for each read. When
-it finds a valid alignment, it generally will continue to look for alignments
-that are nearly as good or better.  It will eventually stop looking, either
-because it exceeded a limit placed on search effort (see [`-D`] and [`-R`]) or
-because it already knows all it needs to know to report an alignment.
-Information from the best alignments are used to estimate mapping quality (the
-`MAPQ` [SAM] field) and to set SAM optional fields, such as [`AS:i`] and
-[`XS:i`].  HISAT2 does not gaurantee that the alignment reported is the best
-possible in terms of alignment score.
--->
-
 <h3 id="default-mode-search-for-one-or-more-alignments-report-each">Default mode: search for one or more alignments, report each</h3>
 <p>HISAT2 searches for up to N distinct, primary alignments for each read, where N equals the integer specified with the <code>-k</code> parameter. Primary alignments mean alignments whose alignment score is equal or higher than any other alignments. It is possible that multiple distinct alignments have the same score. That is, if <code>-k 2</code> is specified, HISAT2 will search for at most 2 distinct alignments. The alignment score for a paired-end alignment equals the sum of the alig [...]
-<p>HISAT2 does not "find" alignments in any specific order, so for reads that have more than N distinct, valid alignments, HISAT2 does not gaurantee that the N alignments reported are the best possible in terms of alignment score. Still, this mode can be effective and fast in situations where the user cares more about whether a read aligns (or aligns a certain number of times) than where exactly it originated.</p>
-<h2 id="alignment-summmary">Alignment summmary</h2>
+<p>HISAT2 does not "find" alignments in any specific order, so for reads that have more than N distinct, valid alignments, HISAT2 does not guarantee that the N alignments reported are the best possible in terms of alignment score. Still, this mode can be effective and fast in situations where the user cares more about whether a read aligns (or aligns a certain number of times) than where exactly it originated.</p>
+<h2 id="alignment-summary">Alignment summary</h2>
 <p>When HISAT2 finishes running, it prints messages summarizing what happened. These messages are printed to the "standard error" ("stderr") filehandle. For datasets consisting of unpaired reads, the summary might look like this:</p>
 <pre><code>20000 reads; of these:
   20000 (100.00%) were unpaired; of these:
@@ -114,7 +101,7 @@ possible in terms of alignment score.
 96.70% overall alignment rate</code></pre>
 <p>The indentation indicates how subtotals relate to totals.</p>
 <h2 id="wrapper">Wrapper</h2>
-<p>The <code>hisat2</code>, <code>hisat2-build</code> and <code>hisat2-inspect</code> executables are actually wrapper scripts that call binary programs as appropriate. The wrappers shield users from having to distinguish between "small" and "large" index formats, discussed briefly in the following section. Also, the <code>hisat2</code> wrapper provides some key functionality, like the ability to handle compressed inputs, and the fucntionality for <a href="#hisat2-opt [...]
+<p>The <code>hisat2</code>, <code>hisat2-build</code> and <code>hisat2-inspect</code> executables are actually wrapper scripts that call binary programs as appropriate. The wrappers shield users from having to distinguish between "small" and "large" index formats, discussed briefly in the following section. Also, the <code>hisat2</code> wrapper provides some key functionality, like the ability to handle compressed inputs, and the functionality for <a href="#hisat2-opt [...]
 <p>It is recommended that you always run the hisat2 wrappers and not run the binaries directly.</p>
 <h2 id="small-and-large-indexes">Small and large indexes</h2>
 <p><code>hisat2-build</code> can index reference genomes of any size. For genomes less than about 4 billion nucleotides in length, <code>hisat2-build</code> builds a "small" index using 32-bit numbers in various parts of the index. When the genome is longer, <code>hisat2-build</code> builds a "large" index using 64-bit numbers. Small indexes are stored in files with the <code>.ht2</code> extension, and large indexes are stored in files with the <code>.ht2l</code> exte [...]
@@ -311,6 +298,14 @@ possible in terms of alignment score.
 
 <p>Sets the maximum (<code>MX</code>) and minimum (<code>MN</code>) penalties for soft-clipping per base, both integers. A number less than or equal to <code>MX</code> and greater than or equal to <code>MN</code> is subtracted from the alignment score for each position. The number subtracted is <code>MN + floor( (MX-MN)(MIN(Q, 40.0)/40.0) )</code> where Q is the Phred quality value. Default: <code>MX</code> = 2, <code>MN</code> = 1.</p>
 </td></tr>
+<tr><td id="hisat2-options-no-softclip">
+
+<pre><code>--no-softclip</code></pre>
+</td><td>
+
+<p>Disallow soft-clipping.</p>
+</td></tr>
+
 <tr><td id="hisat2-options-np">
 
 <pre><code>--np <int></code></pre>
@@ -450,7 +445,7 @@ possible in terms of alignment score.
 <pre><code>--dta/--downstream-transcriptome-assembly</code></pre>
 </td><td>
 
-<p>Report alignments tailored for transcript assemblers including StringTie. With this option, HISAT2 requires longer anchor lengths for de novo discovery of splice sites. This leads to fewer alignments with short-anchors, which helps transcript assemblers improve significantly in computationa and memory usage.</p>
+<p>Report alignments tailored for transcript assemblers including StringTie. With this option, HISAT2 requires longer anchor lengths for de novo discovery of splice sites. This leads to fewer alignments with short-anchors, which helps transcript assemblers improve significantly in computation and memory usage.</p>
 </td></tr>
 
 <tr><td id="hisat2-options-dta-cufflinks">
@@ -470,10 +465,18 @@ possible in terms of alignment score.
 <pre><code>-k <int></code></pre>
 </td><td>
 
-<p>It searches for at most <code><int></code> distinct, primary alignments for each read. Primary alignments mean alignments whose alignment score is equal or higher than any other alignments. The search terminates when it can't find more distinct valid alignments, or when it finds <code><int></code>, whichever happens first. The alignment score for a paired-end alignment equals the sum of the alignment scores of the individual mates. Each reported read or pair alignment beyo [...]
+<p>It searches for at most <code><int></code> distinct, primary alignments for each read. Primary alignments mean alignments whose alignment score is equal or higher than any other alignments. The search terminates when it can't find more distinct valid alignments, or when it finds <code><int></code>, whichever happens first. The alignment score for a paired-end alignment equals the sum of the alignment scores of the individual mates. Each reported read or pair alignment beyo [...]
 <p>Note: HISAT2 is not designed with large values for <code>-k</code> in mind, and when aligning reads to long, repetitive genomes large <code>-k</code> can be very, very slow.</p>
 </td></tr>
-<tr><td id="hisat2-options-k">
+<tr><td id="hisat2-options-max-seeds">
+
+<pre><code>--max-seeds <int></code></pre>
+</td><td>
+
+<p>HISAT2, like other aligners, uses a seed-and-extend approach: it tries to extend seeds to full-length alignments. <code>--max-seeds</code> controls the maximum number of seeds that will be extended; HISAT2 extends up to this many seeds and skips the rest. Large values for <code>--max-seeds</code> may improve alignment sensitivity, but HISAT2 is not designed with large values for <code>--max-seeds</code> in mind, and when aligning reads to long, repetitive gen [...]
+</td></tr>
+
+<tr><td id="hisat2-options-secondary">
 
 <pre><code>--secondary</code></pre>
 </td><td>
@@ -491,8 +494,8 @@ possible in terms of alignment score.
 <pre><code>-I/--minins <int></code></pre>
 </td><td>
 
-<p>The minimum fragment length for valid paired-end alignments. E.g. if <code>-I 60</code> is specified and a paired-end alignment consists of two 20-bp alignments in the appropriate orientation with a 20-bp gap between them, that alignment is considered valid (as long as <a href="#hisat2-options-X"><code>-X</code></a> is also satisfied). A 19-bp gap would not be valid in that case. If trimming options <a href="#hisat2-options-3"><code>-3</code></a> or <a href="#hisat2-options-5"><code>- [...]
-<p>The larger the difference between <a href="#hisat2-options-I"><code>-I</code></a> and <a href="#hisat2-options-X"><code>-X</code></a>, the slower HISAT2 will run. This is because larger differences bewteen <a href="#hisat2-options-I"><code>-I</code></a> and <a href="#hisat2-options-X"><code>-X</code></a> require that HISAT2 scan a larger window to determine if a concordant alignment exists. For typical fragment length ranges (200 to 400 nucleotides), HISAT2 is very efficient.</p>
+<p>The minimum fragment length for valid paired-end alignments. This option is valid only with --no-spliced-alignment. E.g. if <code>-I 60</code> is specified and a paired-end alignment consists of two 20-bp alignments in the appropriate orientation with a 20-bp gap between them, that alignment is considered valid (as long as <a href="#hisat2-options-X"><code>-X</code></a> is also satisfied). A 19-bp gap would not be valid in that case. If trimming options <a href="#hisat2-options-3"><cod [...]
+<p>The larger the difference between <a href="#hisat2-options-I"><code>-I</code></a> and <a href="#hisat2-options-X"><code>-X</code></a>, the slower HISAT2 will run. This is because larger differences between <a href="#hisat2-options-I"><code>-I</code></a> and <a href="#hisat2-options-X"><code>-X</code></a> require that HISAT2 scan a larger window to determine if a concordant alignment exists. For typical fragment length ranges (200 to 400 nucleotides), HISAT2 is very efficient.</p>
 <p>Default: 0 (essentially imposing no minimum)</p>
 </td></tr>
 <tr><td id="hisat2-options-X">
@@ -500,8 +503,8 @@ possible in terms of alignment score.
 <pre><code>-X/--maxins <int></code></pre>
 </td><td>
 
-<p>The maximum fragment length for valid paired-end alignments. E.g. if <code>-X 100</code> is specified and a paired-end alignment consists of two 20-bp alignments in the proper orientation with a 60-bp gap between them, that alignment is considered valid (as long as <a href="#hisat2-options-I"><code>-I</code></a> is also satisfied). A 61-bp gap would not be valid in that case. If trimming options <a href="#hisat2-options-3"><code>-3</code></a> or <a href="#hisat2-options-5"><code>-5</c [...]
-<p>The larger the difference between <a href="#hisat2-options-I"><code>-I</code></a> and <a href="#hisat2-options-X"><code>-X</code></a>, the slower HISAT2 will run. This is because larger differences bewteen <a href="#hisat2-options-I"><code>-I</code></a> and <a href="#hisat2-options-X"><code>-X</code></a> require that HISAT2 scan a larger window to determine if a concordant alignment exists. For typical fragment length ranges (200 to 400 nucleotides), HISAT2 is very efficient.</p>
+<p>The maximum fragment length for valid paired-end alignments. This option is valid only with --no-spliced-alignment. E.g. if <code>-X 100</code> is specified and a paired-end alignment consists of two 20-bp alignments in the proper orientation with a 60-bp gap between them, that alignment is considered valid (as long as <a href="#hisat2-options-I"><code>-I</code></a> is also satisfied). A 61-bp gap would not be valid in that case. If trimming options <a href="#hisat2-options-3"><code>- [...]
+<p>The larger the difference between <a href="#hisat2-options-I"><code>-I</code></a> and <a href="#hisat2-options-X"><code>-X</code></a>, the slower HISAT2 will run. This is because larger differences between <a href="#hisat2-options-I"><code>-I</code></a> and <a href="#hisat2-options-X"><code>-X</code></a> require that HISAT2 scan a larger window to determine if a concordant alignment exists. For typical fragment length ranges (200 to 400 nucleotides), HISAT2 is very efficient.</p>
 <p>Default: 500.</p>
 </td></tr>
 <tr><td id="hisat2-options-fr">
@@ -524,27 +527,6 @@ possible in terms of alignment score.
 </td><td>
 
 <p>By default, <code>hisat2</code> looks for discordant alignments if it cannot find any concordant alignments. A discordant alignment is an alignment where both mates align uniquely, but that does not satisfy the paired-end constraints (<a href="#hisat2-options-fr"><code>--fr</code>/<code>--rf</code>/<code>--ff</code></a>, <a href="#hisat2-options-I"><code>-I</code></a>, <a href="#hisat2-options-X"><code>-X</code></a>). This option disables that behavior.</p>
-</td></tr>
-<tr><td id="hisat2-options-dovetail">
-
-<pre><code>--dovetail</code></pre>
-</td><td>
-
-<p>If the mates "dovetail", that is if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant. See also: <a href="#mates-can-overlap-contain-or-dovetail-each-other">Mates can overlap, contain or dovetail each other</a>. Default: mates cannot dovetail in a concordant alignment.</p>
-</td></tr>
-<tr><td id="hisat2-options-no-contain">
-
-<pre><code>--no-contain</code></pre>
-</td><td>
-
-<p>If one mate alignment contains the other, consider that to be non-concordant. See also: <a href="#mates-can-overlap-contain-or-dovetail-each-other">Mates can overlap, contain or dovetail each other</a>. Default: a mate can contain the other in a concordant alignment.</p>
-</td></tr>
-<tr><td id="hisat2-options-no-overlap">
-
-<pre><code>--no-overlap</code></pre>
-</td><td>
-
-<p>If one mate alignment overlaps the other at all, consider that to be non-concordant. See also: <a href="#mates-can-overlap-contain-or-dovetail-each-other">Mates can overlap, contain or dovetail each other</a>. Default: mates can overlap in a concordant alignment.</p>
 </td></tr></table>
 
 <h4 id="output-options">Output options</h4>
@@ -680,7 +662,7 @@ possible in terms of alignment score.
 <pre><code>--omit-sec-seq</code></pre>
 </td><td>
 
-<p>When printing secondary alignments, HISAT2 by default will write out the <code>SEQ</code> and <code>QUAL</code> strings. Specifying this option causes HISAT2 to print an asterix in those fields instead.</p>
+<p>When printing secondary alignments, HISAT2 by default will write out the <code>SEQ</code> and <code>QUAL</code> strings. Specifying this option causes HISAT2 to print an asterisk in those fields instead.</p>
 </td></tr>
 
 
@@ -759,7 +741,7 @@ possible in terms of alignment score.
 <h2 id="sam-output">SAM output</h2>
 <p>Following is a brief description of the <a href="http://samtools.sourceforge.net/SAM1.pdf">SAM</a> format as output by <code>hisat2</code>. For more details, see the <a href="http://samtools.sourceforge.net/SAM1.pdf">SAM format specification</a>.</p>
 <p>By default, <code>hisat2</code> prints a SAM header with <code>@HD</code>, <code>@SQ</code> and <code>@PG</code> lines. When one or more <a href="#hisat2-options-rg"><code>--rg</code></a> arguments are specified, <code>hisat2</code> will also print an <code>@RG</code> line that includes all user-specified <a href="#hisat2-options-rg"><code>--rg</code></a> tokens separated by tabs.</p>
-<p>Each subsequnt line describes an alignment or, if the read failed to align, a read. Each line is a collection of at least 12 fields separated by tabs; from left to right, the fields are:</p>
+<p>Each subsequent line describes an alignment or, if the read failed to align, a read. Each line is a collection of at least 12 fields separated by tabs; from left to right, the fields are:</p>
 <ol style="list-style-type: decimal">
 <li><p>Name of read that aligned.</p>
 <p>Note that the <a href="http://samtools.sourceforge.net/SAM1.pdf">SAM specification</a> disallows whitespace in the read name. If the read name contains any whitespace characters, HISAT2 will truncate the name at the first whitespace character. This is similar to the behavior of other tools.</p></li>
@@ -983,7 +965,7 @@ Otherwise, you will be able to build an index on your desktop with 8GB RAM.</cod
 <pre><code>--large-index</code></pre>
 </td><td>
 
-<p>Force <code>hisat2-build</code> to build a <a href="#small-and-large-indexes">large index</a>, even if the reference is less than ~ 4 billion nucleotides inlong.</p>
+<p>Force <code>hisat2-build</code> to build a <a href="#small-and-large-indexes">large index</a>, even if the reference is less than ~ 4 billion nucleotides long.</p>
 </td></tr>
 <tr><td id="hisat2-build-options-a">
 
@@ -1063,7 +1045,7 @@ Otherwise, you will be able to build an index on your desktop with 8GB RAM.</cod
 </td><td>
 
 <p>Provide a list of SNPs (in the HISAT2's own format) as follows (five columns).</p>
-<p>SNP ID <code><tab></code> chromosome name <code><tab></code> snp type (single, deletion, or insertion) <code><tab></code> zero-offset based genomic position of a SNP <code><tab></code> alternative base (single), the length of SNP (deletion), or insertion sequence (insertion)</p>
+<p>SNP ID <code><tab></code> snp type (single, deletion, or insertion) <code><tab></code> chromosome name <code><tab></code> zero-offset based genomic position of a SNP <code><tab></code> alternative base (single), the length of SNP (deletion), or insertion sequence (insertion)</p>
 <p>For example, rs58784443 single 13 18447947 T</p>
 <p>Use <code>hisat2_extract_snps_haplotypes_UCSC.py</code> (in the HISAT2 package) to extract SNPs and haplotypes from a dbSNP file (e.g. http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/snp144Common.txt.gz). or <code>hisat2_extract_snps_haplotypes_VCF.py</code> to extract SNPs and haplotypes from a VCF file (e.g. ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/GRCh38_positions/ALL.chr22.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh3 [...]
 </td></tr><tr><td>
@@ -1080,7 +1062,7 @@ Otherwise, you will be able to build an index on your desktop with 8GB RAM.</cod
 <pre><code>--ss <path></code></pre>
 </td><td>
 
-<p>Note this option should be used with the followig --exon option. Provide a list of splice sites (in the HISAT2's own format) as follows (four columns).</p>
+<p>Note this option should be used with the following --exon option. Provide a list of splice sites (in the HISAT2's own format) as follows (four columns).</p>
 <p>chromosome name <code><tab></code> zero-offset based genomic position of the flanking base on the left side of an intron <code><tab></code> zero-offset based genomic position of the flanking base on the right <code><tab></code> strand</p>
 <p>Use <code>hisat2_extract_splice_sites.py</code> (in the HISAT2 package) to extract splice sites from a GTF file.</p>
 </td></tr><tr><td>
@@ -1219,7 +1201,7 @@ Sequence-N  <name>  <len></code></pre>
 <h2 id="aligning-example-reads">Aligning example reads</h2>
 <p>Stay in the directory created in the previous step, which now contains the <code>22_20-21M</code> index files. Next, run:</p>
 <pre><code>$HISAT2_HOME/hisat2 -f -x $HISAT2_HOME/example/index/22_20-21M_snp -U $HISAT2_HOME/example/reads/reads_1.fa -S eg1.sam</code></pre>
-<p>This runs the HISAT2 aligner, which aligns a set of unpaired reads to the the genome region using the index generated in the previous step. The alignment results in SAM format are written to the file <code>eg1.sam</code>, and a short alignment summary is written to the console. (Actually, the summary is written to the "standard error" or "stderr" filehandle, which is typically printed to the console.)</p>
+<p>This runs the HISAT2 aligner, which aligns a set of unpaired reads to the genome region using the index generated in the previous step. The alignment results in SAM format are written to the file <code>eg1.sam</code>, and a short alignment summary is written to the console. (Actually, the summary is written to the "standard error" or "stderr" filehandle, which is typically printed to the console.)</p>
 <p>To see the first few lines of the SAM output, run:</p>
 <pre><code>head eg1.sam</code></pre>
 <p>You will see something like this:</p>
@@ -1247,9 +1229,9 @@ Sequence-N  <name>  <len></code></pre>
 <pre><code>$HISAT2_HOME/hisat2 -f -x $HISAT2_HOME/example/index/22_20-21M_snp -1 $HISAT2_HOME/example/reads/reads_1.fa -2 $HISAT2_HOME/example/reads/reads_2.fa -S eg2.sam</code></pre>
 <p>Use <code>samtools view</code> to convert the SAM file into a BAM file. BAM is the binary format corresponding to the SAM text format. Run:</p>
 <pre><code>samtools view -bS eg2.sam > eg2.bam</code></pre>
-<p>Use <code>samtools sort</code> to convert the BAM file to a sorted BAM file.</p>
-<pre><code>samtools sort eg2.bam eg2.sorted</code></pre>
-<p>We now have a sorted BAM file called <code>eg2.sorted.bam</code>. Sorted BAM is a useful format because the alignments are (a) compressed, which is convenient for long-term storage, and (b) sorted, which is conveneint for variant discovery. To generate variant calls in VCF format, run:</p>
+<p>Use <code>samtools sort</code> to convert the BAM file to a sorted BAM file. The following command requires samtools version 1.2 or higher.</p>
+<pre><code>samtools sort eg2.bam -o eg2.sorted.bam</code></pre>
+<p>We now have a sorted BAM file called <code>eg2.sorted.bam</code>. Sorted BAM is a useful format because the alignments are (a) compressed, which is convenient for long-term storage, and (b) sorted, which is convenient for variant discovery. To generate variant calls in VCF format, run:</p>
 <pre><code>samtools mpileup -uf $HISAT2_HOME/example/reference/22_20-21M.fa eg2.sorted.bam | bcftools view -bvcg - > eg2.raw.bcf</code></pre>
 <p>Then to view the variants, run:</p>
 <pre><code>bcftools view eg2.raw.bcf</code></pre>
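For readers who prefer to drive the sort and index steps from Python, the same can be done with the pysam package (assuming it is installed); this mirrors, rather than replaces, the shell commands shown above, and the file names follow the example.

```python
# Hedged alternative to the shell commands above, using pysam's wrappers
# around samtools; file names follow the manual's example.
import pysam

pysam.sort("-o", "eg2.sorted.bam", "eg2.bam")  # same as: samtools sort eg2.bam -o eg2.sorted.bam
pysam.index("eg2.sorted.bam")                  # index the sorted BAM for downstream tools
```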
diff --git a/gfm.h b/gfm.h
index 5e7f9b1..1049099 100644
--- a/gfm.h
+++ b/gfm.h
@@ -4117,15 +4117,15 @@ void GFM<index_t>::joinToDisk(
 				//assert_eq(0, _refnames.back().length());
 				_refnames.pop_back();
 			}
-			assert_lt(szsi, szs.size());
-			assert_eq(rec.off, szs[szsi].off);
-			assert_eq(rec.len, szs[szsi].len);
-			assert_eq(rec.first, szs[szsi].first);
-			assert(rec.first || rec.off > 0);
-			ASSERT_ONLY(szsi++);
 			// Increment seqsRead if this is the first fragment
 			if(rec.first && rec.len > 0) seqsRead++;
 			if(bases == 0) continue;
+            assert_lt(szsi, szs.size());
+            assert_eq(rec.off, szs[szsi].off);
+            assert_eq(rec.len, szs[szsi].len);
+            assert_eq(rec.first, szs[szsi].first);
+            assert(rec.first || rec.off > 0);
+            ASSERT_ONLY(szsi++);
 			assert_leq(bases, this->plen()[seqsRead-1]);
 			// Reset the patoff if this is the first fragment
 			if(rec.first) patoff = 0;
diff --git a/gp.h b/gp.h
new file mode 100644
index 0000000..35f8f44
--- /dev/null
+++ b/gp.h
@@ -0,0 +1,65 @@
+/*
+ * Copyright 2016, Daehwan Kim <infphilo at gmail.com>
+ *
+ * This file is part of HISAT 2.
+ *
+ * HISAT 2 is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * HISAT 2 is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with HISAT 2.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ *  gp.h
+ *
+ */
+
+#ifndef GP_H_
+#define GP_H_
+
+#include <iostream>
+#include <stdint.h>
+
+/**
+ * Encapsulates alignment policy for graph
+ */
+class GraphPolicy {
+    
+public:
+    
+    GraphPolicy() { reset(); }
+    
+    GraphPolicy(size_t maxAltsTried)
+    {
+        init(maxAltsTried);
+    }
+    
+    /**
+     * Reset the policy to its default state (maximum number of ALTs tried = 0).
+     */
+    void reset() {
+        init(0);
+    }
+    
+    /**
+     * Initialize the policy with the maximum number of ALTs to try
+     * during graph alignment.
+     */
+    void init(size_t maxAltsTried)
+    {
+        maxAltsTried_ = maxAltsTried;
+    }
+    
+    size_t maxAltsTried() const { return maxAltsTried_; }
+    
+    
+private:
+    size_t maxAltsTried_;
+};
+
+#endif /*ndef GP_H_*/
diff --git a/hi_aligner.h b/hi_aligner.h
index 72d240f..7183e17 100644
--- a/hi_aligner.h
+++ b/hi_aligner.h
@@ -42,6 +42,7 @@
 #include "aligner_sw_driver.h"
 #include "group_walk.h"
 #include "tp.h"
+#include "gp.h"
 
 // Allow longer introns for long anchored reads involving canonical splice sites
 inline uint32_t MaxIntronLen(uint32_t anchor, uint32_t minAnchorLen) {
@@ -565,6 +566,7 @@ struct GenomeHit {
                      index_t                    maxIntronLen,
                      index_t                    minAnchorLen,         // minimum anchor length for canonical splice site
                      index_t                    minAnchorLen_noncan,  // minimum anchor length for non-canonical splice site
+                     const index_t              maxAltsTried,
                      const SpliceSite*          spliceSite = NULL,    // penalty for splice site
                      bool                       no_spliced_alignment = false);
     
@@ -588,6 +590,7 @@ struct GenomeHit {
                 index_t                 maxIntronLen,
                 index_t                 minAnchorLen,
                 index_t                 minAnchorLen_noncan,
+                const index_t           maxAltsTried,
                 index_t&                leftext,
                 index_t&                rightext,
                 index_t                 mm = 0);
@@ -605,7 +608,8 @@ struct GenomeHit {
                               const Read&                 rd,
                               const GFM<index_t>&         gfm,
                               const ALTDB<index_t>&       altdb,
-                              const BitPairReference&     ref);
+                              const BitPairReference&     ref,
+                              const index_t               maxAltsTried);
     
     /**
      * Adjust alignment with respect to SNPs, usually updating Edits
@@ -615,7 +619,8 @@ struct GenomeHit {
                        const Read&             rd,
                        const GFM<index_t>&     gfm,
                        const ALTDB<index_t>&   altdb,
-                       const BitPairReference& ref);
+                       const BitPairReference& ref,
+                       const index_t           maxAltsTried);
    
     /*
      *
@@ -653,6 +658,7 @@ struct GenomeHit {
                                  int                         rfoff,
                                  index_t                     rflen,
                                  bool                        left,
+                                 const index_t               max_numALTsTried,
                                  EList<Edit>&                edits,
                                  ELList<Edit, 128, 4>*       candidate_edits = NULL,
                                  index_t                     mm = 0,
@@ -688,6 +694,7 @@ struct GenomeHit {
                             0, /* tmp_numNs */
                             numNs,
                             0,    /* dep */
+                            max_numALTsTried,
                             numALTsTried);
         index_t extlen = 0;
         if(left) {
@@ -755,7 +762,8 @@ struct GenomeHit {
                                        index_t                     tmp_numNs,
                                        index_t*                    numNs,
                                        index_t                     dep,
-                                       index_t&                    numSnpsTried,
+                                       const index_t               max_numALTsTried,
+                                       index_t&                    numALTsTried,
                                        ALT_TYPE                    prev_alt_type = ALT_NONE);
     
     /**
@@ -1220,8 +1228,7 @@ bool GenomeHit<index_t>::compatibleWith(
     
     // check if there is a deletion, an insertion, or a potential intron
     // between the two partial alignments
-    if(refdif >= rddif + minIntronLen) {
-        if(no_spliced_alignment) return false;
+    if(!no_spliced_alignment) {
         if(refdif > rddif + maxIntronLen) {
             return false;
         }
@@ -1239,7 +1246,7 @@ bool GenomeHit<index_t>::combineWith(
                                      const Read&                rd,
                                      const GFM<index_t>&        gfm,
                                      const BitPairReference&    ref,
-                                     const ALTDB<index_t>&      snpdb,
+                                     const ALTDB<index_t>&      altdb,
                                      SpliceSiteDB&              ssdb,
                                      SwAligner&                 swa,
                                      SwMetrics&                 swm,
@@ -1251,6 +1258,7 @@ bool GenomeHit<index_t>::combineWith(
                                      index_t                    maxIntronLen,
                                      index_t                    minAnchorLen,           // minimum anchor length for canonical splice site
                                      index_t                    minAnchorLen_noncan,    // minimum anchor length for non-canonical splice site
+                                     const index_t              maxAltsTried,
                                      const SpliceSite*          spliceSite,             // penalty for splice site
                                      bool                       no_spliced_alignment)
 {
@@ -1294,7 +1302,7 @@ bool GenomeHit<index_t>::combineWith(
     bool spliced = false, ins = false, del = false;
     if(refdif != rddif) {
         if(refdif > rddif) {
-            if(refdif - rddif >= minIntronLen) {
+            if(!no_spliced_alignment && refdif - rddif >= minIntronLen) {
                 assert_leq(refdif - rddif, maxIntronLen);
                 spliced = true;
             } else {
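Read alongside the hunk above: the change makes the splice branch conditional on spliced alignment being allowed. A rough Python paraphrase of the gap classification, with illustrative names, might look like the following (a sketch of the logic, not HISAT2 code).

```python
# Sketch (illustrative, not HISAT2 code): classify the gap between two partial
# alignments from the read-side distance (rddif) and reference-side distance
# (refdif), mirroring the branches visible in the hunk above.
def classify_gap(refdif, rddif, min_intron_len, max_intron_len, no_spliced_alignment):
    if refdif == rddif:
        return "none"
    if refdif > rddif:
        gap = refdif - rddif
        if not no_spliced_alignment and gap >= min_intron_len:
            # treated as a candidate intron (the code asserts gap <= max_intron_len)
            return "splice" if gap <= max_intron_len else "too_long"
        return "deletion"   # reference spans more bases than the read
    return "insertion"      # read spans more bases than the reference
```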
@@ -1842,6 +1850,7 @@ bool GenomeHit<index_t>::extend(
                                 index_t                 maxIntronLen,
                                 index_t                 minAnchorLen,
                                 index_t                 minAnchorLen_noncan,
+                                const index_t           maxAltsTried,
                                 index_t&                leftext,
                                 index_t&                rightext,
                                 index_t                 mm)
@@ -1874,23 +1883,24 @@ bool GenomeHit<index_t>::extend(
         }
         index_t numNs = 0;
         index_t num_prev_edits = (index_t)_edits->size();
-        index_t best_ext =  alignWithALTs(
-                                          altdb.alts(),
-                                          this->_joinedOff,
-                                          seq,
-                                          this->_rdoff - 1,
-                                          this->_rdoff - 1,
-                                          this->_rdoff,
-                                          ref,
-                                          *_sharedVars,
-                                          _tidx,
-                                          rl,
-                                          reflen,
-                                          true, /* left? */
-                                          *this->_edits,
-                                          NULL,
-                                          mm,
-                                          &numNs);
+        index_t best_ext = alignWithALTs(
+                                         altdb.alts(),
+                                         this->_joinedOff,
+                                         seq,
+                                         this->_rdoff - 1,
+                                         this->_rdoff - 1,
+                                         this->_rdoff,
+                                         ref,
+                                         *_sharedVars,
+                                         _tidx,
+                                         rl,
+                                         reflen,
+                                         true, /* left? */
+                                         maxAltsTried,
+                                         *this->_edits,
+                                         NULL,
+                                         mm,
+                                         &numNs);
         // Do not allow for any edits including known snps and splice sites when extending zero-length hit
         if(_len == 0 && _edits->size() > 0) {
             _edits->clear();
@@ -1962,6 +1972,7 @@ bool GenomeHit<index_t>::extend(
                                              (int)rl,
                                              reflen,
                                              false,
+                                             maxAltsTried,
                                              *this->_edits,
                                              NULL,
                                              mm);
@@ -2024,7 +2035,8 @@ bool GenomeHit<index_t>::adjustWithALT(
                                        const Read&                 rd,
                                        const GFM<index_t>&         gfm,
                                        const ALTDB<index_t>&       altdb,
-                                       const BitPairReference&     ref)
+                                       const BitPairReference&     ref,
+                                       const index_t               maxAltsTried)
 {
     if(gfm.gh().linearFM()) {
         genomeHits.expand();
@@ -2081,7 +2093,10 @@ bool GenomeHit<index_t>::adjustWithALT(
         index_t orig_joinedOff = genomeHit._joinedOff;
         index_t orig_toff = genomeHit._toff;
         bool found2 = false;
-        if(offDiffs.size() > 4) offDiffs.resize(4);
+        // maxAltsTried is not directly related to the size of offDiffs,
+        // but let the size of offDiffs be determined by maxAltsTried
+        const index_t max_offDiffs_size = max<index_t>(4, maxAltsTried / 4);
+        if(offDiffs.size() > max_offDiffs_size) offDiffs.resize(max_offDiffs_size);
         for(index_t o = 0; o < offDiffs.size() && !found2; o++) {
             const pair<index_t, int>& offDiff = offDiffs[o];
 #ifndef NDEBUG
@@ -2121,6 +2136,7 @@ bool GenomeHit<index_t>::adjustWithALT(
                                                (int)genomeHit._toff,
                                                reflen,
                                                false, /* left? */
+                                               maxAltsTried,
                                                *genomeHit._edits,
                                                &candidate_edits);
             if(alignedLen == genomeHit._len) {
@@ -2164,7 +2180,8 @@ bool GenomeHit<index_t>::adjustWithALT(
                                        const Read&             rd,
                                        const GFM<index_t>&     gfm,
                                        const ALTDB<index_t>&   altdb,
-                                       const BitPairReference& ref)
+                                       const BitPairReference& ref,
+                                       const index_t           maxAltsTried)
 {
     if(gfm.gh().linearFM()) return true;
     assert_lt(this->_tidx, ref.numRefs());
@@ -2180,9 +2197,10 @@ bool GenomeHit<index_t>::adjustWithALT(
     index_t orig_joinedOff = this->_joinedOff;
     index_t orig_toff = this->_toff;
     bool found = false;
-    // daehwan - for debugging purposes
-    if(offDiffs.size() > 16) offDiffs.resize(16);
-    // if(offDiffs.size() > 4) offDiffs.resize(4);
+    // maxAltsTried is not directly related to the size of offDiffs,
+    // but let the size of offDiffs be determined by maxAltsTried
+    const index_t max_offDiffs_size = max<index_t>(4, maxAltsTried / 4);
+    if(offDiffs.size() > max_offDiffs_size) offDiffs.resize(max_offDiffs_size);
     for(index_t o = 0; o < offDiffs.size() && !found; o++) {
         const pair<index_t, int>& offDiff = offDiffs[o];
 #ifndef NDEBUG
@@ -2219,6 +2237,7 @@ bool GenomeHit<index_t>::adjustWithALT(
                                            (int)this->_toff,
                                            reflen,
                                            false, /* left? */
+                                           maxAltsTried,
                                            *this->_edits,
                                            &_sharedVars->candidate_edits);
         if(alignedLen == this->_len) {
@@ -2236,7 +2255,7 @@ bool GenomeHit<index_t>::adjustWithALT(
 }
 
 /*
- * Find offset differences due to deletions
+ * Find offset differences due to splice sites
  */
 template <typename index_t>
 void GenomeHit<index_t>::findSSOffs(
@@ -2299,7 +2318,7 @@ void GenomeHit<index_t>::findSSOffs(
 
 
 /*
- *
+ * Find offset differences due to indels
  */
 template <typename index_t>
 void GenomeHit<index_t>::findOffDiffs(
@@ -2394,10 +2413,11 @@ index_t GenomeHit<index_t>::alignWithALTs_recur(
                                                 index_t                     tmp_numNs,
                                                 index_t*                    numNs,
                                                 index_t                     dep,
+                                                const index_t               max_numALTsTried,
                                                 index_t&                    numALTsTried,
                                                 ALT_TYPE                    prev_alt_type)
 {
-    if(numALTsTried > 16 + dep) return 0;
+    if(numALTsTried > max_numALTsTried + dep) return 0;
     assert_gt(rdlen, 0);
     assert_gt(rflen, 0);
     if(raw_refbufs.size() <= dep) raw_refbufs.expand();
@@ -2456,8 +2476,8 @@ index_t GenomeHit<index_t>::alignWithALTs_recur(
             tmp_mm = 0;
         }
         // Find SNPs included in this region
-        pair<index_t, index_t> alt_range;
-        {
+        pair<int, int> alt_range(0, 0);
+        if(alts.size() > 0) {
             ALT<index_t> cmp_alt;
             const index_t minK = 16;
             assert_leq(mm_min_rd_i, rdoff);
@@ -2468,13 +2488,16 @@ index_t GenomeHit<index_t>::alignWithALTs_recur(
             } else {
                 cmp_alt.pos = joinedOff - rd_diff;
             }
-            alt_range.first = alt_range.second = (index_t)alts.bsearchLoBound(cmp_alt);
-            if(alt_range.first >= alts.size()) return 0;
-            for(; alt_range.first > 0; alt_range.first--) {
+            alt_range.first = alt_range.second = (int)alts.bsearchLoBound(cmp_alt);
+            if(alt_range.first >= alts.size()) {
+                assert_gt(alts.size(), 0);
+                alt_range.first = alt_range.second = alt_range.second - 1;
+            }
+            for(; alt_range.first >= 0; alt_range.first--) {
                 const ALT<index_t>& alt = alts[alt_range.first];
                 if(alt.snp()) {
                     if(alt.deletion() && !alt.reversed) continue;
-                    if(alt.pos + rdlen - 1 < joinedOff) break;
+                    if(alt.pos + rdlen < joinedOff) break;
                 } else if(alt.splicesite()) {
                     if(alt.left < alt.right) continue;
                     if(alt.left + rdlen - 1 < joinedOff) break;
@@ -2540,17 +2563,47 @@ index_t GenomeHit<index_t>::alignWithALTs_recur(
                     alt_compatible = true;
                 }
             } else if(alt.type == ALT_SNP_DEL) {
-                if(rf_i > (int)alt.len) {
-                    for(index_t i = 0; i < alt.len; i++) {
-                        int rf_bp = rfseq[rf_i - i];
-                        Edit e(
-                               rd_i + 1,
-                               "ACGTN"[rf_bp],
-                               '-',
-                               EDIT_TYPE_READ_GAP,
-                               true, /* chars? */
-                               alt_range.second);
-                        tmp_edits.insert(e, 0);
+                if(rfoff + rf_i > (int)alt.len) {
+                    if(rf_i > (int)alt.len) {
+                        for(index_t i = 0; i < alt.len; i++) {
+                            int rf_bp = rfseq[rf_i - i];
+                            Edit e(
+                                   rd_i + 1,
+                                   "ACGTN"[rf_bp],
+                                   '-',
+                                   EDIT_TYPE_READ_GAP,
+                                   true, /* chars? */
+                                   alt_range.second);
+                            tmp_edits.insert(e, 0);
+                        }
+                        
+                    } else {
+                        // long deletions
+                        int new_rfoff = rfoff - alt.len;
+                        index_t new_rflen = rf_i + alt.len + 10;
+                        if(raw_refbufs.size() <= dep + 1) raw_refbufs.expand();
+                        SStringExpandable<char>& raw_refbuf = raw_refbufs[dep + 1];
+                        raw_refbuf.resize(new_rflen + 16 + 16);
+                        raw_refbuf.fill(0x4);
+                        int off = ref.getStretch(
+                                                 reinterpret_cast<uint32_t*>(raw_refbuf.wbuf() + 16),
+                                                 tidx,
+                                                 max<int>(new_rfoff, 0),
+                                                 new_rfoff > 0 ? new_rflen : new_rflen + new_rfoff
+                                                 ASSERT_ONLY(, destU32));
+                        assert_lt(off, 16);
+                        const char* new_rfseq = raw_refbuf.wbuf() + 16 + off + min<int>(new_rfoff, 0);
+                        for(int i = 0; i < alt.len; i++) {
+                            int rf_bp = new_rfseq[rf_i - i + alt.len];
+                            Edit e(
+                                   rd_i + 1,
+                                   "ACGTN"[rf_bp],
+                                   '-',
+                                   EDIT_TYPE_READ_GAP,
+                                   true, /* chars? */
+                                   alt_range.second);
+                            tmp_edits.insert(e, 0);
+                        }
                     }
                     rf_i -= (int)alt.len;
                     alt_compatible = true;
@@ -2611,7 +2664,7 @@ index_t GenomeHit<index_t>::alignWithALTs_recur(
                 index_t next_joinedOff = alt.pos;
                 int next_rfoff = rfoff, next_rdoff = rd_i;
                 const char* next_rfseq = rfseq;
-                index_t next_rflen = rf_i + 1, next_rdlen = rd_i + 1;
+                int next_rflen = rf_i + 1, next_rdlen = rd_i + 1;
                 if(alt.splicesite()) {
                     assert_lt(alt.left, alt.right);
                     next_joinedOff = alt.left;
@@ -2621,7 +2674,7 @@ index_t GenomeHit<index_t>::alignWithALTs_recur(
                     next_rfseq = NULL;
                 }
                 if(next_rflen < next_rdlen) {
-                    index_t add_len = next_rdlen + 10 - next_rflen;
+                    int add_len = next_rdlen + 10 - next_rflen;
                     if(next_rfoff < add_len) add_len = next_rfoff;
                     next_rfoff -= add_len;
                     next_rflen += add_len;
@@ -2650,6 +2703,7 @@ index_t GenomeHit<index_t>::alignWithALTs_recur(
                                                          tmp_numNs,
                                                          numNs,
                                                          dep + 1,
+                                                         max_numALTsTried,
                                                          numALTsTried,
                                                          alt.type);
                 if(alignedLen == next_rdlen) return rdlen;
@@ -2770,20 +2824,50 @@ index_t GenomeHit<index_t>::alignWithALTs_recur(
                     alt_compatible = true;
                 }
             } else if(alt.type == ALT_SNP_DEL) {
-                if(rf_i + alt.len <= rflen && rd_i > 0) {
-                    for(index_t i = 0; i < alt.len; i++) {
-                        rf_bp = rfseq[rf_i + i];
-                        Edit e(
-                               rd_i + rdoff_add,
-                               "ACGTN"[rf_bp],
-                               '-',
-                               EDIT_TYPE_READ_GAP,
-                               true, /* chars? */
-                               alt_range.first);
-                        tmp_edits.push_back(e);
+                if(rd_i > 0) {
+                    if(rf_i + alt.len <= rflen) {
+                        for(index_t i = 0; i < alt.len; i++) {
+                            rf_bp = rfseq[rf_i + i];
+                            Edit e(
+                                   rd_i + rdoff_add,
+                                   "ACGTN"[rf_bp],
+                                   '-',
+                                   EDIT_TYPE_READ_GAP,
+                                   true, /* chars? */
+                                   alt_range.first);
+                            tmp_edits.push_back(e);
+                        }
+                        rf_i += alt.len;
+                        alt_compatible = true;
+                    } else {
+                        // long deletions
+                        index_t new_rflen = rf_i + alt.len + 10;
+                        if(raw_refbufs.size() <= dep + 1) raw_refbufs.expand();
+                        SStringExpandable<char>& raw_refbuf = raw_refbufs[dep + 1];
+                        raw_refbuf.resize(new_rflen + 16 + 16);
+                        raw_refbuf.fill(0x4);
+                        int off = ref.getStretch(
+                                                 reinterpret_cast<uint32_t*>(raw_refbuf.wbuf() + 16),
+                                                 tidx,
+                                                 max<int>(rfoff, 0),
+                                                 rfoff > 0 ? new_rflen : new_rflen + rfoff
+                                                 ASSERT_ONLY(, destU32));
+                        assert_lt(off, 16);
+                        const char* new_rfseq = raw_refbuf.wbuf() + 16 + off + min<int>(rfoff, 0);
+                        for(index_t i = 0; i < alt.len; i++) {
+                            rf_bp = new_rfseq[rf_i + i];
+                            Edit e(
+                                   rd_i + rdoff_add,
+                                   "ACGTN"[rf_bp],
+                                   '-',
+                                   EDIT_TYPE_READ_GAP,
+                                   true, /* chars? */
+                                   alt_range.first);
+                            tmp_edits.push_back(e);
+                        }
+                        rf_i += alt.len;
+                        alt_compatible = true;
                     }
-                    rf_i += alt.len;
-                    alt_compatible = true;
                 }
             } else if(alt.type == ALT_SNP_INS) {
                 if(rd_i + alt.len <= rdlen && rf_i > 0) {
@@ -2848,6 +2932,9 @@ index_t GenomeHit<index_t>::alignWithALTs_recur(
                     next_joinedOff = alt.pos + 1;
                 } else if(alt.type == ALT_SNP_DEL) {
                     next_joinedOff = alt.pos + alt.len;
+                    if(rflen <= rf_i) {
+                        next_rflen = 0; // Will reset next_rfseq and next_rflen below
+                    }
                 } else if(alt.type == ALT_SNP_INS) {
                     next_joinedOff = alt.pos;
                 } else if(alt.type == ALT_SPLICESITE) {
@@ -2885,6 +2972,7 @@ index_t GenomeHit<index_t>::alignWithALTs_recur(
                                                          tmp_numNs,
                                                          numNs,
                                                          dep + 1,
+                                                         max_numALTsTried,
                                                          numALTsTried,
                                                          alt.type);
                 if(alignedLen > 0) {
@@ -3339,19 +3427,11 @@ public:
 	 */
 	HI_Aligner(
                const GFM<index_t>& gfm,
-               const TranscriptomePolicy& tpol,
                bool anchorStop = true,
-               size_t minIntronLen = 20,
-               size_t maxIntronLen = 500000,
                bool secondary = false,
                bool local = false,
                uint64_t threads_rids_mindist = 0) :
-    _tpol(tpol),
     _anchorStop(anchorStop),
-    _minIntronLen(minIntronLen),
-    _maxIntronLen(maxIntronLen),
-    _minAnchorLen(7),
-    _minAnchorLen_noncan(14),
     _secondary(secondary),
     _local(local),
     _gwstate(GW_CAT),
@@ -3365,11 +3445,6 @@ public:
             _minK++;
         }
         _minK_local = 8;
-        
-        if(_tpol.transcriptome_assembly()) {
-            _minAnchorLen = 15;
-            _minAnchorLen_noncan = 20;
-        }
     }
     
     HI_Aligner() {
@@ -3431,18 +3506,21 @@ public:
      */
     virtual
     int go(
-           const Scoring&           sc,
-           const GFM<index_t>&      gfm,
-           const ALTDB<index_t>&    altdb,
-           const BitPairReference&  ref,
-           SwAligner&               swa,
-           SpliceSiteDB&            ssdb,
-           WalkMetrics&             wlm,
-           PerReadMetrics&          prm,
-           SwMetrics&               swm,
-           HIMetrics&               him,
-           RandomSource&            rnd,
-           AlnSinkWrap<index_t>&    sink)
+           const Scoring&             sc,
+           const PairedEndPolicy&     pepol, // paired-end policy
+           const TranscriptomePolicy& tpol,
+           const GraphPolicy&         gpol,
+           const GFM<index_t>&        gfm,
+           const ALTDB<index_t>&      altdb,
+           const BitPairReference&    ref,
+           SwAligner&                 swa,
+           SpliceSiteDB&              ssdb,
+           WalkMetrics&               wlm,
+           PerReadMetrics&            prm,
+           SwMetrics&                 swm,
+           HIMetrics&                 him,
+           RandomSource&              rnd,
+           AlnSinkWrap<index_t>&      sink)
     {
         index_t rdi;
         bool fw;
@@ -3450,9 +3528,9 @@ public:
         // given read and its reverse complement
         //  (and mate and the reverse complement of mate in case of pair alignment),
         // pick up one with best partial alignment
-        while(nextBWT(sc, gfm, altdb, ref, rdi, fw, wlm, prm, him, rnd, sink)) {
+        while(nextBWT(sc, pepol, tpol, gpol, gfm, altdb, ref, rdi, fw, wlm, prm, him, rnd, sink)) {
             // given the partial alignment, try to extend it to full alignments
-        	found[rdi] = align(sc, gfm, altdb, ref, swa, ssdb, rdi, fw, wlm, prm, swm, him, rnd, sink);
+        	found[rdi] = align(sc, pepol, tpol, gpol, gfm, altdb, ref, swa, ssdb, rdi, fw, wlm, prm, swm, him, rnd, sink);
             if(!found[0] && !found[1]) {
                 break;
             }
@@ -3460,7 +3538,7 @@ public:
             // try to combine this alignment with some of mate alignments
             // to produce pair alignment
             if(this->_paired) {
-                pairReads(sc, gfm, altdb, ref, wlm, prm, him, rnd, sink);
+                pairReads(sc, pepol, tpol, gpol, gfm, altdb, ref, wlm, prm, him, rnd, sink);
                 // if(sink.bestPair() >= _minsc[0] + _minsc[1]) break;
             }
         }
@@ -3481,6 +3559,9 @@ public:
                         bool fw = (res.orient() == 1);
                         mate_found |= alignMate(
                                                 sc,
+                                                pepol,
+                                                tpol,
+                                                gpol,
                                                 gfm,
                                                 altdb,
                                                 ref,
@@ -3500,7 +3581,7 @@ public:
                 }
                 
                 if(mate_found) {
-                    pairReads(sc, gfm, altdb, ref, wlm, prm, him, rnd, sink);
+                    pairReads(sc, pepol, tpol, gpol, gfm, altdb, ref, wlm, prm, him, rnd, sink);
                 }
             }
         }
@@ -3514,17 +3595,20 @@ public:
      */
     virtual
     bool nextBWT(
-                 const Scoring&          sc,
-                 const GFM<index_t>&     gfm,
-                 const ALTDB<index_t>&   altdb,
-                 const BitPairReference& ref,
-                 index_t&                rdi,
-                 bool&                   fw,
-                 WalkMetrics&            wlm,
-                 PerReadMetrics&         prm,
-                 HIMetrics&              him,
-                 RandomSource&           rnd,
-                 AlnSinkWrap<index_t>&   sink)
+                 const Scoring&             sc,
+                 const PairedEndPolicy&     pepol, // paired-end policy
+                 const TranscriptomePolicy& tpol,
+                 const GraphPolicy&         gpol,
+                 const GFM<index_t>&        gfm,
+                 const ALTDB<index_t>&      altdb,
+                 const BitPairReference&    ref,
+                 index_t&                   rdi,
+                 bool&                      fw,
+                 WalkMetrics&               wlm,
+                 PerReadMetrics&            prm,
+                 HIMetrics&                 him,
+                 RandomSource&              rnd,
+                 AlnSinkWrap<index_t>&      sink)
     {
         // Pick up a candidate from a read or its reverse complement
         // (for pair, also consider mate and its reverse complement)
@@ -3533,7 +3617,7 @@ public:
             index_t fwi = (fw ? 0 : 1);
             ReadBWTHit<index_t>& hit = _hits[rdi][fwi];
             assert(!hit.done());
-            bool pseudogeneStop = gfm.gh().linearFM() && !_tpol.no_spliced_alignment();
+            bool pseudogeneStop = gfm.gh().linearFM() && !tpol.no_spliced_alignment();
             bool anchorStop = _anchorStop;
             if(!_secondary) {
                 index_t numSearched = hit.numActualPartialSearch();
@@ -3625,6 +3709,9 @@ public:
     virtual
     bool align(
                const Scoring&                   sc,
+               const PairedEndPolicy&           pepol, // paired-end policy
+               const TranscriptomePolicy&       tpol,
+               const GraphPolicy&               gpol,
                const GFM<index_t>&              gfm,
                const ALTDB<index_t>&            altdb,
                const BitPairReference&          ref,
@@ -3646,6 +3733,9 @@ public:
     virtual
     bool alignMate(
                    const Scoring&                   sc,
+                   const PairedEndPolicy&           pepol, // paired-end policy
+                   const TranscriptomePolicy&       tpol,
+                   const GraphPolicy&               gpol,
                    const GFM<index_t>&              gfm,
                    const ALTDB<index_t>&            altdb,
                    const BitPairReference&          ref,
@@ -3670,6 +3760,9 @@ public:
     virtual
     void hybridSearch(
                       const Scoring&                     sc,
+                      const PairedEndPolicy&             pepol, // paired-end policy
+                      const TranscriptomePolicy&         tpol,
+                      const GraphPolicy&                 gpol,
                       const GFM<index_t>&                gfm,
                       const ALTDB<index_t>&              altdb,
                       const BitPairReference&            ref,
@@ -3693,6 +3786,9 @@ public:
     virtual
     int64_t hybridSearch_recur(
                                const Scoring&                   sc,
+                               const PairedEndPolicy&           pepol, // paired-end policy
+                               const TranscriptomePolicy&       tpol,
+                               const GraphPolicy&               gpol,
                                const GFM<index_t>&              gfm,
                                const ALTDB<index_t>&            altdb,
                                const BitPairReference&          ref,
@@ -3854,6 +3950,9 @@ public:
      */
     index_t getAnchorHits(
                           const GFM<index_t>&               gfm,
+                          const PairedEndPolicy&            pepol, // paired-end policy
+                          const TranscriptomePolicy&        tpol,
+                          const GraphPolicy&                gpol,
                           const ALTDB<index_t>&             altdb,
                           const BitPairReference&           ref,
                           RandomSource&                     rnd,
@@ -3947,7 +4046,7 @@ public:
                     assert_lt(rdoff, hit._len);
                     index_t hitoff = genomeHit.refoff() + hit._len - genomeHit.rdoff();
                     index_t hitoff2 = (index_t)coord.off() + hit._len - rdoff;
-                    if(abs((int64_t)hitoff - (int64_t)hitoff2) <= (int64_t)_maxIntronLen) {
+                    if(abs((int64_t)hitoff - (int64_t)hitoff2) <= (int64_t)tpol.maxIntronLen()) {
                         overlapped = true;
                         genomeHit._hitcount++;
                         break;
@@ -3963,7 +4062,8 @@ public:
                                                       *_rds[rdi],
                                                       gfm,
                                                       altdb,
-                                                      ref);
+                                                      ref,
+                                                      gpol.maxAltsTried());
                 }
                 if(partialHit._hit_type == CANDIDATE_HIT && genomeHits.size() >= maxGenomeHitSize) break;
             }
@@ -3973,21 +4073,27 @@ public:
     }
     
     bool pairReads(
-                   const Scoring&          sc,
-                   const GFM<index_t>&     gfm,
-                   const ALTDB<index_t>&   altdb,
-                   const BitPairReference& ref,
-                   WalkMetrics&            wlm,
-                   PerReadMetrics&         prm,
-                   HIMetrics&              him,
-                   RandomSource&           rnd,
-                   AlnSinkWrap<index_t>&   sink);
+                   const Scoring&             sc,
+                   const PairedEndPolicy&     pepol, // paired-end policy
+                   const TranscriptomePolicy& tpol,
+                   const GraphPolicy&         gpol,
+                   const GFM<index_t>&        gfm,
+                   const ALTDB<index_t>&      altdb,
+                   const BitPairReference&    ref,
+                   WalkMetrics&               wlm,
+                   PerReadMetrics&            prm,
+                   HIMetrics&                 him,
+                   RandomSource&              rnd,
+                   AlnSinkWrap<index_t>&      sink);
 
     /**
      *
      **/
     bool reportHit(
                    const Scoring&                   sc,
+                   const PairedEndPolicy&           pepol, // paired-end policy
+                   const TranscriptomePolicy&       tpol,
+                   const GraphPolicy&               gpol,
                    const GFM<index_t>&              gfm,
                    const ALTDB<index_t>&            altdb,
                    const BitPairReference&          ref,
@@ -4039,16 +4145,7 @@ protected:
     TAlScore _minsc[2];
     TAlScore _maxpen[2];
     
-    TranscriptomePolicy _tpol;
-    
     bool     _anchorStop;
-    size_t   _minIntronLen;
-    size_t   _maxIntronLen;
-    
-    // Minimum anchor length required for canonical splice sites
-    uint32_t _minAnchorLen;
-    // Minimum anchor length required for non-canonical splice sites
-    uint32_t _minAnchorLen_noncan;
     
     bool     _secondary;  // allow secondary alignments
     bool     _local;      // perform local alignments
@@ -4142,6 +4239,9 @@ protected:
 template <typename index_t, typename local_index_t>
 bool HI_Aligner<index_t, local_index_t>::align(
                                                const Scoring&                   sc,
+                                               const PairedEndPolicy&           pepol, // paired-end policy
+                                               const TranscriptomePolicy&       tpol,
+                                               const GraphPolicy&               gpol,
                                                const GFM<index_t>&              gfm,
                                                const ALTDB<index_t>&            altdb,
                                                const BitPairReference&          ref,
@@ -4163,7 +4263,7 @@ bool HI_Aligner<index_t, local_index_t>::align(
     ReadBWTHit<index_t>& hit = _hits[rdi][fwi];
     assert(hit.done());
     index_t minOff = 0;
-    if(hit.minWidth(minOff) > (index_t)(rp.khits * 2)) return false;
+    if(hit.minWidth(minOff) > (index_t)(rp.kseeds * 2)) return false;
     
     // Don't try to align if the potential alignment for this read might be
     // worse than the best alignment of its reverse complement
@@ -4175,9 +4275,11 @@ bool HI_Aligner<index_t, local_index_t>::align(
     if(!_secondary && numActualPartialSearch > maxmm + num_spliced + 1) return true;
     
     // choose candidate partial alignments for further alignment
-    const index_t maxsize = (index_t)rp.khits;
-    index_t numHits = getAnchorHits(
-                                    gfm,
+    const index_t maxsize = max<index_t>(rp.khits, rp.kseeds);
+    index_t numHits = getAnchorHits(gfm,
+                                    pepol,
+                                    tpol,
+                                    gpol,
                                     altdb,
                                     ref,
                                     rnd,
@@ -4198,8 +4300,10 @@ bool HI_Aligner<index_t, local_index_t>::align(
     max_localindexatts = him.localindexatts + max<uint64_t>(10, add);
     // extend the partial alignments bidirectionally using
     // local search, extension, and (less often) global search
-    hybridSearch(
-                 sc,
+    hybridSearch(sc,
+                 pepol,
+                 tpol,
+                 gpol,
                  gfm,
                  altdb,
                  ref,
@@ -4225,6 +4329,9 @@ bool HI_Aligner<index_t, local_index_t>::align(
 template <typename index_t, typename local_index_t>
 bool HI_Aligner<index_t, local_index_t>::alignMate(
                                                    const Scoring&                   sc,
+                                                   const PairedEndPolicy&           pepol, // paired-end policy
+                                                   const TranscriptomePolicy&       tpol,
+                                                   const GraphPolicy&               gpol,
                                                    const GFM<index_t>&              gfm,
                                                    const ALTDB<index_t>&            altdb,
                                                    const BitPairReference&          ref,
@@ -4329,7 +4436,8 @@ bool HI_Aligner<index_t, local_index_t>::alignMate(
                                                       *this->_rds[ordi],
                                                       gfm,
                                                       altdb,
-                                                      ref);
+                                                      ref,
+                                                      gpol.maxAltsTried());
                 }
                 max_hitlen = hitlen;
             }
@@ -4367,14 +4475,18 @@ bool HI_Aligner<index_t, local_index_t>::alignMate(
                          _minsc[ordi],
                          rnd,
                          (index_t)_minK_local,
-                         (index_t)_minIntronLen,
-                         (index_t)_maxIntronLen,
-                         _minAnchorLen,
-                         _minAnchorLen_noncan,
+                         (index_t)tpol.minIntronLen(),
+                         (index_t)tpol.maxIntronLen(),
+                         tpol.minAnchorLen(),
+                         tpol.minAnchorLen_noncan(),
+                         gpol.maxAltsTried(),
                          leftext,
                          rightext);
         hybridSearch_recur(
                            sc,
+                           pepol,
+                           tpol,
+                           gpol,
                            gfm,
                            altdb,
                            ref,
@@ -4573,15 +4685,18 @@ bool HI_Aligner<index_t, local_index_t>::getGenomeCoords_local(
  **/
 template <typename index_t, typename local_index_t>
 bool HI_Aligner<index_t, local_index_t>::pairReads(
-                                                   const Scoring&          sc,
-                                                   const GFM<index_t>&     gfm,
-                                                   const ALTDB<index_t>&   altdb,
-                                                   const BitPairReference& ref,
-                                                   WalkMetrics&            wlm,
-                                                   PerReadMetrics&         prm,
-                                                   HIMetrics&              him,
-                                                   RandomSource&           rnd,
-                                                   AlnSinkWrap<index_t>&   sink)
+                                                   const Scoring&             sc,
+                                                   const PairedEndPolicy&     pepol, // paired-end policy
+                                                   const TranscriptomePolicy& tpol,
+                                                   const GraphPolicy&         gpol,
+                                                   const GFM<index_t>&        gfm,
+                                                   const ALTDB<index_t>&      altdb,
+                                                   const BitPairReference&    ref,
+                                                   WalkMetrics&               wlm,
+                                                   PerReadMetrics&            prm,
+                                                   HIMetrics&                 him,
+                                                   RandomSource&              rnd,
+                                                   AlnSinkWrap<index_t>&      sink)
 {
     assert(_paired);
     const EList<AlnRes> *rs1 = NULL, *rs2 = NULL;
@@ -4617,14 +4732,43 @@ bool HI_Aligner<index_t, local_index_t>::pairReads(
             }
             if(left.off() > left2.off()) continue;
             if(right.off() > right2.off()) continue;
-            if(right.off() + (int)_maxIntronLen < left2.off()) continue;
+            if(right.off() + (int)tpol.maxIntronLen() < left2.off()) continue;
             assert_geq(r1.score().score(), _minsc[0]);
             assert_geq(r2.score().score(), _minsc[1]);
-            if(r1.score().score() + r2.score().score() >= sink.bestPair() || _secondary) {
-                sink.report(0, &r1, &r2);
-                _concordantPairs.expand();
-                _concordantPairs.back().first = i;
-                _concordantPairs.back().second = j;
+            bool dna_frag_pass = true;
+            if(tpol.no_spliced_alignment()) {
+                int pairCl = PE_ALS_DISCORD;
+                assert_eq(r1.refid(), r2.refid());
+                index_t off1, off2, len1, len2;
+                bool fw1, fw2;
+                if(r1.refoff() < r2.refoff()) {
+                    off1 = r1.refoff(); off2 = r2.refoff();
+                    len1 = r1.refExtent(); len2 = r2.refExtent();
+                    fw1 = r1.fw(); fw2 = r2.fw();
+                } else {
+                    off1 = r2.refoff(); off2 = r1.refoff();
+                    len1 = r2.refExtent(); len2 = r1.refExtent();
+                    fw1 = r2.fw(); fw2 = r1.fw();
+                }
+                // Check that final mate alignments are consistent with
+                // paired-end fragment constraints
+                pairCl = pepol.peClassifyPair(
+                                              off1,
+                                              len1,
+                                              fw1,
+                                              off2,
+                                              len2,
+                                              fw2);
+                dna_frag_pass = (pairCl != PE_ALS_DISCORD);
+            }
+        
+            if(!tpol.no_spliced_alignment() || dna_frag_pass) {
+                if(r1.score().score() + r2.score().score() >= sink.bestPair() || _secondary) {
+                    sink.report(0, &r1, &r2);
+                    _concordantPairs.expand();
+                    _concordantPairs.back().first = i;
+                    _concordantPairs.back().second = j;
+                }
             }
         }
     }
@@ -4638,6 +4782,9 @@ bool HI_Aligner<index_t, local_index_t>::pairReads(
 template <typename index_t, typename local_index_t>
 bool HI_Aligner<index_t, local_index_t>::reportHit(
                                                    const Scoring&                   sc,
+                                                   const PairedEndPolicy&           pepol, // paired-end policy
+                                                   const TranscriptomePolicy&       tpol,
+                                                   const GraphPolicy&               gpol,
                                                    const GFM<index_t>&              gfm,
                                                    const ALTDB<index_t>&            altdb,
                                                    const BitPairReference&          ref,
@@ -4667,30 +4814,32 @@ bool HI_Aligner<index_t, local_index_t>::reportHit(
     // in case of multiple exonic alignments, choose the ones near (known) splice sites
     // this helps eliminate cases of reads being mapped to pseudogenes
     pair<bool, bool> spliced = hit.spliced(); // pair<spliced, spliced_to_known>
-    if(this->_tpol.xs_only() && spliced.first) {
+    if(tpol.xs_only() && spliced.first) {
         if(hit.splicing_dir() == SPL_UNKNOWN)
             return false;
     }
-    if(!this->_tpol.no_spliced_alignment()) {
+    if(tpol.no_spliced_alignment()) {
         if(!spliced.first) {
             assert(!spliced.second);
             const index_t max_exon_size = 10000;
-            index_t left1 = 0, right1 = hit.refoff();
-            if(right1 > max_exon_size) left1 = right1 - max_exon_size;
-            index_t left2 = hit.refoff() + hit.len() - 1, right2 = left2 + max_exon_size;
+            index_t left = 0;
+            if(hit.refoff() > max_exon_size) {
+                left = hit.refoff() - max_exon_size;
+            }
+            index_t right = hit.refoff() + hit.len() + max_exon_size;
             spliced.first = ssdb.hasSpliceSites(
                                                 hit.ref(),
-                                                left1,
-                                                right1,
-                                                left2,
-                                                right2,
+                                                left,
+                                                right,
+                                                left,
+                                                right,
                                                 true); // include novel splice sites
             if(altdb.hasExons()) {
                 spliced.second = ssdb.insideExon(hit.ref(), hit.refoff(), hit.refoff() + hit.len() - 1);
             }
         }
     }
-    if(this->_tpol.transcriptome_mapping_only() && !spliced.second)
+    if(tpol.transcriptome_mapping_only() && !spliced.second)
         return false;
     
     AlnScore asc(
@@ -5028,7 +5177,7 @@ index_t HI_Aligner<index_t, local_index_t>::partialSearch(
                 if(linearFM) {
                     rangeTemp = gfm.mapLF(tloc, bloc, c, &node_rangeTemp);
                 } else {
-                    rangeTemp = gfm.mapGLF(tloc, bloc, c, &node_rangeTemp, &_tmp_node_iedge_count, (index_t)rp.khits);
+                    rangeTemp = gfm.mapGLF(tloc, bloc, c, &node_rangeTemp, &_tmp_node_iedge_count, (index_t)rp.kseeds);
                 }
             } else {
                 bwops_++;
@@ -5238,7 +5387,7 @@ index_t HI_Aligner<index_t, local_index_t>::globalGFMSearch(
                 if(linearFM) {
                     rangeTemp = gfm.mapLF(tloc, bloc, c, &node_rangeTemp);
                 } else {
-                    rangeTemp = gfm.mapGLF(tloc, bloc, c, &node_rangeTemp, &_tmp_node_iedge_count, (index_t)rp.khits);
+                    rangeTemp = gfm.mapGLF(tloc, bloc, c, &node_rangeTemp, &_tmp_node_iedge_count, (index_t)rp.kseeds);
                 }
             } else {
                 bwops_++;
@@ -5277,7 +5426,7 @@ index_t HI_Aligner<index_t, local_index_t>::globalGFMSearch(
     }
     
     // Done
-    if(node_range.first < node_range.second && node_range.second - node_range.first <= rp.khits) {
+    if(node_range.first < node_range.second && node_range.second - node_range.first <= rp.kseeds) {
         assert_leq(node_range.second - node_range.first, range.second - range.first);
 #ifndef NDEBUG
         if(node_range.second - node_range.first < range.second - range.first) {
@@ -5382,7 +5531,7 @@ index_t HI_Aligner<index_t, local_index_t>::localGFMSearch(
                 if(linearFM) {
                     rangeTemp = gfm.mapLF(tloc, bloc, c, &node_rangeTemp);
                 } else {
-                    rangeTemp = gfm.mapGLF(tloc, bloc, c, &node_rangeTemp, &_tmp_local_node_iedge_count, rp.khits);
+                    rangeTemp = gfm.mapGLF(tloc, bloc, c, &node_rangeTemp, &_tmp_local_node_iedge_count, rp.kseeds);
                 }
             } else {
                 bwops_++;
@@ -5422,7 +5571,7 @@ index_t HI_Aligner<index_t, local_index_t>::localGFMSearch(
     }
     
     // Done
-    if(node_range.first < node_range.second && node_range.second - node_range.first <= rp.khits) {
+    if(node_range.first < node_range.second && node_range.second - node_range.first <= rp.kseeds) {
         assert_leq(node_range.second - node_range.first, range.second - range.first);
 #ifndef NDEBUG
         if(node_range.second - node_range.first < range.second - range.first) {
diff --git a/hisat2.cpp b/hisat2.cpp
index de3b37f..b87f063 100644
--- a/hisat2.cpp
+++ b/hisat2.cpp
@@ -50,6 +50,7 @@
 #include "util.h"
 #include "pe.h"
 #include "tp.h"
+#include "gp.h"
 #include "simple_func.h"
 #include "presets.h"
 #include "opts.h"
@@ -233,6 +234,7 @@ static bool doExactUpFront;   // do exact search up front if seeds seem good eno
 static bool do1mmUpFront;     // do 1mm search up front if seeds seem good enough
 static size_t do1mmMinLen;    // length below which we disable 1mm e2e search
 static int seedBoostThresh;   // if average non-zero position has more than this many elements
+static size_t maxSeeds;       // maximum number of seeds allowed
 static size_t nSeedRounds;    // # seed rounds
 static bool reorder;          // true -> reorder SAM recs in -p mode
 static float sampleFrac;      // only align random fraction of input reads
@@ -275,6 +277,8 @@ static uint64_t        thread_rids_mindist;
 static bool rmChrName;  // remove "chr" from reference names (e.g., chr18 to 18)
 static bool addChrName; // add "chr" to reference names (e.g., 18 to chr18)
 
+static size_t max_alts_tried;
+
 #define DMAX std::numeric_limits<double>::max()
 
 static void resetOptions() {
@@ -458,6 +462,7 @@ static void resetOptions() {
 	do1mmUpFront = true;     // do 1mm search up front if seeds seem good enough
 	seedBoostThresh = 300;   // if average non-zero position has more than this many elements
 	nSeedRounds = 2;         // # rounds of seed searches to do for repetitive reads
+    maxSeeds = 0;            // maximum number of seeds allowed
 	do1mmMinLen = 60;        // length below which we disable 1mm search
 	reorder = false;         // reorder SAM records with -p > 1
 	sampleFrac = 1.1f;       // align all reads
@@ -490,6 +495,8 @@ static void resetOptions() {
     
     rmChrName = false;
     addChrName = false;
+    
+    max_alts_tried = 16;
 }
 
 static const char *short_options = "fF:qbzhcu:rv:s:aP:t3:5:w:p:k:M:1:2:I:X:CQ:N:i:L:U:x:S:g:O:D:R:";
@@ -605,6 +612,7 @@ static struct option long_options[] = {
 	{(char*)"ma",               required_argument, 0,        ARG_SCORE_MA},
 	{(char*)"mp",               required_argument, 0,        ARG_SCORE_MMP},
     {(char*)"sp",               required_argument, 0,        ARG_SCORE_SCP},
+    {(char*)"no-softclip",      no_argument,       0,        ARG_NO_SOFTCLIP},
 	{(char*)"np",               required_argument, 0,        ARG_SCORE_NP},
 	{(char*)"rdg",              required_argument, 0,        ARG_SCORE_RDG},
 	{(char*)"rfg",              required_argument, 0,        ARG_SCORE_RFG},
@@ -646,6 +654,7 @@ static struct option long_options[] = {
 	{(char*)"1mm-minlen",       required_argument, 0,        ARG_1MM_MINLEN},
 	{(char*)"seed-off",         required_argument, 0,        'O'},
 	{(char*)"seed-boost",       required_argument, 0,        ARG_SEED_BOOST_THRESH},
+    {(char*)"max-seeds",        required_argument, 0,        ARG_MAX_SEEDS},
 	{(char*)"read-times",       no_argument,       0,        ARG_READ_TIMES},
 	{(char*)"show-rand-seed",   no_argument,       0,        ARG_SHOW_RAND_SEED},
 	{(char*)"dp-fail-streak",   required_argument, 0,        ARG_DP_FAIL_STREAK_THRESH},
@@ -675,33 +684,34 @@ static struct option long_options[] = {
 	{(char*)"desc-landing",     required_argument, 0,        ARG_DESC_LANDING},
 	{(char*)"desc-exp",         required_argument, 0,        ARG_DESC_EXP},
 	{(char*)"desc-fmops",       required_argument, 0,        ARG_DESC_FMOPS},
-    {(char*)"no-temp-splicesite",  no_argument, 0,     ARG_NO_TEMPSPLICESITE},
-    {(char*)"pen-cansplice",  required_argument, 0,        ARG_PEN_CANSPLICE},
-    {(char*)"pen-noncansplice",  required_argument, 0,     ARG_PEN_NONCANSPLICE},
+    {(char*)"no-temp-splicesite",  no_argument,    0,        ARG_NO_TEMPSPLICESITE},
+    {(char*)"pen-cansplice",  required_argument,   0,        ARG_PEN_CANSPLICE},
+    {(char*)"pen-noncansplice",  required_argument, 0,       ARG_PEN_NONCANSPLICE},
     {(char*)"pen-conflictsplice",  required_argument, 0,     ARG_PEN_CONFLICTSPLICE},
-    {(char*)"pen-intronlen",  required_argument, 0,     ARG_PEN_CANINTRONLEN},
-    {(char*)"pen-canintronlen",  required_argument, 0,     ARG_PEN_CANINTRONLEN},
-    {(char*)"pen-noncanintronlen",  required_argument, 0,     ARG_PEN_NONCANINTRONLEN},
-    {(char*)"min-intronlen",  required_argument, 0,     ARG_MIN_INTRONLEN},
-    {(char*)"max-intronlen",  required_argument, 0,     ARG_MAX_INTRONLEN},
+    {(char*)"pen-intronlen",  required_argument,   0,        ARG_PEN_CANINTRONLEN},
+    {(char*)"pen-canintronlen",  required_argument, 0,       ARG_PEN_CANINTRONLEN},
+    {(char*)"pen-noncanintronlen",  required_argument, 0,    ARG_PEN_NONCANINTRONLEN},
+    {(char*)"min-intronlen",  required_argument,   0,        ARG_MIN_INTRONLEN},
+    {(char*)"max-intronlen",  required_argument,   0,        ARG_MAX_INTRONLEN},
     {(char*)"known-splicesite-infile",       required_argument, 0,        ARG_KNOWN_SPLICESITE_INFILE},
     {(char*)"novel-splicesite-infile",       required_argument, 0,        ARG_NOVEL_SPLICESITE_INFILE},
     {(char*)"novel-splicesite-outfile",      required_argument, 0,        ARG_NOVEL_SPLICESITE_OUTFILE},
-    {(char*)"secondary",   no_argument, 0,        ARG_SECONDARY},
+    {(char*)"secondary",        no_argument,       0,        ARG_SECONDARY},
     {(char*)"no-spliced-alignment",   no_argument, 0,        ARG_NO_SPLICED_ALIGNMENT},
     {(char*)"rna-strandness",   required_argument, 0,        ARG_RNA_STRANDNESS},
-    {(char*)"splicesite-db-only",   no_argument, 0,        ARG_SPLICESITE_DB_ONLY},
-    {(char*)"no-anchorstop",   no_argument, 0,        ARG_NO_ANCHORSTOP},
-    {(char*)"transcriptome-mapping-only",   no_argument, 0,        ARG_TRANSCRIPTOME_MAPPING_ONLY},
-    {(char*)"tmo",   no_argument, 0,        ARG_TRANSCRIPTOME_MAPPING_ONLY},
+    {(char*)"splicesite-db-only",   no_argument,   0,        ARG_SPLICESITE_DB_ONLY},
+    {(char*)"no-anchorstop",   no_argument,        0,        ARG_NO_ANCHORSTOP},
+    {(char*)"transcriptome-mapping-only", no_argument, 0,    ARG_TRANSCRIPTOME_MAPPING_ONLY},
+    {(char*)"tmo",             no_argument,        0,        ARG_TRANSCRIPTOME_MAPPING_ONLY},
     {(char*)"downstream-transcriptome-assembly",   no_argument, 0,        ARG_TRANSCRIPTOME_ASSEMBLY},
-    {(char*)"dta",   no_argument, 0,        ARG_TRANSCRIPTOME_ASSEMBLY},
-    {(char*)"dta-cufflinks",   no_argument, 0,        ARG_TRANSCRIPTOME_ASSEMBLY_CUFFLINKS},
+    {(char*)"dta",             no_argument,        0,        ARG_TRANSCRIPTOME_ASSEMBLY},
+    {(char*)"dta-cufflinks",   no_argument,        0,        ARG_TRANSCRIPTOME_ASSEMBLY_CUFFLINKS},
 #ifdef USE_SRA
-    {(char*)"sra-acc",   required_argument, 0,        ARG_SRA_ACC},
+    {(char*)"sra-acc",         required_argument,  0,        ARG_SRA_ACC},
 #endif
-    {(char*)"remove-chrname",   no_argument, 0,        ARG_REMOVE_CHRNAME},
-    {(char*)"add-chrname",   no_argument, 0,        ARG_ADD_CHRNAME},
+    {(char*)"remove-chrname",  no_argument,        0,        ARG_REMOVE_CHRNAME},
+    {(char*)"add-chrname",     no_argument,        0,        ARG_ADD_CHRNAME},
+    {(char*)"max-altstried",   required_argument,  0,        ARG_MAX_ALTSTRIED},
 	{(char*)0, 0, 0, 0} // terminator
 };
 
@@ -847,6 +857,7 @@ static void printUsage(ostream& out) {
 		<< "  --ma <int>         match bonus (0 for --end-to-end, 2 for --local) " << endl
 		<< "  --mp <int>,<int>   max and min penalties for mismatch; lower qual = lower penalty <2,6>" << endl
         << "  --sp <int>,<int>   max and min penalties for soft-clipping; lower qual = lower penalty <1,2>" << endl
+        << "  --no-softclip      no soft-clipping" << endl
 		<< "  --np <int>         penalty for non-A/C/G/Ts in read/ref (1)" << endl
 		<< "  --rdg <int>,<int>  read gap open, extend penalties (5,3)" << endl
 		<< "  --rfg <int>,<int>  reference gap open, extend penalties (5,3)" << endl
@@ -854,29 +865,18 @@ static void printUsage(ostream& out) {
 		<< "                     (L,0.0,-0.2)" << endl
 		<< endl
 	    << " Reporting:" << endl
-	    << "  (default)          look for multiple alignments, report best, with MAPQ" << endl
-		<< "   OR" << endl
-	    << "  -k <int>           report up to <int> alns per read; MAPQ not meaningful" << endl
-		<< "   OR" << endl
-	    << "  -a/--all           report all alignments; very slow, MAPQ not meaningful" << endl
+	    << "  -k <int> (default: 5) report up to <int> alns per read" << endl
 		<< endl
 	    //<< " Effort:" << endl
 	    //<< "  -D <int>           give up extending after <int> failed extends in a row (15)" << endl
 	    //<< "  -R <int>           for reads w/ repetitive seeds, try <int> sets of seeds (2)" << endl
 		//<< endl
 		<< " Paired-end:" << endl
-#if 0
-	    << "  -I/--minins <int>  minimum fragment length (0)" << endl
-	    << "  -X/--maxins <int>  maximum fragment length (500)" << endl
-#endif
+	    << "  -I/--minins <int>  minimum fragment length (0), only valid with --no-spliced-alignment" << endl
+	    << "  -X/--maxins <int>  maximum fragment length (500), only valid with --no-spliced-alignment" << endl
 	    << "  --fr/--rf/--ff     -1, -2 mates align fw/rev, rev/fw, fw/fw (--fr)" << endl
 		<< "  --no-mixed         suppress unpaired alignments for paired reads" << endl
 		<< "  --no-discordant    suppress discordant alignments for paired reads" << endl
-#if 0
-		<< "  --no-dovetail      not concordant when mates extend past each other" << endl
-		<< "  --no-contain       not concordant when one mate alignment contains other" << endl
-		<< "  --no-overlap       not concordant when mates overlap at all" << endl
-#endif
 		<< endl
 	    << " Output:" << endl;
 	//if(wrapper == "basic-0") {
@@ -910,10 +910,10 @@ static void printUsage(ostream& out) {
 	    << "  -p/--threads <int> number of alignment threads to launch (1)" << endl
 	    << "  --reorder          force SAM output order to match order of input reads" << endl
 #ifdef BOWTIE_MM
-	    << "  --mm               use memory-mapped I/O for index; many 'bowtie's can share" << endl
+	    << "  --mm               use memory-mapped I/O for index; many 'hisat2's can share" << endl
 #endif
 #ifdef BOWTIE_SHARED_MEM
-		//<< "  --shmem            use shared mem for index; many 'bowtie's can share" << endl
+		//<< "  --shmem            use shared mem for index; many 'hisat2's can share" << endl
 #endif
 		<< endl
 	    << " Other:" << endl
@@ -1218,6 +1218,10 @@ static void parseOption(int next_option, const char *arg) {
 			maxUg = parse<size_t>(arg);
 			break;
 		}
+        case ARG_MAX_SEEDS: {
+            maxSeeds = parse<size_t>(arg);
+            break;
+        }
 		case ARG_SEED_BOOST_THRESH: {
 			seedBoostThresh = parse<int>(arg);
 			break;
@@ -1345,12 +1349,12 @@ static void parseOption(int next_option, const char *arg) {
 		case ARG_SSE8_NO: enable8 = false; break;
 		case ARG_UNGAPPED: doUngapped = true; break;
 		case ARG_UNGAPPED_NO: doUngapped = false; break;
-		case ARG_NO_DOVETAIL: gDovetailMatesOK = false; break;
-		case ARG_NO_CONTAIN:  gContainMatesOK  = false; break;
-		case ARG_NO_OVERLAP:  gOlapMatesOK     = false; break;
-		case ARG_DOVETAIL:    gDovetailMatesOK = true;  break;
-		case ARG_CONTAIN:     gContainMatesOK  = true;  break;
-		case ARG_OVERLAP:     gOlapMatesOK     = true;  break;
+		// case ARG_NO_DOVETAIL: gDovetailMatesOK = false; break;
+		// case ARG_NO_CONTAIN:  gContainMatesOK  = false; break;
+		// case ARG_NO_OVERLAP:  gOlapMatesOK     = false; break;
+		// case ARG_DOVETAIL:    gDovetailMatesOK = true;  break;
+		// case ARG_CONTAIN:     gContainMatesOK  = true;  break;
+		// case ARG_OVERLAP:     gOlapMatesOK     = true;  break;
 		case ARG_QC_FILTER: qcFilter = true; break;
 		case ARG_NO_SCORE_PRIORITY: sortByScore = false; break;
 		case ARG_IGNORE_QUALS: ignoreQuals = true; break;
@@ -1504,6 +1508,15 @@ static void parseOption(int next_option, const char *arg) {
             }
             break;
         }
+        case ARG_NO_SOFTCLIP: {
+            ostringstream convert;
+            convert << std::numeric_limits<typeof(penScMax)>::max();
+            polstr += ";SCP=Q,";
+            polstr += convert.str();
+            polstr += ",";
+            polstr += convert.str();
+            break;
+        }
 		case ARG_SCORE_NP:  polstr += ";NP=C";   polstr += arg; break;
 		case ARG_SCORE_RDG: polstr += ";RDG=";   polstr += arg; break;
 		case ARG_SCORE_RFG: polstr += ";RFG=";   polstr += arg; break;
@@ -1648,6 +1661,10 @@ static void parseOption(int next_option, const char *arg) {
             addChrName = true;
             break;
         }
+        case ARG_MAX_ALTSTRIED: {
+            max_alts_tried = parseInt(8, "--max-altstried arg must be at least 8", arg);
+            break;
+        }
 		default:
 			printUsage(cerr);
 			throw 1;
@@ -1851,6 +1868,7 @@ static OutFileBuf*                       multiseed_metricsOfb;
 static SpliceSiteDB*                     ssdb;
 static ALTDB<index_t>*                   altdb;
 static TranscriptomePolicy*              multiseed_tpol;
+static GraphPolicy*                      gpol;
 
 /**
  * Metrics for measuring the work done by the outer read alignment
@@ -3001,8 +3019,12 @@ static void multiseedSearchWorker_hisat2(void *vp) {
 	auto_ptr<PatternSourcePerThread> ps(patsrcFact->create());
 	
     // Instantiate an object for holding reporting-related parameters.
+    if(maxSeeds == 0) {
+        maxSeeds = khits;
+    }
     ReportingParams rp(
                        (allHits ? std::numeric_limits<THitInt>::max() : khits), // -k
+                       (allHits ? std::numeric_limits<THitInt>::max() : maxSeeds), // --max-seeds
                        mhits,             // -m/-M
                        0,                 // penalty gap (not used now)
                        msample,           // true -> -M was specified, otherwise assume -m
@@ -3024,10 +3046,7 @@ static void multiseedSearchWorker_hisat2(void *vp) {
     
     SplicedAligner<index_t, local_index_t> splicedAligner(
                                                           gfm,
-                                                          *multiseed_tpol,
                                                           anchorStop,
-                                                          minIntronLen,
-                                                          maxIntronLen,
                                                           secondary,
                                                           localAlign,
                                                           thread_rids_mindist);
@@ -3363,7 +3382,7 @@ static void multiseedSearchWorker_hisat2(void *vp) {
                     splicedAligner.initRead(rds[1], nofw[1], norc[1], minsc[1], maxpen[1], true);
                 }
                 if(filt[0] || filt[1]) {
-                    int ret = splicedAligner.go(sc, gfm, *altdb, ref, sw, *ssdb, wlm, prm, swmSeed, him, rnd, msinkwrap);
+                    int ret = splicedAligner.go(sc, pepol, *multiseed_tpol, *gpol, gfm, *altdb, ref, sw, *ssdb, wlm, prm, swmSeed, him, rnd, msinkwrap);
                     MERGE_SW(sw);
                     // daehwan
                     size_t mate = 0;
@@ -3461,19 +3480,21 @@ static void multiseedSearchWorker_hisat2(void *vp) {
  * enters the search loop.
  */
 static void multiseedSearch(
-	Scoring& sc,
-    TranscriptomePolicy& tpol,
-	PairedPatternSource& patsrc,  // pattern source
-	AlnSink<index_t>& msink,      // hit sink
-	HGFM<index_t>& gfm,           // index of original text
-    BitPairReference* refs,
-	OutFileBuf *metricsOfb)
+                            Scoring& sc,
+                            TranscriptomePolicy& tpol,
+                            GraphPolicy& gp,
+                            PairedPatternSource& patsrc,  // pattern source
+                            AlnSink<index_t>& msink,      // hit sink
+                            HGFM<index_t>& gfm,           // index of original text
+                            BitPairReference* refs,
+                            OutFileBuf *metricsOfb)
 {
     multiseed_patsrc       = &patsrc;
 	multiseed_msink        = &msink;
 	multiseed_gfm          = &gfm;
 	multiseed_sc           = ≻
     multiseed_tpol         = &tpol;
+    gpol                   = &gp;
 	multiseed_metricsOfb   = metricsOfb;
 	multiseed_refs = refs;
 	AutoArray<tthread::thread*> threads(nthreads);
@@ -3768,11 +3789,17 @@ static void driver(
         if(!refs->loaded()) throw 1;
         
         bool xsOnly = (tranAssm_program == "cufflinks");
-        TranscriptomePolicy tpol(no_spliced_alignment,
+        TranscriptomePolicy tpol(minIntronLen,
+                                 maxIntronLen,
+                                 tranAssm ? 15 : 7,
+                                 tranAssm ? 20 : 14,
+                                 no_spliced_alignment,
                                  tranMapOnly,
                                  tranAssm,
                                  xsOnly);
         
+        GraphPolicy gpol(max_alts_tried);
+        
         init_junction_prob();
         bool write = novelSpliceSiteOutfile != "" || useTempSpliceSite;
         bool read = knownSpliceSiteInfile != "" || novelSpliceSiteInfile != "" || useTempSpliceSite || altdb->hasSpliceSites();
@@ -3832,13 +3859,14 @@ static void driver(
 		assert(patsrc != NULL);
 		assert(mssink != NULL);
 		multiseedSearch(
-			sc,      // scoring scheme
-            tpol,
-			*patsrc, // pattern source
-			*mssink, // hit sink
-			gfm,     // BWT
-            refs.get(),
-			metricsOfb);
+                        sc,      // scoring scheme
+                        tpol,
+                        gpol,
+                        *patsrc, // pattern source
+                        *mssink, // hit sink
+                        gfm,     // BWT
+                        refs.get(),
+                        metricsOfb);
 		// Evict any loaded indexes from memory
 		if(gfm.isInMemory()) {
 			gfm.evictFromMemory();
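
Two of the new hisat2.cpp options above interact with existing ones: --max-seeds falls back to the -k value when left at its default of 0, and --no-softclip is realized by appending a soft-clip penalty clause with the largest representable penalty to the scoring policy string. The fragment below is only an editorial restatement of that second mechanism, not code from the patch; polstr and the ";SCP=Q,<max>,<max>" syntax are taken from the hunk above, and the numeric limit is a placeholder.

    # Editorial sketch: encode "no soft-clipping" by driving both soft-clip
    # penalties to the maximum value the penalty type can hold.
    def no_softclip_policy(polstr, pen_type_max=2147483647):
        m = str(pen_type_max)
        return polstr + ";SCP=Q," + m + "," + m

    # no_softclip_policy("") -> ';SCP=Q,2147483647,2147483647'
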
diff --git a/hisat2_build_genotype_genome.py b/hisat2_build_genotype_genome.py
index 67d55a1..4ca9b27 100755
--- a/hisat2_build_genotype_genome.py
+++ b/hisat2_build_genotype_genome.py
@@ -145,9 +145,11 @@ def compare_vars(a, b):
 """
 def build_genotype_genome(reference,
                           base_fname,                          
+                          partial,
                           inter_gap,
                           intra_gap,
                           threads,
+                          use_clinvar,
                           verbose):    
     # Current script directory
     curr_script = os.path.realpath(inspect.getsourcefile(build_genotype_genome))
@@ -172,39 +174,42 @@ def build_genotype_genome(reference,
     # Load genomic sequences
     chr_dic, chr_names, chr_full_names = read_genome(open(reference))
 
-    # Extract variants from the ClinVar database
-    CLINVAR_fnames = ["clinvar.vcf.gz",
-                      "clinvar.snp",
-                      "clinvar.haplotype",
-                      "clinvar.clnsig"]
+    if use_clinvar:
+        # Extract variants from the ClinVar database
+        CLINVAR_fnames = ["clinvar.vcf.gz",
+                          "clinvar.snp",
+                          "clinvar.haplotype",
+                          "clinvar.clnsig"]
 
-    if not check_files(CLINVAR_fnames):
-        if not os.path.exists("clinvar.vcf.gz"):
-            os.system("wget ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz")
-        assert os.path.exists("clinvar.vcf.gz")
-            
-        extract_clinvar_script = os.path.join(ex_path, "hisat2_extract_snps_haplotypes_VCF.py")
-        extract_cmd = [extract_clinvar_script]
-        extract_cmd += ["--inter-gap", str(inter_gap),
-                        "--intra-gap", str(intra_gap),
-                        "--genotype-vcf", "clinvar.vcf.gz",
-                        reference, "/dev/null", "clinvar"]
-        if verbose:
-            print >> sys.stderr, "\tRunning:", ' '.join(extract_cmd)
-        proc = subprocess.Popen(extract_cmd, stdout=open("/dev/null", 'w'), stderr=open("/dev/null", 'w'))
-        proc.communicate()
         if not check_files(CLINVAR_fnames):
-            print >> sys.stderr, "Error: extract variants from clinvar failed!"
-            sys.exit(1)
-
-    # Read variants to be genotyped
-    genotype_vars = read_variants("clinvar.snp")
-
-    # Read haplotypes
-    genotype_haplotypes = read_haplotypes("clinvar.haplotype")
-
-    # Read information about clinical significance
-    genotype_clnsig = read_clnsig("clinvar.clnsig")
+            if not os.path.exists("clinvar.vcf.gz"):
+                os.system("wget ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz")
+            assert os.path.exists("clinvar.vcf.gz")
+
+            extract_clinvar_script = os.path.join(ex_path, "hisat2_extract_snps_haplotypes_VCF.py")
+            extract_cmd = [extract_clinvar_script]
+            extract_cmd += ["--inter-gap", str(inter_gap),
+                            "--intra-gap", str(intra_gap),
+                            "--genotype-vcf", "clinvar.vcf.gz",
+                            reference, "/dev/null", "clinvar"]
+            if verbose:
+                print >> sys.stderr, "\tRunning:", ' '.join(extract_cmd)
+            proc = subprocess.Popen(extract_cmd, stdout=open("/dev/null", 'w'), stderr=open("/dev/null", 'w'))
+            proc.communicate()
+            if not check_files(CLINVAR_fnames):
+                print >> sys.stderr, "Error: extract variants from clinvar failed!"
+                sys.exit(1)
+
+        # Read variants to be genotyped
+        genotype_vars = read_variants("clinvar.snp")
+
+        # Read haplotypes
+        genotype_haplotypes = read_haplotypes("clinvar.haplotype")
+
+        # Read information about clinical significance
+        genotype_clnsig = read_clnsig("clinvar.clnsig")
+    else:
+        genotype_vars, genotype_haplotypes, genotype_clnsig = {}, {}, {}
 
     # Genes to be genotyped
     genotype_genes = {}
@@ -223,8 +228,8 @@ def build_genotype_genome(reference,
     if not check_files(HLA_fnames):
         extract_hla_script = os.path.join(ex_path, "hisat2_extract_HLA_vars.py")
         extract_cmd = [extract_hla_script]
-        # if partial:
-        #    extract_cmd += ["--partial"]
+        if partial:
+            extract_cmd += ["--partial"]
         extract_cmd += ["--inter-gap", str(inter_gap),
                         "--intra-gap", str(intra_gap)]
         if verbose:
@@ -371,8 +376,8 @@ def build_genotype_genome(reference,
                 (family.upper(), name, chr, len(out_chr_seq), len(out_chr_seq) + length - 1)
 
             # Output coord (genotype_genome.coord)
-            print >> coord_out_file, "%d\t%d\t%d" % \
-                (len(out_chr_seq), left, right - left + 1)
+            print >> coord_out_file, "%s\t%d\t%d\t%d" % \
+                (chr, len(out_chr_seq), left, right - left + 1)
             out_chr_seq += allele_seq
 
             # Output variants (genotype_genome.snp)
@@ -424,8 +429,8 @@ def build_genotype_genome(reference,
         # Write the rest of the Vars
         chr_genotype_vari, chr_genotype_hti, haplotype_num = add_vars(10000000000, 10000000000, chr_genotype_vari, chr_genotype_hti, haplotype_num)            
             
-        print >> coord_out_file, "%d\t%d\t%d" % \
-            (len(out_chr_seq), prev_right, len(chr_seq) - prev_right)
+        print >> coord_out_file, "%s\t%d\t%d\t%d" % \
+            (chr, len(out_chr_seq), prev_right, len(chr_seq) - prev_right)
         out_chr_seq += chr_seq[prev_right:]
 
         assert len(out_chr_seq) == len(chr_seq) + off
@@ -476,6 +481,10 @@ if __name__ == '__main__':
                         nargs='?',
                         type=str,
                         help="base filename for genotype genome")
+    parser.add_argument('--partial',
+                        dest='partial',
+                        action='store_true',
+                        help='Include partial alleles (e.g. A_nuc.fasta)')
     parser.add_argument("--inter-gap",
                         dest="inter_gap",
                         type=int,
@@ -491,6 +500,10 @@ if __name__ == '__main__':
                         type=int,
                         default=1,
                         help="Number of threads") 
+    parser.add_argument("--no-clinvar",
+                        dest="use_clinvar",
+                        action="store_false",
+                        help="Do not use the ClinVar database")
     parser.add_argument("-v", "--verbose",
                         dest="verbose",
                         action="store_true",
@@ -505,7 +518,9 @@ if __name__ == '__main__':
         sys.exit(1)
     build_genotype_genome(args.reference,
                           args.base_fname,
+                          args.partial,
                           args.inter_gap,
                           args.intra_gap,
                           args.threads,
+                          args.use_clinvar,
                           args.verbose)
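
For readers who want to exercise the two new switches added above, a hedged usage sketch follows; the reference FASTA name, the output base name, and the choice of interpreter are placeholders rather than part of the patch.

    # Hypothetical invocation of the updated builder: --partial pulls in
    # partial alleles (e.g. A_nuc.fasta) and --no-clinvar skips the ClinVar
    # download and variant extraction entirely.
    import subprocess
    subprocess.call(["python", "hisat2_build_genotype_genome.py",
                     "--partial", "--no-clinvar", "--verbose",
                     "genome.fa", "genotype_genome"])
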
diff --git a/hisat2_inspect.cpp b/hisat2_inspect.cpp
index 86095e1..88cc284 100644
--- a/hisat2_inspect.cpp
+++ b/hisat2_inspect.cpp
@@ -743,7 +743,7 @@ int main(int argc, char **argv) {
 
 		// Optionally summarize
 		if(verbose) {
-			cout << "Input ebwt file: \"" << ebwtFile.c_str() << "\"" << endl;
+			cout << "Input ht2 file: \"" << ebwtFile.c_str() << "\"" << endl;
 			cout << "Output file: \"" << outfile.c_str() << "\"" << endl;
 			cout << "Local endianness: " << (currentlyBigEndian()? "big":"little") << endl;
 #ifdef NDEBUG
@@ -770,3 +770,4 @@ int main(int argc, char **argv) {
 		return e;
 	}
 }
+
diff --git a/hisat2_test_HLA_genotyping.py b/hisat2_test_HLA_genotyping.py
index 3baa823..dca5f61 100755
--- a/hisat2_test_HLA_genotyping.py
+++ b/hisat2_test_HLA_genotyping.py
@@ -1,34 +1,2022 @@
 #!/usr/bin/env python
+#
+# Copyright 2015, Daehwan Kim <infphilo at gmail.com>
+#
+# This file is part of HISAT 2.
+#
+# HISAT 2 is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# HISAT 2 is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with HISAT 2.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+
+import sys, os, subprocess, re
+import inspect, random
+import math
+from datetime import datetime, date, time
+from argparse import ArgumentParser, FileType
+from hisat2_modules import assembly_graph
+
+
+"""
+"""
+def simulate_reads(HLAs,
+                   test_HLA_list,
+                   Vars,
+                   Links,
+                   simulate_interval = 1,
+                   perbase_errorrate = 0.0):
+    HLA_reads_1, HLA_reads_2 = [], []
+    num_pairs = []
+    for test_HLA_names in test_HLA_list:
+        gene = test_HLA_names[0].split('*')[0]
+        num_pairs.append([])
+
+        # Simulate reads from two HLA alleles
+        def simulate_reads_impl(seq,
+                                seq_map,
+                                ex_seq,
+                                ex_desc,
+                                simulate_interval = 1,
+                                perbase_errorrate = 0.0,
+                                frag_len = 250,
+                                read_len = 100):
+            # Introduce sequencing errors
+            def introduce_seq_err(read_seq, pos):
+                read_seq = list(read_seq)
+                for i in range(read_len):
+                    map_pos = seq_map[pos + i]
+                    if ex_desc[map_pos] != "":
+                        continue
+                    if random.random() * 100 < perbase_errorrate:
+                        if read_seq[i] == 'A':
+                            alt_bases = ['C', 'G', 'T']
+                        elif read_seq[i] == 'C':
+                            alt_bases = ['A', 'G', 'T']
+                        elif read_seq[i] == 'G':
+                            alt_bases = ['A', 'C', 'T']
+                        else:
+                            assert read_seq[i] == 'T'
+                            alt_bases = ['A', 'C', 'G']
+                        random.shuffle(alt_bases)
+                        alt_base = alt_bases[0]
+                        read_seq[i] = alt_base
+                read_seq = ''.join(read_seq)
+                return read_seq                            
+                            
+            # Get read alignment, e.g., 260|R_483_61M5D38M23D1M_46|S|hv154,3|S|hv162,10|D|hv185,38|D|hv266
+            def get_info(read_seq, pos):
+                info = "%d_" % (seq_map[pos] + 1)
+                total_match, match, sub_match = 0, 0, 0
+                var_str = ""
+                for i in range(pos, pos + read_len):
+                    map_i = seq_map[i]
+                    assert ex_seq[map_i] != 'D'
+                    total_match += 1
+                    match += 1
+                    if ex_desc[map_i] != "" or read_seq[i-pos] != ex_seq[map_i]:
+                        if var_str != "":
+                            var_str += ','
+                        var_str += ("%d|S|%s" % (sub_match, ex_desc[map_i] if ex_desc[map_i] != "" else "unknown"))
+                        sub_match = 0
+                    else:
+                        sub_match += 1
+                    if i + 1 < pos + read_len and ex_seq[map_i+1] == 'D':
+                        assert match > 0
+                        info += ("%dM" % match)
+                        match = 0
+                        del_len = 1
+                        while map_i + 1 + del_len < len(ex_seq):
+                            if ex_seq[map_i + 1 + del_len] != 'D':
+                                break
+                            del_len += 1
+                        info += ("%dD" % del_len)
+                        if var_str != "":
+                            var_str += ','
+                        var_str += ("%s|D|%s" % (sub_match, ex_desc[map_i + 1]))
+                        sub_match = 0
+                assert match > 0
+                info += ("%dM" % match)
+                assert total_match == read_len
+                if var_str:
+                    info += "_"
+                    info += var_str                
+                return info
+                
+            comp_table = {'A':'T', 'C':'G', 'G':'C', 'T':'A'}
+            reads_1, reads_2 = [], []
+            for i in range(0, len(seq) - frag_len + 1, simulate_interval):
+                pos1 = i
+                seq1 = seq[pos1:pos1+read_len]
+                if perbase_errorrate > 0.0:
+                    seq1 = introduce_seq_err(seq1, pos1)
+                info1 = get_info(seq1, pos1)
+                reads_1.append([seq1, info1])
+                
+                pos2 = i + frag_len - read_len
+                seq2 = seq[pos2:pos2+read_len]
+                if perbase_errorrate > 0.0:
+                    seq2 = introduce_seq_err(seq2, pos2)                
+                info2 = get_info(seq2, pos2)
+                tmp_read_2 = reversed(seq2)
+                read_2 = ""
+                for s in tmp_read_2:
+                    if s in comp_table:
+                        read_2 += comp_table[s]
+                    else:
+                        read_2 += s
+                reads_2.append([read_2, info2])
+            return reads_1, reads_2
+
+        # for each allele in a list of alleles such as ['A*32:29', 'B*07:02:01']
+        for test_HLA_name in test_HLA_names:
+            HLA_seq = HLAs[gene][test_HLA_name]
+            HLA_ex_seq = list(HLAs[gene]["%s*BACKBONE" % gene])
+            HLA_ex_desc = [''] * len(HLA_ex_seq)
+            HLA_seq_map = [i for i in range(len(HLA_seq))]
+
+            # Extract variants included in the allele
+            vars = []
+            for var, allele_list in Links.items():
+                if test_HLA_name in allele_list:
+                    vars.append(var)
+
+            # Build annotated sequence for the allele w.r.t backbone sequence
+            for var in vars:
+                var_type, var_pos, var_data = Vars[gene][var]
+                if var_type == "single":
+                    HLA_ex_seq[var_pos] = var_data
+                    HLA_ex_desc[var_pos] = var
+                else:
+                    assert var_type == "deletion"
+                    del_len = int(var_data)
+                    assert var_pos + del_len <= len(HLA_ex_seq)
+                    HLA_ex_seq[var_pos:var_pos+del_len] = ['D'] * del_len
+                    HLA_ex_desc[var_pos:var_pos+del_len] = [var] * del_len
+            HLA_ex_seq = ''.join(HLA_ex_seq)
+
+            # Build mapping from the allele to the annotated sequence
+            prev_j = 0
+            for i in range(len(HLA_seq)):
+                for j in range(prev_j, len(HLA_ex_seq)):
+                    if HLA_ex_seq[j] != 'D':
+                        break
+                HLA_seq_map[i] = j
+                prev_j = j + 1
+            
+            tmp_reads_1, tmp_reads_2 = simulate_reads_impl(HLA_seq,
+                                                           HLA_seq_map,
+                                                           HLA_ex_seq,
+                                                           HLA_ex_desc,                                                           
+                                                           simulate_interval,
+                                                           perbase_errorrate)
+            HLA_reads_1 += tmp_reads_1
+            HLA_reads_2 += tmp_reads_2
+            num_pairs[-1].append(len(tmp_reads_1))
+
+    # Write reads into a fasta read file
+    def write_reads(reads, idx):
+        read_file = open('hla_input_%d.fa' % idx, 'w')
+        for read_i in range(len(reads)):
+            print >> read_file, ">%d|%s_%s" % (read_i + 1, "LR"[idx-1], reads[read_i][1])
+            print >> read_file, reads[read_i][0]
+        read_file.close()
+    write_reads(HLA_reads_1, 1)
+    write_reads(HLA_reads_2, 2)
+
+    return num_pairs
+
+
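+
Editorial note: simulate_reads_impl above builds the second mate by walking the right end of each fragment backwards through comp_table. The helper below restates that step in isolation; it is illustration only, not part of the patch.

    # Reverse-complement a sequence the same way the mate-2 loop above does:
    # bases outside A/C/G/T are passed through unchanged.
    def reverse_complement(seq):
        comp_table = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
        return ''.join(comp_table.get(base, base) for base in reversed(seq))

    # reverse_complement("ACGTN") == "NACGT"
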
+"""
+Align reads, and sort the alignments into a BAM file
+"""
+def align_reads(ex_path,
+                aligner,
+                simulation,
+                index_type,
+                read_fname,
+                fastq,
+                threads,
+                out_fname,
+                verbose):
+    if aligner == "hisat2":
+        hisat2 = os.path.join(ex_path, "hisat2")
+        aligner_cmd = [hisat2, "--mm"]
+        if not simulation:
+            aligner_cmd += ["--no-unal"]            
+        # No detection of novel insertions and deletions
+        aligner_cmd += ["--rdg", "10000,10000"] # deletion
+        aligner_cmd += ["--rfg", "10000,10000"] # insertion
+        DNA = True
+        if DNA:
+            aligner_cmd += ["--no-spliced-alignment"] # no spliced alignment
+            # aligner_cmd += ["--min-intronlen", "100000"]
+        if index_type == "linear":
+            aligner_cmd += ["-k", "10"]
+        else:
+            aligner_cmd += ["--max-altstried", "64"]
+        aligner_cmd += ["-x", "hla.%s" % index_type]
+    elif aligner == "bowtie2":
+        aligner_cmd = [aligner,
+                       "--no-unal",
+                       "-k", "10",
+                       "-x", "hla"]
+    else:
+        assert False
+    assert len(read_fname) in [1,2]
+    aligner_cmd += ["-p", str(threads)]
+    if not fastq:
+        aligner_cmd += ["-f"]
+    if len(read_fname) == 1:
+        aligner_cmd += ["-U", read_fname[0]]
+    else:
+        aligner_cmd += ["-1", "%s" % read_fname[0],
+                        "-2", "%s" % read_fname[1]]
+    if verbose >= 1:
+        print >> sys.stderr, ' '.join(aligner_cmd)
+    align_proc = subprocess.Popen(aligner_cmd,
+                                  stdout=subprocess.PIPE,
+                                  stderr=open("/dev/null", 'w'))
+
+    sambam_cmd = ["samtools",
+                  "view",
+                  "-bS",
+                  "-"]
+    sambam_proc = subprocess.Popen(sambam_cmd,
+                                   stdin=align_proc.stdout,
+                                   stdout=open(out_fname + ".unsorted", 'w'),
+                                   stderr=open("/dev/null", 'w'))
+    sambam_proc.communicate()
+    if index_type == "graph":
+        bamsort_cmd = ["samtools",
+                       "sort",
+                       out_fname + ".unsorted",
+                       "-o", out_fname]
+        bamsort_proc = subprocess.Popen(bamsort_cmd,
+                                        stderr=open("/dev/null", 'w'))
+        bamsort_proc.communicate()
+
+        bamindex_cmd = ["samtools",
+                        "index",
+                        out_fname]
+        bamindex_proc = subprocess.Popen(bamindex_cmd,
+                                         stderr=open("/dev/null", 'w'))
+        bamindex_proc.communicate()
+
+    os.system("rm %s" % (out_fname + ".unsorted"))            
+
+
+"""
+""" 
+def normalize(prob):
+    total = sum(prob.values())
+    for allele, mass in prob.items():
+        prob[allele] = mass / total
+
+        
+"""
+"""
+def prob_diff(prob1, prob2):
+    diff = 0.0
+    for allele in prob1.keys():
+        if allele in prob2:
+            diff += abs(prob1[allele] - prob2[allele])
+        else:
+            diff += prob1[allele]
+    return diff
+
+
+"""
+"""
+def HLA_prob_cmp(a, b):
+    if a[1] != b[1]:
+        if a[1] < b[1]:
+            return 1
+        else:
+            return -1
+    assert a[0] != b[0]
+    if a[0] < b[0]:
+        return -1
+    else:
+        return 1
+
+
+"""
+"""
+def single_abundance(HLA_cmpt,
+                     HLA_length):
+    def normalize2(prob, length):
+        total = 0
+        for allele, mass in prob.items():
+            assert allele in length
+            total += (mass / length[allele])
+        for allele, mass in prob.items():
+            assert allele in length
+            prob[allele] = mass / length[allele] / total
+
+    HLA_prob, HLA_prob_next = {}, {}
+    for cmpt, count in HLA_cmpt.items():
+        alleles = cmpt.split('-')
+        for allele in alleles:
+            if allele not in HLA_prob:
+                HLA_prob[allele] = 0.0
+            HLA_prob[allele] += (float(count) / len(alleles))
+
+    normalize(HLA_prob)
+    def next_prob(HLA_cmpt, HLA_prob, HLA_length):
+        HLA_prob_next = {}
+        for cmpt, count in HLA_cmpt.items():
+            alleles = cmpt.split('-')
+            alleles_prob = 0.0
+            for allele in alleles:
+                if allele not in HLA_prob:
+                    continue
+                alleles_prob += HLA_prob[allele]
+            if alleles_prob <= 0.0:
+                continue
+            for allele in alleles:
+                if allele not in HLA_prob:
+                    continue
+                if allele not in HLA_prob_next:
+                    HLA_prob_next[allele] = 0.0
+                HLA_prob_next[allele] += (float(count) * HLA_prob[allele] / alleles_prob)
+        normalize(HLA_prob_next)
+        return HLA_prob_next
+
+    diff, iter = 1.0, 0
+    while diff > 0.0001 and iter < 1000:
+        HLA_prob_next = next_prob(HLA_cmpt, HLA_prob, HLA_length)
+        diff = prob_diff(HLA_prob, HLA_prob_next)
+        HLA_prob = HLA_prob_next
+
+        if iter >= 10:
+            HLA_prob2 = {}
+            for allele, prob in HLA_prob.items():
+                if prob >= 0.005:
+                    HLA_prob2[allele] = prob
+            HLA_prob = HLA_prob2
+
+        # DK - debugging purposes
+        if iter % 10 == 0 and False:
+            print "iter", iter
+            for allele, prob in HLA_prob.items():
+                if prob >= 0.01:
+                    print >> sys.stderr, "\t", iter, allele, prob, str(datetime.now())
+        
+        iter += 1
+        
+    """
+    for allele, prob in HLA_prob.items():
+        allele_len = HLA_length[allele]
+        HLA_prob[allele] /= float(allele_len)
+    """
+    
+    normalize(HLA_prob)
+    HLA_prob = [[allele, prob] for allele, prob in HLA_prob.items()]
+    HLA_prob = sorted(HLA_prob, cmp=HLA_prob_cmp)
+    return HLA_prob
+
+    
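+
Editorial note: single_abundance above spreads each compatibility class's read count over its alleles in proportion to the current abundance estimates and repeats until the estimates settle, with a small-probability pruning step after ten iterations. The fragment below is a self-contained sketch of the core update, stripped of the pruning and the convergence test; the compatibility counts in the example are invented.

    # Editorial sketch of the iterative redistribution in single_abundance.
    def em_abundance(HLA_cmpt, iterations=100):
        prob = {}
        for cmpt, count in HLA_cmpt.items():
            alleles = cmpt.split('-')
            for allele in alleles:
                prob[allele] = prob.get(allele, 0.0) + float(count) / len(alleles)
        total = sum(prob.values())
        prob = dict((a, p / total) for a, p in prob.items())
        for _ in range(iterations):
            nxt = {}
            for cmpt, count in HLA_cmpt.items():
                alleles = cmpt.split('-')
                denom = sum(prob.get(a, 0.0) for a in alleles)
                if denom <= 0.0:
                    continue
                for a in alleles:
                    if a in prob:
                        nxt[a] = nxt.get(a, 0.0) + count * prob[a] / denom
            total = sum(nxt.values())
            prob = dict((a, p / total) for a, p in nxt.items())
        return prob

    # em_abundance({"A*01:01": 30, "A*02:01": 10, "A*01:01-A*02:01": 60})
    # converges toward {"A*01:01": 0.75, "A*02:01": 0.25}.
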
+"""
+"""
+def joint_abundance(HLA_cmpt,
+                    HLA_length):
+    allele_names = set()
+    for cmpt in HLA_cmpt.keys():
+        allele_names |= set(cmpt.split('-'))
+    
+    HLA_prob, HLA_prob_next = {}, {}
+    for cmpt, count in HLA_cmpt.items():
+        alleles = cmpt.split('-')
+        for allele1 in alleles:
+            for allele2 in allele_names:
+                if allele1 < allele2:
+                    allele_pair = "%s-%s" % (allele1, allele2)
+                else:
+                    allele_pair = "%s-%s" % (allele2, allele1)
+                if not allele_pair in HLA_prob:
+                    HLA_prob[allele_pair] = 0.0
+                HLA_prob[allele_pair] += (float(count) / len(alleles))
+
+    if len(HLA_prob) <= 0:
+        return HLA_prob
+
+    # Choose top allele pairs
+    def choose_top_alleles(HLA_prob):
+        HLA_prob_list = [[allele_pair, prob] for allele_pair, prob in HLA_prob.items()]
+        HLA_prob_list = sorted(HLA_prob_list, cmp=HLA_prob_cmp)
+        HLA_prob = {}
+        best_prob = HLA_prob_list[0][1]
+        for i in range(len(HLA_prob_list)):
+            allele_pair, prob = HLA_prob_list[i]
+            if prob * 2 <= best_prob:
+                break                        
+            HLA_prob[allele_pair] = prob
+        normalize(HLA_prob)
+        return HLA_prob
+    HLA_prob = choose_top_alleles(HLA_prob)
+
+    def next_prob(HLA_cmpt, HLA_prob):
+        HLA_prob_next = {}
+        for cmpt, count in HLA_cmpt.items():
+            alleles = cmpt.split('-')
+            prob = 0.0
+            for allele in alleles:
+                for allele_pair in HLA_prob.keys():
+                    if allele in allele_pair:
+                        prob += HLA_prob[allele_pair]
+            for allele in alleles:
+                for allele_pair in HLA_prob.keys():
+                    if not allele in allele_pair:
+                        continue
+                    if allele_pair not in HLA_prob_next:
+                        HLA_prob_next[allele_pair] = 0.0
+                    HLA_prob_next[allele_pair] += (float(count) * HLA_prob[allele_pair] / prob)
+        normalize(HLA_prob_next)
+        return HLA_prob_next
+
+    diff, iter = 1.0, 0
+    while diff > 0.0001 and iter < 1000:
+        HLA_prob_next = next_prob(HLA_cmpt, HLA_prob)
+        diff = prob_diff(HLA_prob, HLA_prob_next)
+        HLA_prob = HLA_prob_next
+        HLA_prob = choose_top_alleles(HLA_prob)
+        iter += 1
+
+    HLA_prob = [[allele_pair, prob] for allele_pair, prob in HLA_prob.items()]
+    HLA_prob = sorted(HLA_prob, cmp=HLA_prob_cmp)
+    return HLA_prob
+
+
+"""
+"""
+def lower_bound(Var_list, pos):
+    low, high = 0, len(Var_list)
+    while low < high:
+        m = (low + high) / 2
+        m_pos = Var_list[m][0]
+        if m_pos < pos:
+            low = m + 1
+        elif m_pos > pos:
+            high = m
+        else:
+            assert m_pos == pos
+            while m > 0:
+                if Var_list[m-1][0] < pos:
+                    break
+                m -= 1
+            return m
+    return low
+
+
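+
Editorial note: lower_bound above is a hand-rolled binary search over (position, id) pairs. On the sorted Var_list it is applied to, it should agree with bisect_left on the position column, which gives a cheap sanity check. The data below is invented and assumes lower_bound from above is in scope.

    # Cross-check lower_bound against the standard library (editorial sketch).
    import bisect
    Var_list = [(10, "hv1"), (25, "hv2"), (25, "hv3"), (40, "hv4")]
    positions = [pos for pos, _ in Var_list]
    for pos in [5, 10, 25, 30, 40, 99]:
        assert lower_bound(Var_list, pos) == bisect.bisect_left(positions, pos)
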
+"""
+   var: ['single', 3300, 'G']
+   exons: [[301, 373], [504, 822], [1084, 1417], [2019, 2301], [2404, 2520], [2965, 2997], [3140, 3187], [3357, 3361]]
+"""
+def var_in_exon(var, exons):
+    exonic = False
+    var_type, var_left, var_data = var
+    var_right = var_left
+    if var_type == "deletion":
+        var_right = var_left + int(var_data) - 1
+    for exon_left, exon_right in exons:
+        if var_left >= exon_left and var_right <= exon_right:
+            return True
+    return False
+
+
+"""
+Report variant IDs whose var is within exonic regions
+"""
+def get_exonic_vars(Vars, exons):
+    vars = set()
+    for var_id, var in Vars.items():
+        var_type, var_left, var_data = var
+        var_right = var_left
+        if var_type == "deletion":
+            var_right = var_left + int(var_data) - 1
+        for exon_left, exon_right in exons:
+            if var_left >= exon_left and var_right <= exon_right:
+                vars.add(var_id)
+    return vars
+
+
+"""
+Get representative alleles among those that share the same exonic sequences
+"""
+def get_rep_alleles(Links, exon_vars):
+    allele_vars = {}
+    for var, alleles in Links.items():
+        if var not in exon_vars:
+            continue
+        for allele in alleles:
+            if allele not in allele_vars:
+                allele_vars[allele] = set()
+            allele_vars[allele].add(var)
+
+    allele_groups = {}
+    for allele, vars in allele_vars.items():
+        vars = '-'.join(vars)
+        if vars not in allele_groups:
+            allele_groups[vars] = []
+        allele_groups[vars].append(allele)
+
+    allele_reps = {} # allele representatives
+    allele_rep_groups = {} # allele groups by allele representatives
+    for allele_members in allele_groups.values():
+        assert len(allele_members) > 0
+        allele_rep = allele_members[0]
+        allele_rep_groups[allele_rep] = allele_members
+        for allele_member in allele_members:
+            assert allele_member not in allele_reps
+            allele_reps[allele_member] = allele_rep
+
+    return allele_reps, allele_rep_groups
+    
+
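+
Editorial note: get_rep_alleles above collapses alleles that carry exactly the same set of exonic variants into a single representative. A small worked example of the input and output shapes, with invented variant and allele names, may help:

    # Toy illustration of get_rep_alleles (all names invented).
    Links = {"hv1": ["A*01:01", "A*01:02"],
             "hv2": ["A*01:01", "A*01:02"],
             "hv3": ["A*02:01"]}
    exon_vars = set(["hv1", "hv2", "hv3"])
    # A*01:01 and A*01:02 share exactly {hv1, hv2}, so one of them (whichever
    # the dict iteration yields first) represents both; A*02:01 stands alone:
    #   allele_reps       ~ {"A*01:01": rep, "A*01:02": rep, "A*02:01": "A*02:01"}
    #   allele_rep_groups ~ {rep: ["A*01:01", "A*01:02"], "A*02:01": ["A*02:01"]}
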
+"""
+Identify alternative alignments
+"""
+def get_alternatives(ref_seq, Vars, Var_list, verbose):
+    # Check deletions' alternatives
+    def get_alternatives_recur(ref_seq,
+                               Vars,
+                               Var_list,
+                               Alts,
+                               var_id,
+                               left,
+                               alt_list,
+                               var_j,
+                               latest_pos,
+                               debug = False):
+        def add_alt(Alts, alt_list, var_id, j_id):
+            if j_id.isdigit():
+                if var_id not in Alts:
+                    Alts[var_id] = [["1"]]
+                else:
+                    if Alts[var_id][-1][-1].isdigit():
+                        Alts[var_id][-1][-1] = str(int(Alts[var_id][-1][-1]) + 1)
+                    else:
+                        Alts[var_id][-1].append("1")
+            else:
+                if var_id not in Alts:
+                    Alts[var_id] = [[j_id]]
+                else:
+                    if Alts[var_id][-1][-1].isdigit():
+                        Alts[var_id][-1][-1] = j_id
+                    else:
+                        Alts[var_id][-1].append(j_id)
+                Alts[var_id][-1].append("0")
+                        
+            if not j_id.isdigit():
+                alt_list.append(j_id)
+                alts = '-'.join(alt_list)
+                if alts not in Alts:
+                    Alts[alts] = [[var_id]]
+                else:
+                    Alts[alts].append([var_id])
+                
+        var_type, var_pos, var_data = Vars[var_id]
+        if left: # Look in left direction
+            if var_j < 0:
+                return
+            j_pos, j_id = Var_list[var_j]
+            alt_del = []
+            if var_id != j_id and j_pos < var_pos + del_len:
+                # Check bases between SNPs
+                while latest_pos > j_pos:
+                    if debug: print latest_pos - 1, ref_seq[latest_pos - 1], latest_pos - 1 - del_len, ref_seq[latest_pos - 1 - del_len]
+                    if ref_seq[latest_pos - 1] != ref_seq[latest_pos - 1 - del_len]:
+                        break
+                    latest_pos -= 1
+                    add_alt(Alts, alt_list, var_id, str(latest_pos))
+                if latest_pos - 1 > j_pos:
+                    return
+                if j_pos == latest_pos - 1:
+                    j_type, _, j_data = Vars[j_id]
+                    if j_type == "single":
+                        if debug: print Vars[j_id]
+                        off = var_pos + del_len - j_pos
+                        if debug: print var_pos - off, ref_seq[var_pos - off]
+                        if debug: print j_pos, ref_seq[j_pos]
+                        if j_data == ref_seq[var_pos - off]:
+                            add_alt(Alts, alt_list, var_id, j_id)
+                            latest_pos = j_pos
+                    elif j_type == "deletion":
+                        j_del_len = int(j_data)
+                        if var_pos < j_pos and var_pos + del_len >= j_pos + j_del_len:
+                            alt_list2 = alt_list[:] + [j_id]
+                            latest_pos2 = j_pos
+                            alt_del = [alt_list2, latest_pos2]
+                
+            get_alternatives_recur(ref_seq,
+                                   Vars,
+                                   Var_list,
+                                   Alts,
+                                   var_id,
+                                   left,
+                                   alt_list,
+                                   var_j - 1,
+                                   latest_pos,
+                                   debug)
+
+            if alt_del:
+                alt_list2, latest_pos2 = alt_del
+                if var_id not in Alts:
+                    Alts[var_id] = [alt_list2[:]]
+                else:
+                    Alts[var_id].append(alt_list2[:])
+                alt_idx = len(Alts[var_id]) - 1
+                get_alternatives_recur(ref_seq,
+                                       Vars,
+                                       Var_list,
+                                       Alts,
+                                       var_id,
+                                       left,
+                                       alt_list2,
+                                       var_j - 1,
+                                       latest_pos2,
+                                       debug)
+                # Remove this Deletion if not supported by additional bases?
+                assert alt_idx < len(Alts[var_id])
+                # DK - for debugging purposes
+                if Alts[var_id][alt_idx][-1] == j_id:
+                    Alts[var_id] = Alts[var_id][:alt_idx] + Alts[var_id][alt_idx+1:]
+              
+        else: # Look in right direction
+            if var_j >= len(Var_list):
+                return
+            j_pos, j_id = Var_list[var_j]
+            alt_del = []
+            if var_id != j_id and j_pos >= var_pos:
+                # Check bases between SNPs
+                while latest_pos < j_pos:
+                    if ref_seq[latest_pos + 1] != ref_seq[var_pos + del_len - 1 - (latest_pos - var_pos)]:
+                        break
+                    latest_pos += 1
+                    add_alt(Alts, alt_list, var_id, str(latest_pos))
+                if latest_pos + 1 < j_pos:
+                    return
+                if j_pos == latest_pos + 1:
+                    j_type, _, j_data = Vars[j_id]
+                    if j_type == "single":
+                        if debug: print Vars[j_id]
+                        off = j_pos - var_pos
+                        if debug: print var_pos + off, ref_seq[var_pos + off]
+                        if debug: print var_pos + del_len + off, ref_seq[var_pos + del_len + off]
+
+                        # DK - for debugging purposes
+                        if var_pos + del_len + off >= len(ref_seq):
+                            print >> sys.stderr, var_id, var
+                            print >> sys.stderr, "var_pos: %d, del_len: %d, off: %d" % (var_pos, del_len, off)
+                            print >> sys.stderr, "ref_seq: %d, %d" % (len(ref_seq), var_pos + del_len + off)
+                            sys.exit(1)
+                        
+                        if j_data == ref_seq[var_pos + del_len + off]:
+                            add_alt(Alts, alt_list, var_id, j_id)
+                            latest_pos = j_pos
+                    elif j_type == "deletion":
+                        j_del_len = int(j_data)
+                        if j_pos + j_del_len < var_pos + del_len:
+                            alt_list2 = alt_list[:] + [j_id]
+                            latest_pos2 = j_pos + j_del_len - 1
+                            alt_del = [alt_list2, latest_pos2]
+
+            get_alternatives_recur(ref_seq,
+                                   Vars,
+                                   Var_list,
+                                   Alts,
+                                   var_id,
+                                   left,
+                                   alt_list,
+                                   var_j + 1,
+                                   latest_pos,
+                                   debug)
+
+            if alt_del:
+                alt_list2, latest_pos2 = alt_del
+                if var_id not in Alts:
+                    Alts[var_id] = [alt_list2[:]]
+                else:
+                    Alts[var_id].append(alt_list2[:])
+                alt_idx = len(Alts[var_id]) - 1
+                get_alternatives_recur(ref_seq,
+                                       Vars,
+                                       Var_list,
+                                       Alts,
+                                       var_id,
+                                       left,
+                                       alt_list2,
+                                       var_j + 1,
+                                       latest_pos2,
+                                       debug)
+                # Remove this Deletion if not supported by additional bases?
+                assert alt_idx < len(Alts[var_id])
+                if Alts[var_id][alt_idx][-1] == j_id:
+                    Alts[var_id] = Alts[var_id][:alt_idx] + Alts[var_id][alt_idx+1:]
+
+    # Check deletions' alternatives
+    Alts_left, Alts_right = {}, {}
+    for var_i, var_id in Var_list:
+        var_type, var_pos, var_data = var = Vars[var_id]
+        if var_type != "deletion" or var_pos == 0:
+            continue
+        del_len = int(var_data)
+        if var_pos + del_len >= len(ref_seq):
+            assert var_pos + del_len == len(ref_seq)
+            continue
+        debug = (var_id == "hv1096a")
+        if debug:
+            print Vars[var_id]
+
+        alt_list = []
+        var_j = lower_bound(Var_list, var_pos + del_len - 1)
+        latest_pos = var_pos + del_len
+        if var_j < len(Var_list):
+            get_alternatives_recur(ref_seq,
+                                   Vars,
+                                   Var_list,
+                                   Alts_left,
+                                   var_id,
+                                   True, # left
+                                   alt_list,
+                                   var_j,
+                                   latest_pos,
+                                   debug)
+        alt_list = []
+        var_j = lower_bound(Var_list, var_pos)
+        latest_pos = var_pos - 1
+        assert var_j >= 0
+        get_alternatives_recur(ref_seq,
+                               Vars,
+                               Var_list,
+                               Alts_right,
+                               var_id,
+                               False, # right
+                               alt_list,
+                               var_j,
+                               latest_pos,
+                               debug)
+
+        if debug:
+            print "DK :-)"
+            sys.exit(1)
+
+    def debug_print_alts(Alts, dir):
+        for alt_list1, alt_list2 in Alts.items():
+            print "\t", dir, ":", alt_list1, alt_list2
+            out_str = "\t\t"
+            alt_list1 = alt_list1.split('-')
+            for i in range(len(alt_list1)):
+                alt = alt_list1[i]
+                var_type, var_pos, var_data = Vars[alt]
+                out_str += ("%s-%d-%s" % (var_type, var_pos, var_data))
+                if i + 1 < len(alt_list1):
+                    out_str += " "
+            for i in range(len(alt_list2)):
+                alt_list3 = alt_list2[i]
+                out_str += "\t["
+                for j in range(len(alt_list3)):
+                    alt = alt_list3[j]
+                    if alt.isdigit():
+                        out_str += alt
+                    else:
+                        var_type, var_pos, var_data = Vars[alt]
+                        out_str += ("%s-%d-%s" % (var_type, var_pos, var_data))
+                    if j + 1 < len(alt_list3):
+                        out_str += ", "
+                out_str += "]"
+            print out_str
+    if verbose >= 2: debug_print_alts(Alts_left, "left")
+    if verbose >= 2: debug_print_alts(Alts_right, "right")
+
+    return Alts_left, Alts_right
+
+
+"""
+Identify ambiguous differences that may account for other alleles,
+  given a list of differences (cmp_list) between a read and a potential allele
+"""
+def identify_ambigious_diffs(Vars, Alts_left, Alts_right, cmp_list, verbose):
+    cmp_left, cmp_right = 0, len(cmp_list) - 1
+    i = 0
+    while i < len(cmp_list):
+        cmp_i = cmp_list[i]
+        type, pos, length = cmp_i[:3]
+        # Check alternative alignments
+        if type in ["mismatch", "deletion"]:
+            var_id = cmp_i[3]
+            if var_id == "unknown":
+                i += 1
+                continue
+            
+            # Left direction
+            id_str = var_id
+            total_del_len = length if type == "deletion" else 0
+            for j in reversed(range(0, i)):
+                cmp_j = cmp_list[j]
+                j_type, j_pos, j_len = cmp_j[:3]
+                if j_type != "match":
+                    if len(cmp_j) < 4:
+                        continue
+                    j_var_id = cmp_j[3]
+                    id_str += ("-%s" % j_var_id)
+                    if j_type == "deletion":
+                        total_del_len += j_len
+            last_type, last_pos, last_len = cmp_list[0][:3]
+            assert last_type in ["match", "mismatch"]
+            left_pos = last_pos + total_del_len
+            if id_str in Alts_left:
+                orig_alts = id_str.split('-')
+                alts_list = Alts_left[id_str]
+                for alts in alts_list:
+                    if alts[-1].isdigit():
+                        assert type == "deletion"
+                        assert len(orig_alts) == 1
+                        alts_id_str = '-'.join(alts[:-1])
+                        alt_left_pos = pos
+                        alt_total_del_len = 0
+                        for alt in alts[:-1]:
+                            assert alt in Vars
+                            alt_type, alt_pos, alt_data = Vars[alt]
+                            alt_left_pos = alt_pos - 1
+                            if alt_type == "deletion":
+                                alt_total_del_len += int(alt_data)
+                        alt_left_pos = alt_left_pos + alt_total_del_len - int(alts[-1]) + 1
+                    else:
+                        alts_id_str = '-'.join(alts)
+                        assert alts_id_str in Alts_left
+                        for back_alts in Alts_left[alts_id_str]:
+                            back_id_str = '-'.join(back_alts)
+                            if back_id_str.find(id_str) != 0:
+                                continue
+                            assert len(orig_alts) < len(back_alts)
+                            assert back_alts[-1].isdigit()
+                            alt_left_pos = pos
+                            alt_total_del_len = 0
+                            for alt in back_alts[:len(orig_alts) + 1]:
+                                if alt.isdigit():
+                                    alt_left_pos = alt_left_pos - int(alt) + 1
+                                else:
+                                    assert alt in Vars
+                                    alt_type, alt_pos, alt_data = Vars[alt]
+                                    alt_left_pos = alt_pos - 1
+                                    if alt_type == "deletion":
+                                        alt_total_del_len += int(alt_data)
+                            alt_left_pos += alt_total_del_len
+                        if left_pos >= alt_left_pos:
+                            if verbose >= 2:
+                                print "LEFT:", cmp_list
+                                print "\t", type, "id_str:", id_str, "=>", alts_id_str, "=>", back_alts, "left_pos:", left_pos, "alt_left_pos:", alt_left_pos
+                            cmp_left = i + 1
+                            break
+    
+            # Right direction
+            if cmp_right + 1 == len(cmp_list):
+                id_str = var_id
+                total_del_len = length if type == "deletion" else 0
+                for j in range(i + 1, len(cmp_list)):
+                    cmp_j = cmp_list[j]
+                    j_type, j_pos, j_len = cmp_j[:3]
+                    if j_type != "match":
+                        if len(cmp_j) < 4:
+                            continue
+                        j_var_id = cmp_j[3]
+                        id_str += ("-%s" % j_var_id)
+                        if j_type == "deletion":
+                            total_del_len += j_len                        
+                last_type, last_pos, last_len = cmp_list[-1][:3]
+                assert last_type in ["match", "mismatch"]
+                right_pos = last_pos + last_len - 1 - total_del_len
+                if id_str in Alts_right:
+                    orig_alts = id_str.split('-')
+                    alts_list = Alts_right[id_str]
+                    for alts in alts_list:
+                        if alts[-1].isdigit():
+                            assert type == "deletion"
+                            assert len(orig_alts) == 1
+                            alts_id_str = '-'.join(alts[:-1])
+                            alt_right_pos = pos
+                            alt_total_del_len = 0
+                            for alt in alts[:-1]:
+                                assert alt in Vars
+                                alt_type, alt_pos, alt_data = Vars[alt]
+                                alt_right_pos = alt_pos
+                                if alt_type == "single":
+                                    alt_right_pos += 1
+                                else:
+                                    assert alt_type == "deletion"
+                                    alt_del_len = int(alt_data)
+                                    alt_right_pos += alt_del_len
+                                    alt_total_del_len += alt_del_len
+                            alt_right_pos = alt_right_pos - alt_total_del_len + int(alts[-1]) - 1
+                        else:
+                            alts_id_str = '-'.join(alts)
+                            assert alts_id_str in Alts_right
+                            for back_alts in Alts_right[alts_id_str]:
+                                back_id_str = '-'.join(back_alts)
+                                if back_id_str.find(id_str) != 0:
+                                    continue
+                                assert len(orig_alts) < len(back_alts)
+                                assert back_alts[-1].isdigit()
+                                alt_right_pos = pos
+                                alt_total_del_len = 0
+                                for alt in back_alts[:len(orig_alts) + 1]:
+                                    if alt.isdigit():
+                                        alt_right_pos = alt_right_pos + int(alt) - 1
+                                    else:
+                                        assert alt in Vars
+                                        alt_type, alt_pos, alt_data = Vars[alt]
+                                        alt_right_pos = alt_pos
+                                        if alt_type == "single":
+                                            alt_right_pos += 1
+                                        else:
+                                            assert alt_type == "deletion"
+                                            alt_del_len = int(alt_data)
+                                            alt_right_pos += alt_del_len
+                                            alt_total_del_len += alt_del_len
+                                alt_right_pos -= alt_total_del_len
+                                    
+                        if right_pos <= alt_right_pos:
+                            if verbose >= 2:
+                                print "RIGHT:", cmp_list
+                                print "\t", type, "id_str:", id_str, "=>", alts_id_str, "right_pos:", right_pos, "alt_right_pos:", alt_right_pos
+                            cmp_right = i - 1
+                            break
+        i += 1
+
+    return cmp_left, cmp_right
+
+
+"""
+Example,
+   gene_name, allele_name (input): A, A*32:01:01
+   allele (output): single-136-G-hv47,deletion-285-1-hv57, ... ,single-3473-T-hv1756,deletion-3495-1-hv1763,single-3613-C-hv1799 
+"""
+def get_allele(gene_name, allele_name, Vars, Var_list, Links):    
+    allele_haplotype = []
+    for _var_pos, _var_id in Var_list[gene_name]:
+        if allele_name in Links[_var_id]:
+            _var = Vars[gene_name][_var_id]
+            allele_haplotype.append("%s-%d-%s-%s" % (_var[0], _var[1], _var[2], _var_id))                                
+    allele_haplotype = ','.join(allele_haplotype)
+    return allele_haplotype
+
+
+"""
+"""
+def calculate_allele_coverage(allele_haplotype,
+                              N_haplotypes,
+                              exons,
+                              partial,
+                              exonic_only,
+                              output):
+    _var_count = {}
+    for read_haplotypes in N_haplotypes.values():
+        for read_haplotype in read_haplotypes:
+            if haplotype_cmp(allele_haplotype, read_haplotype) <= 0:
+                _, assembled = assemble_two_haplotypes(allele_haplotype.split(','), read_haplotype.split(','))
+            else:
+                _, assembled = assemble_two_haplotypes(read_haplotype.split(','), allele_haplotype.split(','))
+            read_vars = read_haplotype.split(',')
+            for read_var in read_vars:
+                _type, _left, _data, _id = read_var.split('-')
+                if _type in ["left", "right", "unknown"]:
+                    continue
+                if _id not in _var_count:
+                    _var_count[_id] = 1
+                else:
+                    _var_count[_id] += 1
+    total_var, covered_var = 0, 0
+    for allele_var in allele_haplotype.split(',')[1:-1]:
+        _type, _left, _data, _id = allele_var.split('-')
+        _left = int(_left)
+        _count = 0
+
+        if partial and \
+                exonic_only and \
+                not var_in_exon([_type, _left, _data], exons):
+            continue
+        
+        total_var += 1
+        if _id in _var_count:
+            _count = _var_count[_id]
+            covered_var += 1
+        if output:
+            print "\t %d %s %s (%s - %d)" % (_left, _type, _data, _id, _count)
+            
+    return covered_var, total_var
+
+
+"""
+"""
+def HLA_typing(ex_path,
+               simulation,
+               reference_type,
+               hla_list,
+               partial,
+               partial_alleles,
+               refHLAs,
+               HLAs,
+               HLA_names,
+               HLA_lengths,
+               refHLA_loci,
+               Vars,
+               Var_list,
+               Links,
+               HLAs_default,
+               Vars_default,
+               Var_list_default,
+               Links_default,
+               exclude_allele_list,
+               aligners,
+               num_mismatch,
+               assembly,
+               concordant_assembly,
+               exonic_only,
+               fastq,
+               read_fname,
+               alignment_fname,
+               num_frag_list,
+               threads,
+               enable_coverage,
+               best_alleles,
+               verbose):    
+    if simulation:
+        test_passed = {}
+    for aligner, index_type in aligners:
+        if index_type == "graph":
+            print >> sys.stderr, "\n\t\t%s %s on %s" % (aligner, index_type, reference_type)
+        else:
+            print >> sys.stderr, "\n\t\t%s %s" % (aligner, index_type)
+
+        remove_alignment_file = False
+        if alignment_fname == "":
+            # Align reads, and sort the alignments into a BAM file
+            remove_alignment_file = True
+            if simulation:
+                alignment_fname = "hla_output.bam"
+            else:
+                alignment_fname = read_fname[0].split('/')[-1]
+                alignment_fname = alignment_fname.split('.')[0] + ".bam"
+                
+            align_reads(ex_path,
+                        aligner,
+                        simulation,
+                        index_type,
+                        read_fname,
+                        fastq,
+                        threads,
+                        alignment_fname,
+                        verbose)
+            
+        for test_HLA_names in hla_list:
+            if simulation:
+                gene = test_HLA_names[0].split('*')[0]
+            else:
+                gene = test_HLA_names
+            ref_allele = refHLAs[gene]
+            ref_seq = HLAs[gene][ref_allele]
+            ref_exons = refHLA_loci[gene][-1]
+
+            if not os.path.exists(alignment_fname + ".bai"):
+                os.system("samtools index %s" % alignment_fname)
+            # Read alignments
+            alignview_cmd = ["samtools",
+                             "view",
+                             alignment_fname]
+            base_locus = 0
+            if index_type == "graph":
+                if reference_type == "gene":
+                    alignview_cmd += ["%s" % ref_allele]
+                else:
+                    assert reference_type in ["chromosome", "genome"]
+                    _, chr, left, right, _ = refHLA_loci[gene]
+                    base_locus = left
+                    alignview_cmd += ["%s:%d-%d" % (chr, left + 1, right + 1)]
+
+                bamview_proc = subprocess.Popen(alignview_cmd,
+                                                stdout=subprocess.PIPE,
+                                                stderr=open("/dev/null", 'w'))
+
+                sort_read_cmd = ["sort", "-k", "1,1", "-s"] # -s for stable sorting
+                alignview_proc = subprocess.Popen(sort_read_cmd,
+                                                  stdin=bamview_proc.stdout,
+                                                  stdout=subprocess.PIPE,
+                                                  stderr=open("/dev/null", 'w'))
+            else:
+                alignview_proc = subprocess.Popen(alignview_cmd,
+                                                  stdout=subprocess.PIPE,
+                                                  stderr=open("/dev/null", 'w'))
+
+            # Assembly graph
+            asm_graph = assembly_graph.Graph(ref_seq)
+
+            # List of nodes that represent alleles
+            allele_vars = {}
+            for var_id, allele_list in Links_default.items():
+                for allele_id in allele_list:
+                    if allele_id not in HLAs[gene]:
+                        continue
+                    if allele_id not in allele_vars:
+                        allele_vars[allele_id] = [var_id]
+                    else:
+                        allele_vars[allele_id].append(var_id)
+
+            allele_nodes = {}
+            for allele_id, var_ids in allele_vars.items():
+                seq = list(ref_seq)  # sequence that node represents
+                var = ["" for i in range(len(ref_seq))]  # how sequence is related to backbone
+                for var_id in var_ids:
+                    assert var_id in Vars[gene]
+                    var_type, var_pos, var_data = Vars[gene][var_id]
+                    assert var_pos >= 0 and var_pos < len(ref_seq)
+                    if var_type == "single":
+                        seq[var_pos] = var_data
+                        var[var_pos] = var_id
+                    else:
+                        assert var_type == "deletion"
+                        del_len = int(var_data)
+                        assert var_pos + del_len <= len(ref_seq)
+                        seq[var_pos:var_pos + del_len] = ['D'] * del_len
+                        var[var_pos:var_pos + del_len] = [var_id] * del_len
+
+                seq = ''.join(seq)
+                allele_nodes[allele_id] = assembly_graph.Node(0, seq, var)
+
+            # Extract variants that are within exons
+            exon_vars = get_exonic_vars(Vars[gene], ref_exons)
+
+            # Choose allele representatives from those that share the same exonic sequences
+            allele_reps, allele_rep_groups = get_rep_alleles(Links, exon_vars)
+            allele_rep_set = set(allele_reps.values())
+
+            # For checking alternative alignments near the ends of alignments
+            Alts_left, Alts_right = get_alternatives(ref_seq, Vars[gene], Var_list[gene], verbose)
+
+            # Count alleles
+            HLA_counts, HLA_cmpt = {}, {}
+            HLA_gen_counts, HLA_gen_cmpt = {}, {}
+            num_reads, total_read_len = 0, 0
+
+            # For debugging purposes
+            debug_allele_names = set(test_HLA_names) if simulation and verbose >= 2 else set()
+
+            # Read information
+            prev_read_id = None
+            prev_right_pos = 0
+            prev_lines = []
+            if index_type == "graph":
+                # nodes for reads
+                read_nodes = []
+                read_vars_list = []
+                
+                # Cigar regular expression
+                cigar_re = re.compile('\d+\w')
+                for line in alignview_proc.stdout:
+                    line = line.strip()
+                    cols = line.split()
+                    read_id, flag, chr, pos, mapQ, cigar_str = cols[:6]
+                    orig_read_id = read_id
+                    if simulation:
+                        read_id = read_id.split('|')[0]
+                    read_seq, qual = cols[9], cols[10]
+                    num_reads += 1
+                    total_read_len += len(read_seq)
+                    flag, pos = int(flag), int(pos)
+                    pos -= (base_locus + 1)
+                    if pos < 0:
+                        continue
+
+                    # Unaligned?
+                    if flag & 0x4 != 0:
+                        if simulation and verbose >= 2:
+                            print "Unaligned"
+                            print "\t", line                            
+                        continue
+
+                    # Concordantly mapped?
+                    if flag & 0x2 != 0:
+                        concordant = True
+                    else:
+                        concordant = False
+
+                    NM, Zs, MD, NH = "", "", "", ""
+                    for i in range(11, len(cols)):
+                        col = cols[i]
+                        if col.startswith("Zs"):
+                            Zs = col[5:]
+                        elif col.startswith("MD"):
+                            MD = col[5:]
+                        elif col.startswith("NM"):
+                            NM = int(col[5:])
+                        elif col.startswith("NH"):
+                            NH = int(col[5:])
+
+                    if NM > num_mismatch:
+                        continue
+
+                    # Only consider unique alignment
+                    if NH > 1:
+                        continue
+
+                    if Zs:
+                        Zs = Zs.split(',')
+
+                    assert MD != ""
+                    MD_str_pos, MD_len = 0, 0
+                    Zs_pos, Zs_i = 0, 0
+                    for _i in range(len(Zs)):
+                        Zs[_i] = Zs[_i].split('|')
+                    if Zs_i < len(Zs):
+                        Zs_pos += int(Zs[Zs_i][0])
+                    read_pos, left_pos = 0, pos
+                    right_pos = left_pos
+                    cigars = cigar_re.findall(cigar_str)
+                    cigars = [[cigar[-1], int(cigar[:-1])] for cigar in cigars]
+                    cmp_list = []
+
+                    # Extract variants with respect to the backbone from the CIGAR and MD strings
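+                    # Worked example (hypothetical read and variant ID): a 100-bp read
+                    # with CIGAR 5M2D95M, MD:Z:5^AC95 and Zs:Z:5|D|hv100 is parsed into
+                    #   cmp_list = [["match", pos, 5],
+                    #               ["deletion", pos + 5, 2, "hv100"],
+                    #               ["match", pos + 7, 95]]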
+                    softclip = [0, 0]
+                    for i in range(len(cigars)):
+                        cigar_op, length = cigars[i]
+                        if cigar_op == 'M':
+                            # Update coverage
+                            if enable_coverage:
+                                if right_pos + length < len(coverage):
+                                    coverage[right_pos] += 1
+                                    coverage[right_pos + length] -= 1
+                                elif right_pos < len(coverage):
+                                    coverage[right_pos] += 1
+                                    coverage[-1] -= 1
+
+                            first = True
+                            MD_len_used = 0
+                            while True:
+                                if not first or MD_len == 0:
+                                    if MD[MD_str_pos].isdigit():
+                                        num = int(MD[MD_str_pos])
+                                        MD_str_pos += 1
+                                        while MD_str_pos < len(MD):
+                                            if MD[MD_str_pos].isdigit():
+                                                num = num * 10 + int(MD[MD_str_pos])
+                                                MD_str_pos += 1
+                                            else:
+                                                break
+                                        MD_len += num
+                                # Insertion or full match followed
+                                if MD_len >= length:
+                                    MD_len -= length
+                                    cmp_list.append(["match", right_pos + MD_len_used, length - MD_len_used])
+                                    break
+                                first = False
+                                read_base = read_seq[read_pos + MD_len]
+                                MD_ref_base = MD[MD_str_pos]
+                                MD_str_pos += 1
+                                assert MD_ref_base in "ACGT"
+                                cmp_list.append(["match", right_pos + MD_len_used, MD_len - MD_len_used])
+
+                                _var_id = "unknown"
+                                if read_pos + MD_len == Zs_pos and Zs_i < len(Zs):
+                                    assert Zs[Zs_i][1] == 'S'
+                                    _var_id = Zs[Zs_i][2]
+                                    Zs_i += 1
+                                    Zs_pos += 1
+                                    if Zs_i < len(Zs):
+                                        Zs_pos += int(Zs[Zs_i][0])
+
+                                cmp_list.append(["mismatch", right_pos + MD_len, 1, _var_id])
+                                MD_len_used = MD_len + 1
+                                MD_len += 1
+                                # Full match
+                                if MD_len == length:
+                                    MD_len = 0
+                                    break
+                        elif cigar_op == 'I':
+                            cmp_list.append(["insertion", right_pos, length])
+                        elif cigar_op == 'D':
+                            if MD[MD_str_pos] == '0':
+                                MD_str_pos += 1
+                            assert MD[MD_str_pos] == '^'
+                            MD_str_pos += 1
+                            while MD_str_pos < len(MD):
+                                if not MD[MD_str_pos] in "ACGT":
+                                    break
+                                MD_str_pos += 1
+                            _var_id = "unknown"
+                            if read_pos == Zs_pos and Zs_i < len(Zs):
+                                assert Zs[Zs_i][1] == 'D'
+                                _var_id = Zs[Zs_i][2]
+                                Zs_i += 1
+                                if Zs_i < len(Zs):
+                                    Zs_pos += int(Zs[Zs_i][0])
+
+                            cmp_list.append(["deletion", right_pos, length, _var_id])
+                        elif cigar_op == 'S':
+                            if i == 0:
+                                softclip[0] = length
+                                Zs_pos += length
+                            else:
+                                assert i + 1 == len(cigars)
+                                softclip[1] = length
+                        else:                    
+                            assert cigar_op == 'N'
+                            cmp_list.append(["intron", right_pos, length])
+
+                        if cigar_op in "MND":
+                            right_pos += length
+
+                        if cigar_op in "MIS":
+                            read_pos += length
+
+                    # Remove soft-clipped bases from the CIGAR and trim read_seq and qual accordingly
+                    if sum(softclip) > 0:
+                        if softclip[0] > 0:
+                            cigars = cigars[1:]
+                            read_seq = read_seq[softclip[0]:]
+                            qual = qual[softclip[0]:]
+                        if softclip[1] > 0:
+                            cigars = cigars[:-1]
+                            read_seq = read_seq[:-softclip[1]]
+                            qual = qual[:-softclip[1]]
+
+                        cigar_str = ""
+                        for type, length in cigars:
+                            cigar_str += str(length)
+                            cigar_str += type
+                    
+                    if right_pos > len(ref_seq):
+                        continue
+
+                    def add_stat(HLA_cmpt, HLA_counts, HLA_count_per_read, include_alleles = set()):
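+                        # (Illustrative note) alleles tied for the maximum per-read count
+                        # form one compatibility class (cur_cmpt), which is counted once
+                        # per read in HLA_cmpt; HLA_counts accumulates per-allele counts.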
+                        max_count = max(HLA_count_per_read.values())
+                        cur_cmpt = set()
+                        for allele, count in HLA_count_per_read.items():
+                            if count < max_count:
+                                continue
+
+                            if len(include_alleles) > 0 and allele not in include_alleles:
+                                continue
+                            
+                            cur_cmpt.add(allele)                    
+                            if allele not in HLA_counts:
+                                HLA_counts[allele] = 1
+                            else:
+                                HLA_counts[allele] += 1
+
+                        if len(cur_cmpt) == 0:
+                            return ""
+
+                        # DK - for debugging purposes                            
+                        alleles = ["", ""]
+                        # alleles = ["B*40:304", "B*40:02:01"]
+                        allele1_found, allele2_found = False, False
+                        if alleles[0] != "":
+                            for allele, count in HLA_count_per_read.items():
+                                if count < max_count:
+                                    continue
+                                if allele == alleles[0]:
+                                    allele1_found = True
+                                elif allele == alleles[1]:
+                                    allele2_found = True
+                            if allele1_found != allele2_found:
+                                print alleles[0], HLA_count_per_read[alleles[0]]
+                                print alleles[1], HLA_count_per_read[alleles[1]]
+                                if allele1_found:
+                                    print ("%s\tread_id %s - %d vs. %d]" % (alleles[0], prev_read_id, max_count, HLA_count_per_read[alleles[1]]))
+                                else:
+                                    print ("%s\tread_id %s - %d vs. %d]" % (alleles[1], prev_read_id, max_count, HLA_count_per_read[alleles[0]]))
+                                print read_seq
+
+                        cur_cmpt = sorted(list(cur_cmpt))
+                        cur_cmpt = '-'.join(cur_cmpt)
+                        if not cur_cmpt in HLA_cmpt:
+                            HLA_cmpt[cur_cmpt] = 1
+                        else:
+                            HLA_cmpt[cur_cmpt] += 1
+
+                        return cur_cmpt
+
+                    if read_id != prev_read_id:
+                        if prev_read_id != None:
+                            cur_cmpt = add_stat(HLA_cmpt, HLA_counts, HLA_count_per_read, allele_rep_set)
+                            add_stat(HLA_gen_cmpt, HLA_gen_counts, HLA_gen_count_per_read)
+                            for read_id_, read_node in read_nodes:
+                                asm_graph.add_node(read_id_, read_node)
+                            read_nodes, read_var_list = [], []
+
+                            if verbose >= 2:
+                                cur_cmpt = cur_cmpt.split('-')
+                                if not(set(cur_cmpt) & set(test_HLA_names)):
+                                    print "%s are chosen instead of %s" % ('-'.join(cur_cmpt), '-'.join(test_HLA_names))
+                                    for prev_line in prev_lines:
+                                        print "\t", prev_line
+
+                            prev_lines = []
+
+                        HLA_count_per_read, HLA_gen_count_per_read = {}, {}
+                        for HLA_name in HLA_names[gene]:
+                            if HLA_name.find("BACKBONE") != -1:
+                                continue
+                            HLA_count_per_read[HLA_name] = 0
+                            HLA_gen_count_per_read[HLA_name] = 0
+
+                    prev_lines.append(line)
+
+                    def add_count(count_per_read, var_id, add):
+                        alleles = Links[var_id]
+                        if verbose >= 2:
+                            if add > 0 and not (set(alleles) & debug_allele_names):
+                                print "Add:", add, debug_allele_names, "-", var_id
+                                print "\t", line
+                                print "\t", alleles
+                            if add < 0 and set(alleles) & debug_allele_names:
+                                print "Add:", add, debug_allele_names, "-", var_id
+                                print "\t", line
+
+                        for allele in alleles:
+                            count_per_read[allele] += add
+
+                    # Decide which allele(s) a read most likely came from
+                    for var_id, data in Vars[gene].items():
+                        var_type, var_pos, var_data = data
+                        if var_type != "deletion":
+                            continue
+                        if left_pos >= var_pos and right_pos <= var_pos + int(var_data):
+                            if var_id in exon_vars:
+                                add_count(HLA_count_per_read, var_id, -1)
+                            add_count(HLA_gen_count_per_read, var_id, -1)
+
+                    # Node
+                    read_node_pos, read_node_seq, read_node_var = -1, "", []
+                    read_vars = []
+
+                    # Positive and negative evidence
+                    positive_vars, negative_vars = set(), set()
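+                    # (Illustrative note) positive_vars collects variants the read supports
+                    # (mismatches/deletions within the unambiguous interval reported by
+                    # identify_ambigious_diffs below), while negative_vars collects variants
+                    # the read contradicts by matching the backbone across their positions;
+                    # each contributes +1 / -1 per linked allele through add_count().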
+
+                    # Sanity check - read length, cigar string, and MD string
+                    ref_pos, read_pos, cmp_cigar_str, cmp_MD = left_pos, 0, "", ""
+                    cigar_match_len, MD_match_len = 0, 0
+                    cmp_list_left, cmp_list_right = identify_ambigious_diffs(Vars[gene],
+                                                                             Alts_left,
+                                                                             Alts_right,
+                                                                             cmp_list,
+                                                                             verbose)
+
+                    cmp_i = 0
+                    while cmp_i < len(cmp_list):
+                        cmp = cmp_list[cmp_i]
+                        type, length = cmp[0], cmp[2]
+                        if num_mismatch == 0 and type in ["mismatch", "deletion", "insertion"]:
+                            assert cmp[3] != "unknown"
+
+                        if type in ["match", "mismatch"]:
+                            if read_node_pos < 0:
+                                read_node_pos = ref_pos
+
+                        if type == "match":
+                            read_node_seq += read_seq[read_pos:read_pos+length]
+                            read_node_var += ([''] * length)
+                            
+                            var_idx = lower_bound(Var_list[gene], ref_pos)
+                            while var_idx < len(Var_list[gene]):
+                                var_pos, var_id = Var_list[gene][var_idx]
+                                if ref_pos + length <= var_pos:
+                                    break
+                                if ref_pos <= var_pos:
+                                    var_type, _, var_data = Vars[gene][var_id]
+                                    if var_type == "insertion":
+                                        if ref_pos < var_pos and ref_pos + length > var_pos + len(var_data):
+                                            negative_vars.add(var_id)
+                                    elif var_type == "deletion":
+                                        del_len = int(var_data)
+                                        if ref_pos < var_pos and ref_pos + length > var_pos + del_len:
+                                            # Check if this might be one of the two tandem repeats (the same left coordinate)
+                                            cmp_left, cmp_right = cmp[1], cmp[1] + cmp[2]
+                                            test1_seq1 = ref_seq[cmp_left:cmp_right]
+                                            test1_seq2 = ref_seq[cmp_left:var_pos] + ref_seq[var_pos + del_len:cmp_right + del_len]
+                                            # Check if this happens due to small repeats (the same right coordinate - e.g. 19 times of TTTC in DQA1*05:05:01:02)
+                                            cmp_left -= read_pos
+                                            cmp_right += (len(read_seq) - read_pos - cmp[2])
+                                            test2_seq1 = ref_seq[cmp_left+int(var_data):cmp_right]
+                                            test2_seq2 = ref_seq[cmp_left:var_pos] + ref_seq[var_pos+int(var_data):cmp_right]
+                                            if test1_seq1 != test1_seq2 and test2_seq1 != test2_seq2:
+                                                negative_vars.add(var_id)
+                                    else:
+                                        negative_vars.add(var_id)
+                                var_idx += 1
+                            read_pos += length
+                            ref_pos += length
+                            cigar_match_len += length
+                            MD_match_len += length
+                        elif type == "mismatch":
+                            var_id = cmp[3]
+                            read_base = read_seq[read_pos]
+                            read_node_seq += read_base
+                            read_node_var.append(var_id)
+                            if var_id != "unknown":
+                                if cmp_i >= cmp_list_left and cmp_i <= cmp_list_right:
+                                    positive_vars.add(var_id)
+                            
+                            cmp_MD += ("%d%s" % (MD_match_len, ref_seq[ref_pos]))
+                            MD_match_len = 0
+                            cigar_match_len += 1
+                            read_pos += 1
+                            ref_pos += 1
+                        elif type == "insertion":
+                            assert False
+                            ins_seq = read_seq[read_pos:read_pos+length]
+                            var_idx = lower_bound(Var_list[gene], ref_pos)
+                            while var_idx < len(Var_list[gene]):
+                                var_pos, var_id = Var_list[gene][var_idx]
+                                if ref_pos < var_pos:
+                                    break
+                                if ref_pos == var_pos:
+                                    var_type, _, var_data = Vars[gene][var_id]
+                                    if var_type == "insertion":                                
+                                        if var_data == ins_seq:
+                                            positive_vars.add(var_id)
+                                var_idx += 1
+                            if cigar_match_len > 0:
+                                cmp_cigar_str += ("%dM" % cigar_match_len)
+                                cigar_match_len = 0
+                            read_pos += length
+                            cmp_cigar_str += ("%dI" % length)
+                        elif type == "deletion":
+                            var_id = cmp[3]
+                            alt_match = False
+                            del_len = length
+                            read_node_seq += ('D' * del_len)
+                            if var_id != "unknown":
+                                if cmp_i >= cmp_list_left and cmp_i <= cmp_list_right:
+                                    # Require at least 5bp match before and after a deletion
+                                    if read_pos >= 5 and read_pos + 5 <= len(read_seq):
+                                        positive_vars.add(var_id)
+
+                            if len(read_node_seq) > len(read_node_var):
+                                assert len(read_node_seq) == len(read_node_var) + del_len
+                                read_node_var += ([var_id] * del_len)
+
+                            if cigar_match_len > 0:
+                                cmp_cigar_str += ("%dM" % cigar_match_len)
+                                cigar_match_len = 0
+                            cmp_MD += ("%d" % MD_match_len)
+                            MD_match_len = 0
+                            cmp_cigar_str += ("%dD" % length)
+                            cmp_MD += ("^%s" % ref_seq[ref_pos:ref_pos+length])
+                            ref_pos += length
+                        else:
+                            assert type == "intron"
+                            if cigar_match_len > 0:
+                                cmp_cigar_str += ("%dM" % cigar_match_len)
+                                cigar_match_len = 0
+                            cmp_cigar_str += ("%dN" % length)
+                            ref_pos += length
+
+                        cmp_i += 1
+                
+                    if cigar_match_len > 0:
+                        cmp_cigar_str += ("%dM" % cigar_match_len)
+                    cmp_MD += ("%d" % MD_match_len)
+                    # Sanity check
+                    if read_pos != len(read_seq) or \
+                            cmp_cigar_str != cigar_str or \
+                            cmp_MD != MD:
+                        print >> sys.stderr, "Error:", cigar_str, MD
+                        print >> sys.stderr, "\tcomputed:", cmp_cigar_str, cmp_MD
+                        print >> sys.stderr, "\tcmp list:", cmp_list
+                        assert False
+
+                    # Node
+                    read_nodes.append([orig_read_id, assembly_graph.Node(read_node_pos, read_node_seq, read_node_var)])
+
+                    for positive_var in positive_vars:
+                        if positive_var in exon_vars:
+                            add_count(HLA_count_per_read, positive_var, 1)
+                        add_count(HLA_gen_count_per_read, positive_var, 1)
+                    for negative_var in negative_vars:
+                        if negative_var in exon_vars:
+                            add_count(HLA_count_per_read, negative_var, -1)
+                        add_count(HLA_gen_count_per_read, negative_var, -1)
+
+                    prev_read_id = read_id
+                    prev_right_pos = right_pos
+
+                if num_reads <= 0:
+                    continue
+
+                if prev_read_id != None:
+                    add_stat(HLA_cmpt, HLA_counts, HLA_count_per_read, allele_rep_set)
+                    add_stat(HLA_gen_cmpt, HLA_gen_counts, HLA_gen_count_per_read)
+                    for read_id_, read_node in read_nodes:
+                        asm_graph.add_node(read_id_, read_node)
+                    read_nodes, read_var_list = [], []
+
+                # Generate edges
+                asm_graph.generate_edges()
+
+                # Draw assembly graph
+                if len(num_frag_list) > 0:
+                    asm_graph.draw("assembly_graph1", num_frag_list[0][0])
+                else:
+                    asm_graph.draw("assembly_graph1")                    
+
+                # Reduce graph
+                asm_graph.reduce()
+
+                # Draw assembly graph
+                if len(num_frag_list) > 0:
+                    asm_graph.draw("assembly_graph2", num_frag_list[0][0])
+                else:
+                    asm_graph.draw("assembly_graph2")
+                
+                # Further reduce graph with mate pairs
+                tmp_nodes = asm_graph.assemble_with_mates()
+
+                # Draw assembly graph
+                if len(num_frag_list) > 0:
+                    asm_graph.draw("assembly_graph3", num_frag_list[0][0])
+                else:
+                    asm_graph.draw("assembly_graph3")
+
+                # DK - debugging purposes
+                print >> sys.stderr, "Number of tmp nodes:", len(tmp_nodes)
+                for i in range(min(10, len(tmp_nodes))):
+                    node, node_id, node_id_last = tmp_nodes[i]
+                    node_vars = node.get_vars(Vars[gene])
+                    print >> sys.stderr, node_id, node_id_last, node.merged_nodes; node.print_info()
+                    print >> sys.stderr
+                    if simulation:
+                        allele_name, cmp_vars, max_common = "", [], -1
+                        for test_HLA_name in test_HLA_names:
+                            tmp_vars = allele_nodes[test_HLA_name].get_vars(Vars[gene])
+                            tmp_common = len(set(node_vars) & set(allele_vars[test_HLA_name]))
+                            if max_common < tmp_common:
+                                max_common = tmp_common
+                                allele_name = test_HLA_name
+                                cmp_vars = tmp_vars
+                        print >> sys.stderr, "vs.", allele_name
+                        var_i, var_j = 0, 0
+                        while var_i < len(cmp_vars) and var_j < len(node_vars):
+                            cmp_var_id, node_var_id = cmp_vars[var_i], node_vars[var_j]
+                            if cmp_var_id == node_var_id:
+                                print >> sys.stderr, cmp_var_id, Vars[gene][cmp_var_id]
+                                var_i += 1; var_j += 1
+                                continue
+                            cmp_var, node_var = Vars[gene][cmp_var_id], Vars[gene][node_var_id]
+                            if cmp_var[1] <= node_var[1]:
+                                if (var_i > 0 and var_i + 1 < len(cmp_vars)) or cmp_var[0] != "deletion":
+                                    print >> sys.stderr, "***", cmp_var_id, cmp_var, "=="
+                                var_i += 1
+                            else:
+                                print >> sys.stderr, "*** ==", node_var_id, node_var
+                                var_j += 1
+                                
+                            
+                # asm_graph.assemble()
+                
+                # DK - debugging purposes
+                sys.exit(1)
+
+            else:
+                assert index_type == "linear"
+                def add_alleles(alleles):
+                    for allele in alleles:
+                        if not allele in HLA_counts:
+                            HLA_counts[allele] = 1
+                        else:
+                            HLA_counts[allele] += 1
+
+                    cur_cmpt = sorted(list(alleles))
+                    cur_cmpt = '-'.join(cur_cmpt)
+                    if not cur_cmpt in HLA_cmpt:
+                        HLA_cmpt[cur_cmpt] = 1
+                    else:
+                        HLA_cmpt[cur_cmpt] += 1
+
+                prev_read_id, prev_AS = None, None
+                alleles = set()
+                for line in alignview_proc.stdout:
+                    cols = line[:-1].split()
+                    read_id, flag, allele = cols[:3]
+                    flag = int(flag)
+                    if flag & 0x4 != 0:
+                        continue
+                    if not allele.startswith(gene):
+                        continue
+                    if allele.find("BACKBONE") != -1:
+                        continue
+
+                    AS = None
+                    for i in range(11, len(cols)):
+                        col = cols[i]
+                        if col.startswith("AS"):
+                            AS = int(col[5:])
+                    assert AS != None
+                    if read_id != prev_read_id:
+                        if alleles:
+                            if aligner == "hisat2" or \
+                                    (aligner == "bowtie2" and len(alleles) < 10):
+                                add_alleles(alleles)
+                            alleles = set()
+                        prev_AS = None
+                    if prev_AS != None and AS < prev_AS:
+                        continue
+                    prev_read_id = read_id
+                    prev_AS = AS
+                    alleles.add(allele)
+
+                if alleles:
+                    add_alleles(alleles)
+
+            HLA_counts = [[allele, count] for allele, count in HLA_counts.items()]
+            def HLA_count_cmp(a, b):
+                if a[1] != b[1]:
+                    return b[1] - a[1]
+                assert a[0] != b[0]
+                if a[0] < b[0]:
+                    return -1
+                else:
+                    return 1
+            HLA_counts = sorted(HLA_counts, cmp=HLA_count_cmp)
+            for count_i in range(len(HLA_counts)):
+                count = HLA_counts[count_i]
+                if simulation:
+                    found = False
+                    for test_HLA_name in test_HLA_names:
+                        if count[0] == test_HLA_name:
+                            print >> sys.stderr, "\t\t\t*** %d ranked %s (count: %d)" % (count_i + 1, test_HLA_name, count[1])
+                            found = True
+                            """
+                            if count_i > 0 and HLA_counts[0][1] > count[1]:
+                                print >> sys.stderr, "Warning: %s ranked first (count: %d)" % (HLA_counts[0][0], HLA_counts[0][1])
+                                assert False
+                            else:
+                                test_passed += 1
+                            """
+                    if count_i < 5 and not found:
+                        print >> sys.stderr, "\t\t\t\t%d %s (count: %d)" % (count_i + 1, count[0], count[1])
+                else:
+                    print >> sys.stderr, "\t\t\t\t%d %s (count: %d)" % (count_i + 1, count[0], count[1])
+                    if count_i >= 9:
+                        break
+            print >> sys.stderr
+
+            # Calculate the abundance of representative alleles on exonic sequences
+            HLA_prob = single_abundance(HLA_cmpt, HLA_lengths[gene])
+
+            # Incorporate non-representative alleles (full-length alleles)
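+            # (Illustrative summary) the top-ranked representative alleles (excluding
+            # partial alleles) are expanded to their groups (gen_alleles), a second
+            # abundance estimate is computed on full-length compatibility classes
+            # restricted to those alleles, and the result is rescaled by the probability
+            # mass (gen_prob_sum) of the representatives it replaces.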
+            gen_alleles = set()
+            gen_prob_sum = 0.0
+            for prob_i in range(len(HLA_prob)):
+                allele, prob = HLA_prob[prob_i][:2]
+                if prob_i >= 10 and prob < 0.03:
+                    break
+                if allele in partial_alleles:
+                    continue
+                
+                gen_prob_sum += prob
+                for allele2 in allele_rep_groups[allele]:
+                    gen_alleles.add(allele2)
+            if len(gen_alleles) > 0:
+                HLA_gen_cmpt2 = {}
+                for cmpt, value in HLA_gen_cmpt.items():
+                    cmpt2 = []
+                    for allele in cmpt.split('-'):
+                        if allele in gen_alleles:
+                            cmpt2.append(allele)
+                    if len(cmpt2) == 0:
+                        continue
+                    cmpt2 = '-'.join(cmpt2)
+                    if cmpt2 not in HLA_gen_cmpt2:
+                        HLA_gen_cmpt2[cmpt2] = value
+                    else:
+                        HLA_gen_cmpt2[cmpt2] += value
+                HLA_gen_cmpt = HLA_gen_cmpt2
+                HLA_gen_prob = single_abundance(HLA_gen_cmpt, HLA_lengths[gene])
+
+                HLA_combined_prob = {}
+                for allele, prob in HLA_prob:
+                    assert allele not in HLA_combined_prob
+                    if allele in gen_alleles:
+                        HLA_combined_prob[allele] = 0.0
+                    else:
+                        HLA_combined_prob[allele] = prob
+                for allele, prob in HLA_gen_prob:
+                    HLA_combined_prob[allele] = prob * gen_prob_sum
+                HLA_prob = [[allele, prob] for allele, prob in HLA_combined_prob.items()]
+                HLA_prob = sorted(HLA_prob, cmp=HLA_prob_cmp)
+
+            success = [False for i in range(len(test_HLA_names))]
+            found_list = [False for i in range(len(test_HLA_names))]
+            for prob_i in range(len(HLA_prob)):
+                prob = HLA_prob[prob_i]
+                found = False
+                _allele_rep = prob[0]
+                if partial and exonic_only:
+                    _fields = _allele_rep.split(':')
+                    if len(_fields) == 4:
+                        _allele_rep = ':'.join(_fields[:-1])
+
+                if simulation:
+                    for name_i in range(len(test_HLA_names)):
+                        test_HLA_name = test_HLA_names[name_i]
+                        if prob[0] == test_HLA_name:
+                            rank_i = prob_i
+                            while rank_i > 0:
+                                if HLA_prob[rank_i - 1][1] == prob[1]:
+                                    rank_i -= 1
+                                else:
+                                    break
+                            print >> sys.stderr, "\t\t\t*** %d ranked %s (abundance: %.2f%%)" % (rank_i + 1, test_HLA_name, prob[1] * 100.0)
+                            if rank_i < len(success):
+                                success[rank_i] = True
+                            found_list[name_i] = True
+                            found = True
+                    # DK - for debugging purposes
+                    if not False in found_list and prob_i >= 10:
+                        break
+                if not found:
+                    print >> sys.stderr, "\t\t\t\t%d ranked %s (abundance: %.2f%%)" % (prob_i + 1, _allele_rep, prob[1] * 100.0)
+                    if best_alleles and prob_i < 2:
+                        print >> sys.stdout, "SingleModel %s (abundance: %.2f%%)" % (_allele_rep, prob[1] * 100.0)
+                if not simulation and prob_i >= 9:
+                    break
+                if prob_i >= 19:
+                    break
+            print >> sys.stderr
+
+            # DK - for debugging purposes
+            if False and (len(test_HLA_names) == 2 or not simulation):
+                HLA_prob = joint_abundance(HLA_cmpt, HLA_lengths[gene])
+                if len(HLA_prob) <= 0:
+                    continue
+                success = [False]
+                for prob_i in range(len(HLA_prob)):
+                    allele_pair, prob = HLA_prob[prob_i]
+                    allele1, allele2 = allele_pair.split('-')
+                    if best_alleles and prob_i < 1:
+                        print >> sys.stdout, "PairModel %s (abundance: %.2f%%)" % (allele_pair, prob * 100.0)
+                    if simulation:
+                        if allele1 in test_HLA_names and allele2 in test_HLA_names:
+                            rank_i = prob_i
+                            while rank_i > 0:
+                                if HLA_prob[rank_i-1][1] == prob:                                        
+                                    rank_i -= 1
+                                else:
+                                    break
+                            print >> sys.stderr, "\t\t\t*** %d ranked %s (abundance: %.2f%%)" % (rank_i + 1, allele_pair, prob * 100.0)
+                            if rank_i == 0:
+                                success[0] = True
+                            break
+                    print >> sys.stderr, "\t\t\t\t%d ranked %s (abundance: %.2f%%)" % (prob_i + 1, allele_pair, prob * 100.0)
+                    if not simulation and prob_i >= 9:
+                        break
+                print >> sys.stderr
+
+                # Li's method
+                """
+                li_hla = os.path.join(ex_path, "li_hla/hla")
+                if os.path.exists(li_hla):
+                    li_hla_cmd = [li_hla,
+                                  "hla",
+                                  "hla_input.bam",
+                                  "-b", "%s*BACKBONE" % gene]
+                    li_hla_proc = subprocess.Popen(li_hla_cmd,
+                                                   stdout=subprocess.PIPE,
+                                                   stderr=open("/dev/null", 'w'))
+
+                    # read in the result of Li's hla
+                    for line in li_hla_proc.stdout:
+                        allele1, allele2, score = line.strip().split()
+                        score = float(score)
+                        if simulation:
+                            if allele1 in test_HLA_names and allele2 in test_HLA_names:
+                                print >> sys.stderr, "\t\t\t*** 1 ranked %s-%s (score: %.2f)" % (allele1, allele2, score)
+                                success[0] = True
+                            else:
+                                print >> sys.stderr, "\t\t\tLiModel fails"
+                        if best_alleles:
+                            print >> sys.stdout, "LiModel %s-%s (score: %.2f)" % (allele1, allele2, score)
+                    li_hla_proc.communicate()
+                """
+
+            if simulation and not False in success:
+                aligner_type = "%s %s" % (aligner, index_type)
+                if not aligner_type in test_passed:
+                    test_passed[aligner_type] = 1
+                else:
+                    test_passed[aligner_type] += 1
+
+        if remove_alignment_file:
+            os.system("rm %s*" % (alignment_fname))
+
+    if simulation:
+        return test_passed
+
+    
+"""
+"""
+def read_HLA_alleles(fname, HLAs):
+    for line in open(fname):
+        if line.startswith(">"):
+            HLA_name = line.strip().split()[0][1:]
+            HLA_gene = HLA_name.split('*')[0]
+            if not HLA_gene in HLAs:
+                HLAs[HLA_gene] = {}
+            if not HLA_name in HLAs[HLA_gene]:
+                HLAs[HLA_gene][HLA_name] = ""
+        else:
+            HLAs[HLA_gene][HLA_name] += line.strip()
+    return HLAs
+
+
+"""
+"""
+def read_HLA_vars(fname, reference_type):
+    Vars, Var_list = {}, {}
+    for line in open(fname):
+        var_id, var_type, allele, pos, data = line.strip().split('\t')
+        pos = int(pos)
+        if reference_type != "gene":
+            allele, dist = None, 0
+            for tmp_gene, values in refHLA_loci.items():
+                allele_name, chr, left, right, exons = values
+                if allele == None:
+                    allele = allele_name
+                    dist = abs(pos - left)
+                else:
+                    if dist > abs(pos - left):
+                        allele = allele_name
+                        dist = abs(pos - left)
+            
+        gene = allele.split('*')[0]
+        if not gene in Vars:
+            Vars[gene] = {}
+            assert not gene in Var_list
+            Var_list[gene] = []
+            
+        assert not var_id in Vars[gene]
+        left = 0
+        if reference_type != "gene":
+            _, _, left, _, _ = refHLA_loci[gene]
+        Vars[gene][var_id] = [var_type, pos - left, data]
+        Var_list[gene].append([pos - left, var_id])
+        
+    for gene, in_var_list in Var_list.items():
+        Var_list[gene] = sorted(in_var_list)
+
+    return Vars, Var_list
+
+
+"""
+"""
+def read_HLA_links(fname):
+    Links = {}
+    for line in open(fname):
+        var_id, alleles = line.strip().split('\t')
+        alleles = alleles.split()
+        assert not var_id in Links
+        Links[var_id] = alleles
 
-#
-# Copyright 2015, Daehwan Kim <infphilo at gmail.com>
-#
-# This file is part of HISAT 2.
-#
-# HISAT 2 is free software: you can redistribute it and/or modify
-# it under the terms of the GNU General Public License as published by
-# the Free Software Foundation, either version 3 of the License, or
-# (at your option) any later version.
-#
-# HISAT 2 is distributed in the hope that it will be useful,
-# but WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-# GNU General Public License for more details.
-#
-# You should have received a copy of the GNU General Public License
-# along with HISAT 2.  If not, see <http://www.gnu.org/licenses/>.
-#
+    return Links
 
 
-import sys, os, subprocess, re
-import inspect, random
-import math
-from argparse import ArgumentParser, FileType
+"""
+"""
+def construct_allele_seq(backbone_seq, var_ids, Vars):
+    allele_seq = list(backbone_seq)
+    for id in var_ids:
+        assert id in Vars
+        type, pos, data = Vars[id]
+        assert pos < len(allele_seq)
+        if type == "single":
+            assert allele_seq[pos] != data
+            allele_seq[pos] = data
+        else:
+            assert type == "deletion"
+            del_len = int(data)
+            assert pos + del_len <= len(allele_seq)
+            for i in range(pos, pos + del_len):
+                allele_seq[i] = '.'
+
+    allele_seq = ''.join(allele_seq)
+    allele_seq = allele_seq.replace('.', '')
+    return allele_seq
 
 
 """
 """
-def test_HLA_genotyping(reference_type,
+def test_HLA_genotyping(base_fname,
+                        reference_type,
                         hla_list,
                         partial,
                         aligners,
@@ -39,7 +2027,12 @@ def test_HLA_genotyping(reference_type,
                         enable_coverage,
                         best_alleles,
                         exclude_allele_list,
+                        default_allele_list,
                         num_mismatch,
+                        perbase_errorrate,
+                        assembly,
+                        concordant_assembly,
+                        exonic_only,
                         verbose,
                         daehwan_debug):
     # Current script directory
@@ -93,74 +2086,120 @@ def test_HLA_genotyping(reference_type,
         if delete_hla_files:
             os.system("rm hla*")
     
-    # Extract HLA variants, backbone sequence, and other sequeces
-    HLA_fnames = ["hla_backbone.fa",
-                  "hla_sequences.fa",
-                  "hla.ref",
-                  "hla.snp",
-                  "hla.haplotype",
-                  "hla.link"]
-
-    if not check_files(HLA_fnames):
-        extract_hla_script = os.path.join(ex_path, "hisat2_extract_HLA_vars.py")
+    # Extract HLA variants, backbone sequence, and other sequences
+    if len(base_fname) > 0:
+        base_fname = "_" + base_fname
+    base_fname = "hla" + base_fname
+    
+    HLA_fnames = [base_fname + "_backbone.fa",
+                  base_fname + "_sequences.fa",
+                  base_fname + ".ref",
+                  base_fname + ".snp",
+                  base_fname + ".haplotype",
+                  base_fname + ".link",
+                  base_fname + "_alleles_excluded.txt"]
+    
+    # Check if excluded alleles in current files match
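+    # The exclusion file (HLA_fnames[6]) holds an "Alleles excluded:" header line
+    # followed by one excluded allele name per line, as written below.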
+    excluded_alleles_match = False
+    if os.path.exists(HLA_fnames[6]):
+        afile = open(HLA_fnames[6],'r')
+        afile.readline()
+        lines = afile.read().split()
+        excluded_alleles_match = (set(exclude_allele_list) == set(lines))
+        afile.close()
+    elif len(exclude_allele_list) == 0:
+        excluded_alleles_match = True
+        try:
+            temp_name = HLA_fnames[6]
+            HLA_fnames.remove(HLA_fnames[6])
+            os.remove(temp_name)
+        except OSError:
+            pass
+        
+    if not excluded_alleles_match:
+        print("Creating Allele Exclusion File.\n")
+        afile = open(HLA_fnames[6],'w')
+        afile.write("Alleles excluded:\n")
+        afile.write("\n".join(exclude_allele_list))
+        afile.close()
+
+    if verbose >= 1:
+        print >> sys.stderr, HLA_fnames
+    
+    if (not check_files(HLA_fnames)) or (not excluded_alleles_match):
+        extract_hla_script = os.path.join(ex_path, "hisatgenotype_extract_vars.py")
         extract_cmd = [extract_hla_script,
                        "--reference-type", reference_type,
                        "--hla-list", ','.join(hla_list)]
-        if partial:
-            extract_cmd += ["--partial"]
-        extract_cmd += ["--gap", "30",
-                        "--split", "50"]
-        if verbose:
+
+        if len(exclude_allele_list) > 0:
+            print exclude_allele_list
+            extract_cmd += ["--exclude-allele-list", ",".join(exclude_allele_list)]
+
+        if len(base_fname) > 3:
+            extract_cmd += ["--base", base_fname]
+
+        if not partial:
+            extract_cmd += ["--no-partial"]
+        extract_cmd += ["--inter-gap", "30",
+                        "--intra-gap", "50"]
+        if verbose >= 1:
             print >> sys.stderr, "\tRunning:", ' '.join(extract_cmd)
         proc = subprocess.Popen(extract_cmd, stdout=open("/dev/null", 'w'), stderr=open("/dev/null", 'w'))
         proc.communicate()
+        
         if not check_files(HLA_fnames):
             print >> sys.stderr, "Error: extract_HLA_vars failed!"
             sys.exit(1)
 
-    # Build HISAT2 graph indexes based on the above information
-    HLA_hisat2_graph_index_fnames = ["hla.graph.%d.ht2" % (i+1) for i in range(8)]
-    if not check_files(HLA_hisat2_graph_index_fnames):
-        hisat2_build = os.path.join(ex_path, "hisat2-build")
-        build_cmd = [hisat2_build,
-                     "-p", str(threads),
-                     "--snp", "hla.snp",
-                     "--haplotype", "hla.haplotype",
-                     "hla_backbone.fa",
-                     "hla.graph"]
-        if verbose:
-            print >> sys.stderr, "\tRunning:", ' '.join(build_cmd)
-        proc = subprocess.Popen(build_cmd, stdout=open("/dev/null", 'w'), stderr=open("/dev/null", 'w'))
-        proc.communicate()        
-        if not check_files(HLA_hisat2_graph_index_fnames):
-            print >> sys.stderr, "Error: indexing HLA failed!  Perhaps, you may have forgotten to build hisat2 executables?"
-            sys.exit(1)
-
-    # Build HISAT2 linear indexes based on the above information
-    HLA_hisat2_linear_index_fnames = ["hla.linear.%d.ht2" % (i+1) for i in range(8)]
-    if reference_type == "gene" and not check_files(HLA_hisat2_linear_index_fnames):
-        hisat2_build = os.path.join(ex_path, "hisat2-build")
-        build_cmd = [hisat2_build,
-                     "hla_backbone.fa,hla_sequences.fa",
-                     "hla.linear"]
-        proc = subprocess.Popen(build_cmd, stdout=open("/dev/null", 'w'), stderr=open("/dev/null", 'w'))
-        proc.communicate()        
-        if not check_files(HLA_hisat2_graph_index_fnames):
-            print >> sys.stderr, "Error: indexing HLA failed!"
-            sys.exit(1)
-
-    # Build Bowtie2 indexes based on the above information
-    HLA_bowtie2_index_fnames = ["hla.%d.bt2" % (i+1) for i in range(4)]
-    HLA_bowtie2_index_fnames += ["hla.rev.%d.bt2" % (i+1) for i in range(2)]
-    if reference_type == "gene" and not check_files(HLA_bowtie2_index_fnames):
-        build_cmd = ["bowtie2-build",
-                     "hla_backbone.fa,hla_sequences.fa",
-                     "hla"]
-        proc = subprocess.Popen(build_cmd, stdout=open("/dev/null", 'w'))
-        proc.communicate()        
-        if not check_files(HLA_bowtie2_index_fnames):
-            print >> sys.stderr, "Error: indexing HLA failed!"
-            sys.exit(1)
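+    # Build one set of indexes per requested (aligner, index_type) pair:
+    # ("hisat2", "graph"), ("hisat2", "linear"), or ("bowtie2", "linear").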
+    for aligner, index_type in aligners:
+        if aligner == "hisat2":
+            # Build HISAT2 graph indexes based on the above information
+            if index_type == "graph":
+                HLA_hisat2_graph_index_fnames = ["%s.graph.%d.ht2" % (base_fname, i+1) for i in range(8)]
+                if not check_files(HLA_hisat2_graph_index_fnames) or (not excluded_alleles_match):
+                    hisat2_build = os.path.join(ex_path, "hisat2-build")
+                    build_cmd = [hisat2_build,
+                                 "-p", str(threads),
+                                 "--snp", "%s.snp" % base_fname,
+                                 "--haplotype", "%s.haplotype" % base_fname,
+                                 "%s_backbone.fa" % base_fname,
+                                 "%s.graph" % base_fname]
+                    if verbose >= 1:
+                        print >> sys.stderr, "\tRunning:", ' '.join(build_cmd)
+                    proc = subprocess.Popen(build_cmd, stdout=open("/dev/null", 'w'), stderr=open("/dev/null", 'w'))
+                    proc.communicate()        
+                    if not check_files(HLA_hisat2_graph_index_fnames):
+                        print >> sys.stderr, "Error: indexing HLA failed!  Perhaps, you may have forgotten to build hisat2 executables?"
+                        sys.exit(1)
+            # Build HISAT2 linear indexes based on the above information
+            else:
+                assert index_type == "linear"
+                HLA_hisat2_linear_index_fnames = ["%s.linear.%d.ht2" % (base_fname, i+1) for i in range(8)]
+                if reference_type == "gene" and (not check_files(HLA_hisat2_linear_index_fnames) or (not excluded_alleles_match)):
+                    hisat2_build = os.path.join(ex_path, "hisat2-build")
+                    build_cmd = [hisat2_build,
+                                 "%s_backbone.fa,%s_sequences.fa" % (base_fname, base_fname),
+                                 "%s.linear" % base_fname]
+                    proc = subprocess.Popen(build_cmd, stdout=open("/dev/null", 'w'), stderr=open("/dev/null", 'w'))
+                    proc.communicate()        
+                    if not check_files(HLA_hisat2_linear_index_fnames):
+                        print >> sys.stderr, "Error: indexing HLA failed!"
+                        sys.exit(1)
+        else:
+            assert aligner == "bowtie2" and index_type == "linear"
+            # Build Bowtie2 indexes based on the above information
+            HLA_bowtie2_index_fnames = ["%s.%d.bt2" % (base_fname, i+1) for i in range(4)]
+            HLA_bowtie2_index_fnames += ["%s.rev.%d.bt2" % (base_fname, i+1) for i in range(2)]
+            if reference_type == "gene" and (not check_files(HLA_bowtie2_index_fnames) or (not excluded_alleles_match)):
+                build_cmd = ["bowtie2-build",
+                             "%s_backbone.fa,%s_sequences.fa" % (base_fname, base_fname),
+                             base_fname]
+                proc = subprocess.Popen(build_cmd, stdout=open("/dev/null", 'w'))
+                proc.communicate()        
+                if not check_files(HLA_bowtie2_index_fnames):
+                    print >> sys.stderr, "Error: indexing HLA failed!"
+                    sys.exit(1)
 
     # Read partial alleles from hla.data (temporary)
     partial_alleles = set()
@@ -172,6 +2211,33 @@ def test_HLA_genotyping(reference_type,
         if line.find("partial") != -1:
             partial_alleles.add(allele_name)
 
+    if len(default_allele_list) > 0:
+        if not os.path.exists("Default-HLA/hla_backbone.fa"):
+            try:
+                os.mkdir("Default-HLA")
+            except OSError:
+                pass
+            #os.chdir(current_path + "/Default-HLA")
+            
+            extract_hla_script = os.path.join(ex_path, "hisatgenotype_extract_vars.py")
+            extract_cmd = [extract_hla_script,
+                           "--reference-type", reference_type,
+                           "--hla-list", ','.join(hla_list),
+                           "--base", "Default-HLA/hla"]
+
+            if not partial:
+                extract_cmd += ["--no-partial"]
+            extract_cmd += ["--inter-gap", "30",
+                            "--intra-gap", "50"]
+            if verbose >= 1:
+                print >> sys.stderr, "\tRunning:", ' '.join(extract_cmd)
+            proc = subprocess.Popen(extract_cmd, stdout=open("/dev/null", 'w'), stderr=open("/dev/null", 'w'))
+            proc.communicate()
+            
+            if not os.path.exists("Default-HLA/hla_backbone.fa"):
+                print >> sys.stderr, "Error: extract_HLA_vars (Default) failed!"
+                sys.exit(1)
+    
     # Read HLA alleles (names and sequences)
     refHLAs, refHLA_loci = {}, {}
     for line in open("hla.ref"):
@@ -186,22 +2252,11 @@ def test_HLA_genotyping(reference_type,
             exons.append([int(exon_left), int(exon_right)])
         refHLA_loci[HLA_gene] = [HLA_name, chr, left, right, exons]
     HLAs = {}
-    def read_HLA_alleles(fname, HLAs):
-        for line in open(fname):
-            if line.startswith(">"):
-                HLA_name = line.strip().split()[0][1:]
-                HLA_gene = HLA_name.split('*')[0]
-                if not HLA_gene in HLAs:
-                    HLAs[HLA_gene] = {}
-                if not HLA_name in HLAs[HLA_gene]:
-                    HLAs[HLA_gene][HLA_name] = ""
-            else:
-                HLAs[HLA_gene][HLA_name] += line.strip()
-        return HLAs
-    if reference_type == "gene":
-        read_HLA_alleles("hla_backbone.fa", HLAs)
-    read_HLA_alleles("hla_sequences.fa", HLAs)
 
+    if reference_type == "gene":
+        read_HLA_alleles(base_fname + "_backbone.fa", HLAs)
+    read_HLA_alleles(base_fname + "_sequences.fa", HLAs)
+    
     # HLA gene alleles
     HLA_names = {}
     for HLA_gene, data in HLAs.items():
@@ -214,72 +2269,40 @@ def test_HLA_genotyping(reference_type,
         for allele_name, seq in HLA_alleles.items():
             HLA_lengths[HLA_gene][allele_name] = len(seq)
 
-    # Read HLA variants, and link information
-    Vars, Var_list = {}, {}
-    for line in open("hla.snp"):
-        var_id, var_type, allele, pos, data = line.strip().split('\t')
-        pos = int(pos)
-        if reference_type != "gene":
-            allele, dist = None, 0
-            for tmp_gene, values in refHLA_loci.items():
-                allele_name, chr, left, right, exons = values
-                if allele == None:
-                    allele = allele_name
-                    dist = abs(pos - left)
-                else:
-                    if dist > abs(pos - left):
-                        allele = allele_name
-                        dist = abs(pos - left)
-            
-        gene = allele.split('*')[0]
-        if not gene in Vars:
-            Vars[gene] = {}
-            assert not gene in Var_list
-            Var_list[gene] = []
-            
-        assert not var_id in Vars[gene]
-        left = 0
-        if reference_type != "gene":
-            _, _, left, _, _ = refHLA_loci[gene]
-        Vars[gene][var_id] = [var_type, pos - left, data]
-        Var_list[gene].append([pos - left, var_id])
+    # Construct excluded alleles (Via default backbone data)
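+    # When a default allele list is given, allele lengths for those alleles are
+    # taken from the full Default-HLA reference extracted above.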
+    custom_allele_check = False
+    if len(default_allele_list) > 0:
+        custom_allele_check = True
+        HLAs_default = {}
+        read_HLA_alleles("Default-HLA/hla_backbone.fa", HLAs_default)
+        read_HLA_alleles("Default-HLA/hla_sequences.fa", HLAs_default)
+        #HLA_lengths_default = {}
+
+        for HLA_gene, HLA_alleles in HLAs_default.items():
+            for allele_name, seq in HLA_alleles.items():
+                if allele_name in default_allele_list:
+                    HLA_lengths[HLA_gene][allele_name] = len(seq)
         
-    for gene, in_var_list in Var_list.items():
-        Var_list[gene] = sorted(in_var_list)
-    def lower_bound(Var_list, pos):
-        low, high = 0, len(Var_list)
-        while low < high:
-            m = (low + high) / 2
-            m_pos = Var_list[m][0]
-            if m_pos < pos:
-                low = m + 1
-            elif m_pos > pos:
-                high = m
-            else:
-                assert m_pos == pos
-                while m > 0:
-                    if Var_list[m-1][0] < pos:
-                        break
-                    m -= 1
-                return m
-        return low        
-            
-    Links = {}
-    for line in open("hla.link"):
-        var_id, alleles = line.strip().split('\t')
-        alleles = alleles.split()
-        assert not var_id in Links
-        Links[var_id] = alleles
+        #for allele_name, seq in HLAs_default.items():
+         #   if allele_name in default_allele_list:
+          #      HLA_lengths[allele_name] = len(seq)
+            #if (allele_name in default_allele_list):
+            #    HLA_lengths_default[allele_name] = len(seq)
+    else:
+        HLAs_default = HLAs
 
-    # Scoring schemes from Sangtae Kim (Illumina)'s implementation
-    max_qual_value = 100
-    match_score, mismatch_score = [0] * max_qual_value, [0] * max_qual_value
-    for qual in range(max_qual_value):
-        error_rate = 0.1 ** (qual / 10.0)
-        match_score[qual] = math.log(1.000000000001 - error_rate);
-        mismatch_score[qual] = math.log(error_rate / 3.0);
-        
-    # Test HLA genotyping
+    # Read HLA variants, and link information
+    Vars, Var_list = read_HLA_vars("%s.snp" % base_fname, reference_type)
+    Links = read_HLA_links("%s.link" % base_fname)
+    Vars_default, Var_list_default, Links_default = {}, {}, {}
+    if len(default_allele_list) > 0:
+        Vars_default, Var_list_default = read_HLA_vars("Default-HLA/hla.snp", reference_type)
+        Links_default = read_HLA_links("Default-HLA/hla.link")
+    else:
+        Vars_default, Var_list_default = Vars, Var_list
+        Links_default = Links
+
+    # Test HLA typing
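+    # test_list: each entry is one test, given as a list of per-gene allele groups
+    # (a single allele per gene for basic_test, allele pairs for pair_test).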
     test_list = []
     if simulation:
         basic_test, pair_test = True, False
@@ -293,954 +2316,183 @@ def test_HLA_genotyping(reference_type,
         test_list = []
         genes = list(set(hla_list) & set(HLA_names.keys()))
         if basic_test:
-            for gene in genes:
-                HLA_gene_alleles = HLA_names[gene]
-                for HLA_name in HLA_gene_alleles:
-                    if HLA_name.find("BACKBONE") != -1:
-                        continue
-                    test_list.append([[HLA_name]])
+            if custom_allele_check:
+                for allele in default_allele_list:
+                    test_list.append([[allele]])
+            else:
+                for gene in genes:
+                    HLA_gene_alleles = HLA_names[gene]
+                    for HLA_name in HLA_gene_alleles:
+                        if HLA_name.find("BACKBONE") != -1:
+                            continue
+                        test_list.append([[HLA_name]])
         if pair_test:
             test_size = 500
             allele_count = 2
-            for test_i in range(test_size):
-                test_pairs = []
-                for gene in genes:
-                    HLA_gene_alleles = []
-                    for allele in HLA_names[gene]:
-                        if allele.find("BACKBONE") != -1:
-                            continue
-                        HLA_gene_alleles.append(allele)
-                    nums = [i for i in range(len(HLA_gene_alleles))]
-                    random.shuffle(nums)
-                    test_pairs.append(sorted([HLA_gene_alleles[nums[i]] for i in range(allele_count)]))
-                test_list.append(test_pairs)
-    else:
-        test_list = [hla_list]
-
-    for test_i in range(len(test_list)):
-        if "test_id" in daehwan_debug:
-            daehwan_test_ids = daehwan_debug["test_id"].split('-')
-            if str(test_i + 1) not in daehwan_test_ids:
-                continue
-
-        print >> sys.stderr, "Test %d" % (test_i + 1)
-        test_HLA_list = test_list[test_i]
-
-        # daehwan - for debugging purposes
-        # test_HLA_list = [["A*11:50Q", "A*11:01:01:01", "A*01:01:01:01"]]
-        for test_HLA_names in test_HLA_list:
-            if simulation:
-                for test_HLA_name in test_HLA_names:
-                    gene = test_HLA_name.split('*')[0]
-                    test_HLA_seq = HLAs[gene][test_HLA_name]
-                    seq_type = "partial" if test_HLA_name in partial_alleles else "full"
-                    print >> sys.stderr, "\t%s - %d bp (%s sequence)" % (test_HLA_name, len(test_HLA_seq), seq_type)
-            else:
-                print >> sys.stderr, "\t%s" % (test_HLA_names)
-                
-            
-        if simulation:
-            HLA_reads_1, HLA_reads_2 = [], []
-            for test_HLA_names in test_HLA_list:
-                gene = test_HLA_names[0].split('*')[0]
-                ref_allele = refHLAs[gene]
-                ref_seq = HLAs[gene][ref_allele]
-
-                # Simulate reads from two HLA alleles
-                def simulate_reads(seq, simulate_interval = 1, frag_len = 250, read_len = 100):
-                    comp_table = {'A':'T', 'C':'G', 'G':'C', 'T':'A'}
-                    reads_1, reads_2 = [], []
-                    for i in range(0, len(seq) - frag_len + 1, simulate_interval):
-                        reads_1.append(seq[i:i+read_len])
-                        tmp_read_2 = reversed(seq[i+frag_len-read_len:i+frag_len])
-                        read_2 = ""
-                        for s in tmp_read_2:
-                            if s in comp_table:
-                                read_2 += comp_table[s]
-                            else:
-                                read_2 += s
-                        reads_2.append(read_2)
-                    return reads_1, reads_2
-
-                for test_HLA_name in test_HLA_names:
-                    HLA_seq = HLAs[gene][test_HLA_name]
-                    tmp_reads_1, tmp_reads_2 = simulate_reads(HLA_seq, simulate_interval)
-                    HLA_reads_1 += tmp_reads_1
-                    HLA_reads_2 += tmp_reads_2
-
-            # Write reads into a fasta read file
-            def write_reads(reads, idx):
-                read_file = open('hla_input_%d.fa' % idx, 'w')
-                for read_i in range(len(reads)):
-                    print >> read_file, ">%d" % (read_i + 1)
-                    print >> read_file, reads[read_i]
-                read_file.close()
-            write_reads(HLA_reads_1, 1)
-            write_reads(HLA_reads_2, 2)
-
-        for aligner, index_type in aligners:
-            if index_type == "graph":
-                print >> sys.stderr, "\n\t\t%s %s on %s" % (aligner, index_type, reference_type)
+            if custom_allele_check:
+                if len(default_allele_list) < allele_count:
+                    print >> sys.stderr, "Error: # of default alleles (%d) must be at least %d" % (len(default_allele_list), allele_count)
+                    sys.exit(1)
+                    
+                for test_i in range(1):
+                    random.shuffle(default_allele_list)
+                    test_pair = [default_allele_list[:allele_count]]
+                    test_list.append(test_pair)
             else:
-                print >> sys.stderr, "\n\t\t%s %s" % (aligner, index_type)
-
-            if alignment_fname == "":
-                # Align reads, and sort the alignments into a BAM file
-                if aligner == "hisat2":
-                    hisat2 = os.path.join(ex_path, "hisat2")
-                    aligner_cmd = [hisat2,
-                                   "--no-unal",
-                                   "--mm"]
-                    if index_type == "linear":
-                        aligner_cmd += ["-k", "10"]
-                    aligner_cmd += ["-x", "hla.%s" % index_type]
-                elif aligner == "bowtie2":
-                    aligner_cmd = [aligner,
-                                   "--no-unal",
-                                   "-k", "10",
-                                   "-x", "hla"]
-                else:
-                    assert False
-                if simulation:
-                    if "test_id" in daehwan_debug:
-                        aligner_cmd += ["-f", "hla_input_1.fa"]
-                    else:
-                        aligner_cmd += ["-f",
-                                        "-1", "hla_input_1.fa",
-                                        "-2", "hla_input_2.fa"]
-                else:
-                    assert len(read_fname) in [1,2]
-                    aligner_cmd += ["-p", str(threads)]
-                    if len(read_fname) == 1:
-                        aligner_cmd += [read_fname[0]]
-                    else:
-                        aligner_cmd += ["-1", "%s" % read_fname[0],
-                                        "-2", "%s" % read_fname[1]]
-
-                align_proc = subprocess.Popen(aligner_cmd,
-                                              stdout=subprocess.PIPE,
-                                              stderr=open("/dev/null", 'w'))
-
-                sambam_cmd = ["samtools",
-                              "view",
-                              "-bS",
-                              "-"]
-                sambam_proc = subprocess.Popen(sambam_cmd,
-                                               stdin=align_proc.stdout,
-                                               stdout=open("hla_input_unsorted.bam", 'w'),
-                                               stderr=open("/dev/null", 'w'))
-                sambam_proc.communicate()
-                if index_type == "graph":
-                    bamsort_cmd = ["samtools",
-                                   "sort",
-                                   "hla_input_unsorted.bam",
-                                   "hla_input"]
-                    bamsort_proc = subprocess.Popen(bamsort_cmd,
-                                                    stderr=open("/dev/null", 'w'))
-                    bamsort_proc.communicate()
-
-                    bamindex_cmd = ["samtools",
-                                    "index",
-                                    "hla_input.bam"]
-                    bamindex_proc = subprocess.Popen(bamindex_cmd,
-                                                     stderr=open("/dev/null", 'w'))
-                    bamindex_proc.communicate()
-
-                    os.system("rm hla_input_unsorted.bam")            
-                else:
-                    os.system("mv hla_input_unsorted.bam hla_input.bam")
-
-            for test_HLA_names in test_HLA_list:
-                if simulation:
-                    gene = test_HLA_names[0].split('*')[0]
-                else:
-                    gene = test_HLA_names
-                ref_allele = refHLAs[gene]
-                ref_seq = HLAs[gene][ref_allele]
-                ref_exons = refHLA_loci[gene][-1]
-
-                # Read alignments
-                alignview_cmd = ["samtools",
-                                 "view"]
-                if alignment_fname == "":
-                    alignview_cmd += ["hla_input.bam"]
-                else:
-                    if not os.path.exists(alignment_fname + ".bai"):
-                        os.system("samtools index %s" % alignment_fname)
-                    alignview_cmd += [alignment_fname]
-                base_locus = 0
-                if index_type == "graph":
-                    if reference_type == "gene":
-                        alignview_cmd += ["%s" % ref_allele]
-                    else:
-                        assert reference_type in ["chromosome", "genome"]
-                        _, chr, left, right, _ = refHLA_loci[gene]
-                        base_locus = left
-                        alignview_cmd += ["%s:%d-%d" % (chr, left + 1, right + 1)]
-
-                    bamview_proc = subprocess.Popen(alignview_cmd,
-                                                    stdout=subprocess.PIPE,
-                                                    stderr=open("/dev/null", 'w'))
-
-                    sort_read_cmd = ["sort", "-k", "1", "-n"]
-                    alignview_proc = subprocess.Popen(sort_read_cmd,
-                                                      stdin=bamview_proc.stdout,
-                                                      stdout=subprocess.PIPE,
-                                                      stderr=open("/dev/null", 'w'))
-                else:
-                    alignview_proc = subprocess.Popen(alignview_cmd,
-                                                 stdout=subprocess.PIPE,
-                                                 stderr=open("/dev/null", 'w'))
-
-                # Count alleles
-                HLA_counts, HLA_cmpt = {}, {}
-                coverage = [0 for i in range(len(ref_seq) + 1)]
-                num_reads, total_read_len = 0, 0
-                prev_read_id = None
-                prev_exon = False
-                if index_type == "graph":
-                    # Cigar regular expression
-                    cigar_re = re.compile('\d+\w')
-                    for line in alignview_proc.stdout:
-                        cols = line.strip().split()
-                        read_id, flag, chr, pos, mapQ, cigar_str = cols[:6]
-                        read_seq, qual = cols[9], cols[10]
-                        num_reads += 1
-                        total_read_len += len(read_seq)
-                        flag, pos = int(flag), int(pos)
-                        pos -= (base_locus + 1)
-                        if pos < 0:
-                            continue
+                for test_i in range(test_size):
+                    test_pairs = []
+                    for gene in genes:
+                        HLA_gene_alleles = []
 
-                        if flag & 0x4 != 0:
-                            continue
-
-                        NM, Zs, MD = "", "", ""
-                        for i in range(11, len(cols)):
-                            col = cols[i]
-                            if col.startswith("Zs"):
-                                Zs = col[5:]
-                            elif col.startswith("MD"):
-                                MD = col[5:]
-                            elif col.startswith("NM"):
-                                NM = int(col[5:])
-
-                        if NM > num_mismatch:
-                            continue
-
-                        # daehwan - for debugging purposes
-                        debug = False
-                        if read_id in ["2339"] and False:
-                            debug = True
-                            print "read_id: %s)" % read_id, pos, cigar_str, "NM:", NM, MD, Zs
-                            print "            ", read_seq
-
-                        vars = []
-                        if Zs:
-                            vars = Zs.split(',')
-
-                        assert MD != ""
-                        MD_str_pos, MD_len = 0, 0
-                        read_pos, left_pos = 0, pos
-                        right_pos = left_pos
-                        cigars = cigar_re.findall(cigar_str)
-                        cigars = [[cigar[-1], int(cigar[:-1])] for cigar in cigars]
-                        cmp_list = []
-                        for i in range(len(cigars)):
-                            cigar_op, length = cigars[i]
-                            if cigar_op == 'M':
-                                # Update coverage
-                                if enable_coverage:
-                                    if right_pos + length < len(coverage):
-                                        coverage[right_pos] += 1
-                                        coverage[right_pos + length] -= 1
-                                    elif right_pos < len(coverage):
-                                        coverage[right_pos] += 1
-                                        coverage[-1] -= 1
-
-                                first = True
-                                MD_len_used = 0
-                                while True:
-                                    if not first or MD_len == 0:
-                                        if MD[MD_str_pos].isdigit():
-                                            num = int(MD[MD_str_pos])
-                                            MD_str_pos += 1
-                                            while MD_str_pos < len(MD):
-                                                if MD[MD_str_pos].isdigit():
-                                                    num = num * 10 + int(MD[MD_str_pos])
-                                                    MD_str_pos += 1
-                                                else:
-                                                    break
-                                            MD_len += num
-                                    # Insertion or full match followed
-                                    if MD_len >= length:
-                                        MD_len -= length
-                                        cmp_list.append(["match", right_pos + MD_len_used, length - MD_len_used])
-                                        break
-                                    first = False
-                                    read_base = read_seq[read_pos + MD_len]
-                                    MD_ref_base = MD[MD_str_pos]
-                                    MD_str_pos += 1
-                                    assert MD_ref_base in "ACGT"
-                                    cmp_list.append(["match", right_pos + MD_len_used, MD_len - MD_len_used])
-                                    cmp_list.append(["mismatch", right_pos + MD_len, 1])
-                                    MD_len_used = MD_len + 1
-                                    MD_len += 1
-                                    # Full match
-                                    if MD_len == length:
-                                        MD_len = 0
-                                        break
-                            elif cigar_op == 'I':
-                                cmp_list.append(["insertion", right_pos, length])
-                            elif cigar_op == 'D':
-                                if MD[MD_str_pos] == '0':
-                                    MD_str_pos += 1
-                                assert MD[MD_str_pos] == '^'
-                                MD_str_pos += 1
-                                while MD_str_pos < len(MD):
-                                    if not MD[MD_str_pos] in "ACGT":
-                                        break
-                                    MD_str_pos += 1
-                                cmp_list.append(["deletion", right_pos, length])
-                            elif cigar_op == 'S':
-                                cmp_list.append(["soft", right_pos, length])
-                            else:                    
-                                assert cigar_op == 'N'
-                                cmp_list.append(["intron", right_pos, length])
-
-                            if cigar_op in "MND":
-                                right_pos += length
-
-                            if cigar_op in "MIS":
-                                read_pos += length
-
-                        exon = False
-                        for exon in ref_exons:
-                            exon_left, exon_right = exon
-                            if right_pos <= exon_left or pos > exon_right:
-                                continue
-                            else:
-                                exon = True
-                                break
-
-                        if right_pos > len(ref_seq):
-                            continue
-
-                        def add_stat(HLA_cmpt, HLA_counts, HLA_count_per_read, exon = True):
-                            max_count = max(HLA_count_per_read.values())
-                            cur_cmpt = set()
-                            for allele, count in HLA_count_per_read.items():
-                                if count < max_count:
-                                    continue
-                                if allele in exclude_allele_list:
-                                    continue                                
-                                cur_cmpt.add(allele)                    
-                                if not allele in HLA_counts:
-                                    HLA_counts[allele] = 1
-                                else:
-                                    HLA_counts[allele] += 1
-
-                            if len(cur_cmpt) == 0:
-                                return
-                            
-                            # daehwan - for debugging purposes                            
-                            alleles = ["", ""]
-                            # alleles = ["B*40:304", "B*40:02:01"]
-                            allele1_found, allele2_found = False, False
-                            for allele, count in HLA_count_per_read.items():
-                                if count < max_count:
-                                    continue
-                                if allele == alleles[0]:
-                                    allele1_found = True
-                                elif allele == alleles[1]:
-                                    allele2_found = True
-                            if allele1_found != allele2_found:
-                                print alleles[0], HLA_count_per_read[alleles[0]]
-                                print alleles[1], HLA_count_per_read[alleles[1]]
-                                if allele1_found:
-                                    print ("%s\tread_id %s - %d vs. %d]" % (alleles[0], prev_read_id, max_count, HLA_count_per_read[alleles[1]]))
-                                else:
-                                    print ("%s\tread_id %s - %d vs. %d]" % (alleles[1], prev_read_id, max_count, HLA_count_per_read[alleles[0]]))
-                                print read_seq
-
-                            cur_cmpt = sorted(list(cur_cmpt))
-                            cur_cmpt = '-'.join(cur_cmpt)
-                            add = 1
-                            if partial and not exon:
-                                add *= 0.2
-                            if not cur_cmpt in HLA_cmpt:
-                                HLA_cmpt[cur_cmpt] = add
-                            else:
-                                HLA_cmpt[cur_cmpt] += add
-
-                        if read_id != prev_read_id:
-                            if prev_read_id != None:
-                                add_stat(HLA_cmpt, HLA_counts, HLA_count_per_read, prev_exon)
-                                
-                            HLA_count_per_read = {}
-                            for HLA_name in HLA_names[gene]:
-                                if HLA_name.find("BACKBONE") != -1:
-                                    continue
-                                HLA_count_per_read[HLA_name] = 0
-
-                        def add_count(var_id, add):
-                            assert var_id in Links
-                            alleles = Links[var_id]
-                            for allele in alleles:
-                                if allele.find("BACKBONE") != -1:
-                                    continue
-                                HLA_count_per_read[allele] += add
-                                # daehwan - for debugging purposes
-                                if debug:
-                                    if allele in ["DQA1*05:05:01:01", "DQA1*05:05:01:02"]:
-                                        print allele, add, var_id
-
-                        # Decide which allele(s) a read most likely came from
-                        # also sanity check - read length, cigar string, and MD string
-                        for var_id, data in Vars[gene].items():
-                            var_type, var_pos, var_data = data
-                            if var_type != "deletion":
+                        for allele in HLA_names[gene]:
+                            if allele.find("BACKBONE") != -1:
                                 continue
-                            if left_pos >= var_pos and right_pos <= var_pos + int(var_data):
-                                add_count(var_id, -1)                            
-                        ref_pos, read_pos, cmp_cigar_str, cmp_MD = left_pos, 0, "", ""
-                        cigar_match_len, MD_match_len = 0, 0            
-                        for cmp in cmp_list:
-                            type = cmp[0]
-                            length = cmp[2]
-                            if type == "match":
-                                var_idx = lower_bound(Var_list[gene], ref_pos)
-                                while var_idx < len(Var_list[gene]):
-                                    var_pos, var_id = Var_list[gene][var_idx]
-                                    if ref_pos + length <= var_pos:
-                                        break
-                                    if ref_pos <= var_pos:
-                                        var_type, _, var_data = Vars[gene][var_id]
-                                        if var_type == "insertion":
-                                            if ref_pos < var_pos and ref_pos + length > var_pos + len(var_data):
-                                                add_count(var_id, -1)
-                                                # daehwan - for debugging purposes
-                                                if debug:
-                                                    print cmp, var_id, Links[var_id]
-                                        elif var_type == "deletion":
-                                            del_len = int(var_data)
-                                            if ref_pos < var_pos and ref_pos + length > var_pos + del_len:
-                                                # daehwan - for debugging purposes
-                                                if debug:
-                                                    print cmp, var_id, Links[var_id], -1, Vars[gene][var_id]
-                                                # Check if this might be one of the two tandem repeats (the same left coordinate)
-                                                cmp_left, cmp_right = cmp[1], cmp[1] + cmp[2]
-                                                test1_seq1 = ref_seq[cmp_left:cmp_right]
-                                                test1_seq2 = ref_seq[cmp_left:var_pos] + ref_seq[var_pos + del_len:cmp_right + del_len]
-                                                # Check if this happens due to small repeats (the same right coordinate - e.g. 19 times of TTTC in DQA1*05:05:01:02)
-                                                cmp_left -= read_pos
-                                                cmp_right += (len(read_seq) - read_pos - cmp[2])
-                                                test2_seq1 = ref_seq[cmp_left+int(var_data):cmp_right]
-                                                test2_seq2 = ref_seq[cmp_left:var_pos] + ref_seq[var_pos+int(var_data):cmp_right]
-                                                if test1_seq1 != test1_seq2 and test2_seq1 != test2_seq2:
-                                                    add_count(var_id, -1)
-                                        else:
-                                            if debug:
-                                                print cmp, var_id, Links[var_id], -1
-                                            add_count(var_id, -1)
-                                    var_idx += 1
-
-                                read_pos += length
-                                ref_pos += length
-                                cigar_match_len += length
-                                MD_match_len += length
-                            elif type == "mismatch":
-                                read_base = read_seq[read_pos]
-                                var_idx = lower_bound(Var_list[gene], ref_pos)
-                                while var_idx < len(Var_list[gene]):
-                                    var_pos, var_id = Var_list[gene][var_idx]
-                                    if ref_pos < var_pos:
-                                        break
-                                    if ref_pos == var_pos:
-                                        var_type, _, var_data = Vars[gene][var_id]
-                                        if var_type == "single":
-                                            if var_data == read_base:
-                                                # daehwan - for debugging purposes
-                                                if debug:
-                                                    print cmp, var_id, 1, var_data, read_base, Links[var_id]
-
-                                                # daehwan - for debugging purposes
-                                                if False:
-                                                    read_qual = ord(qual[read_pos])
-                                                    add_count(var_id, (read_qual - 60) / 60.0)
-                                                else:
-                                                    add_count(var_id, 1)
-                                            # daehwan - check out if this routine is appropriate
-                                            # else:
-                                            #    add_count(var_id, -1)
-                                    var_idx += 1
-
-                                cmp_MD += ("%d%s" % (MD_match_len, ref_seq[ref_pos]))
-                                MD_match_len = 0
-                                cigar_match_len += 1
-                                read_pos += 1
-                                ref_pos += 1
-                            elif type == "insertion":
-                                ins_seq = read_seq[read_pos:read_pos+length]
-                                var_idx = lower_bound(Var_list[gene], ref_pos)
-                                # daehwan - for debugging purposes
-                                if debug:
-                                    print left_pos, cigar_str, MD, vars
-                                    print ref_pos, ins_seq, Var_list[gene][var_idx], Vars[gene][Var_list[gene][var_idx][1]]
-                                    # sys.exit(1)
-                                while var_idx < len(Var_list[gene]):
-                                    var_pos, var_id = Var_list[gene][var_idx]
-                                    if ref_pos < var_pos:
-                                        break
-                                    if ref_pos == var_pos:
-                                        var_type, _, var_data = Vars[gene][var_id]
-                                        if var_type == "insertion":                                
-                                            if var_data == ins_seq:
-                                                # daehwan - for debugging purposes
-                                                if debug:
-                                                    print cmp, var_id, 1, Links[var_id]
-                                                add_count(var_id, 1)
-                                    var_idx += 1
-
-                                if cigar_match_len > 0:
-                                    cmp_cigar_str += ("%dM" % cigar_match_len)
-                                    cigar_match_len = 0
-                                read_pos += length
-                                cmp_cigar_str += ("%dI" % length)
-                            elif type == "deletion":
-                                del_len = length
-                                # Deletions can be shifted bidirectionally
-                                temp_ref_pos = ref_pos
-                                while temp_ref_pos > 0:
-                                    last_bp = ref_seq[temp_ref_pos + del_len - 1]
-                                    prev_bp = ref_seq[temp_ref_pos - 1]
-                                    if last_bp != prev_bp:
-                                        break
-                                    temp_ref_pos -= 1
-                                var_idx = lower_bound(Var_list[gene], temp_ref_pos)
-                                while var_idx < len(Var_list[gene]):
-                                    var_pos, var_id = Var_list[gene][var_idx]
-                                    if temp_ref_pos < var_pos:
-                                        first_bp = ref_seq[temp_ref_pos]
-                                        next_bp = ref_seq[temp_ref_pos + del_len]
-                                        if first_bp == next_bp:
-                                            temp_ref_pos += 1
-                                            continue
-                                        else:
-                                            break
-                                    if temp_ref_pos == var_pos:
-                                        var_type, _, var_data = Vars[gene][var_id]
-                                        if var_type == "deletion":
-                                            var_len = int(var_data)
-                                            if var_len == length:
-                                                if debug:
-                                                    print cmp, var_id, 1, Links[var_id]
-                                                    print ref_seq[var_pos - 10:var_pos], ref_seq[var_pos:var_pos+int(var_data)], ref_seq[var_pos+int(var_data):var_pos+int(var_data)+10]
-                                                add_count(var_id, 1)
-                                    var_idx += 1
-
-                                if cigar_match_len > 0:
-                                    cmp_cigar_str += ("%dM" % cigar_match_len)
-                                    cigar_match_len = 0
-                                cmp_MD += ("%d" % MD_match_len)
-                                MD_match_len = 0
-                                cmp_cigar_str += ("%dD" % length)
-                                cmp_MD += ("^%s" % ref_seq[ref_pos:ref_pos+length])
-                                ref_pos += length
-                            elif type == "soft":
-                                if cigar_match_len > 0:
-                                    cmp_cigar_str += ("%dM" % cigar_match_len)
-                                    cigar_match_len = 0
-                                read_pos += length
-                                cmp_cigar_str += ("%dS" % length)
-                            else:
-                                assert type == "intron"
-                                if cigar_match_len > 0:
-                                    cmp_cigar_str += ("%dM" % cigar_match_len)
-                                    cigar_match_len = 0
-                                cmp_cigar_str += ("%dN" % length)
-                                ref_pos += length                    
-                        if cigar_match_len > 0:
-                            cmp_cigar_str += ("%dM" % cigar_match_len)
-                        cmp_MD += ("%d" % MD_match_len)
-                        if read_pos != len(read_seq) or \
-                                cmp_cigar_str != cigar_str or \
-                                cmp_MD != MD:
-                            print >> sys.stderr, "Error:", cigar_str, MD
-                            print >> sys.stderr, "\tcomputed:", cmp_cigar_str, cmp_MD
-                            print >> sys.stderr, "\tcmp list:", cmp_list
-                            assert False            
-
-                        prev_read_id = read_id
-                        prev_exon = exon
-
-                    if num_reads <= 0:
+                            HLA_gene_alleles.append(allele)
+                        nums = [i for i in range(len(HLA_gene_alleles))]
+                        random.shuffle(nums)
+                        test_pairs.append(sorted([HLA_gene_alleles[nums[i]] for i in range(allele_count)]))
+                    test_list.append(test_pairs)
+
+        # DK - for debugging purposes
+        # test_list = [[["A*01:01:01:01"]], [["A*32:29"]]]
+        # test_list = [[["A*01:01:01:01", "A*03:01:01:01"]]]
+        # test_list = [[["A*02:01:21"]], [["A*03:01:01:01"]], [["A*03:01:01:04"]], [["A*02:521"]]]
+        for test_i in range(len(test_list)):
+            if "test_id" in daehwan_debug:
+                daehwan_test_ids = daehwan_debug["test_id"].split('-')
+                if str(test_i + 1) not in daehwan_test_ids:
+                    continue
+
+            print >> sys.stderr, "Test %d" % (test_i + 1), str(datetime.now())
+            test_HLA_list = test_list[test_i]
+            num_frag_list = simulate_reads(HLAs_default if custom_allele_check else HLAs,
+                                           test_HLA_list,
+                                           Vars,
+                                           Links,
+                                           simulate_interval,
+                                           perbase_errorrate)
+
+            assert len(num_frag_list) == len(test_HLA_list)
+            for i_ in range(len(test_HLA_list)):
+                test_HLA_names = test_HLA_list[i_]
+                num_frag_list_i = num_frag_list[i_]
+                assert len(num_frag_list_i) == len(test_HLA_names)
+                for j_ in range(len(test_HLA_names)):
+                    test_HLA_name = test_HLA_names[j_]
+                    if custom_allele_check:
+                        gene = test_HLA_name.split('*')[0]
+                        test_HLA_seq = HLAs_default[gene][test_HLA_name]
+                        seq_type = "partial" if test_HLA_name in partial_alleles else "full"
+                        print >> sys.stderr, "\t%s - %d bp (%s sequence, %d pairs)" % (test_HLA_name, len(test_HLA_seq), seq_type, num_frag_list_i[j_])
                         continue
+                    gene = test_HLA_name.split('*')[0]
+                    test_HLA_seq = HLAs[gene][test_HLA_name]
+                    seq_type = "partial" if test_HLA_name in partial_alleles else "full"
+                    print >> sys.stderr, "\t%s - %d bp (%s sequence, %d pairs)" % (test_HLA_name, len(test_HLA_seq), seq_type, num_frag_list_i[j_])
 
-                    if prev_read_id != None:
-                        add_stat(HLA_cmpt, HLA_counts, HLA_count_per_read)
-
-                    # Coverage
-                    # it is not used by the default
-                    if enable_coverage:
-                        assert num_reads > 0
-                        read_len = int(total_read_len / float(num_reads))
-                        coverage_sum = 0
-                        for i in range(len(coverage)):
-                            if i > 0:
-                                coverage[i] += coverage[i-1]
-                            coverage_sum += coverage[i]
-                        coverage_avg = coverage_sum / float(len(coverage))
-                        assert len(ref_seq) < len(coverage)
-                        for i in range(len(ref_seq)):
-                            coverage_threshold = 1.0 * coverage_avg
-                            if i < read_len:
-                                coverage_threshold *= ((i+1) / float(read_len))
-                            elif i + read_len > len(ref_seq):
-                                coverage_threshold *= ((len(ref_seq) - i) / float(read_len))
-                            if coverage[i] >= coverage_threshold:
-                                continue
-                            pseudo_num_reads = (coverage_threshold - coverage[i]) / read_len
-                            var_idx = lower_bound(Var_list[gene], i + 1)
-                            if var_idx >= len(Var_list[gene]):
-                                var_idx = len(Var_list[gene]) - 1
-                            cur_cmpt = set()
-                            while var_idx >= 0:
-                                var_pos, var_id = Var_list[gene][var_idx]
-                                var_type, _, var_data = Vars[gene][var_id]
-                                if var_type == "deletion":
-                                    del_len = int(var_data)
-                                    if i < var_pos:
-                                        break
-                                    if i + read_len < var_pos + int(var_data):
-                                        assert var_id in Links
-                                        cur_cmpt = cur_cmpt.union(set(Links[var_id]))
-                                var_idx -= 1
-                            if cur_cmpt:
-                                cur_cmpt = '-'.join(list(cur_cmpt))
-                                if not cur_cmpt in HLA_cmpt:
-                                    HLA_cmpt[cur_cmpt] = 0
-                                HLA_cmpt[cur_cmpt] += pseudo_num_reads
+            if "single-end" in daehwan_debug:
+                read_fname = ["hla_input_1.fa"]
+            else:
+                read_fname = ["hla_input_1.fa", "hla_input_2.fa"]
+
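+            # Simulated reads are written to hla_input_*.fa in FASTA format, so fastq is set to False below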
+            fastq = False
+            tmp_test_passed = HLA_typing(ex_path,
+                                         simulation,
+                                         reference_type,
+                                         test_HLA_list,
+                                         partial,
+                                         partial_alleles,
+                                         refHLAs,
+                                         HLAs,                       
+                                         HLA_names,
+                                         HLA_lengths,
+                                         refHLA_loci,
+                                         Vars,
+                                         Var_list,
+                                         Links,
+                                         HLAs_default,
+                                         Vars_default,
+                                         Var_list_default,
+                                         Links_default,
+                                         exclude_allele_list,
+                                         aligners,
+                                         num_mismatch,
+                                         assembly,
+                                         concordant_assembly,
+                                         exonic_only,
+                                         fastq,
+                                         read_fname,
+                                         alignment_fname,
+                                         num_frag_list,
+                                         threads,
+                                         enable_coverage,
+                                         best_alleles,
+                                         verbose)
+
+            for aligner_type, passed in tmp_test_passed.items():
+                if aligner_type in test_passed:
+                    test_passed[aligner_type] += passed
                 else:
-                    assert index_type == "linear"
-                    def add_alleles(alleles):
-                        if not allele in HLA_counts:
-                            HLA_counts[allele] = 1
-                        else:
-                            HLA_counts[allele] += 1
-
-                        cur_cmpt = sorted(list(alleles))
-                        cur_cmpt = '-'.join(cur_cmpt)
-                        if not cur_cmpt in HLA_cmpt:
-                            HLA_cmpt[cur_cmpt] = 1
-                        else:
-                            HLA_cmpt[cur_cmpt] += 1
-
-                    prev_read_id, prev_AS = None, None
-                    alleles = set()
-                    for line in alignview_proc.stdout:
-                        cols = line[:-1].split()
-                        read_id, flag, allele = cols[:3]
-                        flag = int(flag)
-                        if flag & 0x4 != 0:
-                            continue
-                        if not allele.startswith(gene):
-                            continue
-                        if allele.find("BACKBONE") != -1:
-                            continue
-
-                        AS = None
-                        for i in range(11, len(cols)):
-                            col = cols[i]
-                            if col.startswith("AS"):
-                                AS = int(col[5:])
-                        assert AS != None
-                        if read_id != prev_read_id:
-                            if alleles:
-                                if aligner == "hisat2" or \
-                                        (aligner == "bowtie2" and len(alleles) < 10):
-                                    add_alleles(alleles)
-                                alleles = set()
-                            prev_AS = None
-                        if prev_AS != None and AS < prev_AS:
-                            continue
-                        prev_read_id = read_id
-                        prev_AS = AS
-                        alleles.add(allele)
-
-                    if alleles:
-                        add_alleles(alleles)
-
-                HLA_counts = [[allele, count] for allele, count in HLA_counts.items()]
-                def HLA_count_cmp(a, b):
-                    if a[1] != b[1]:
-                        return b[1] - a[1]
-                    assert a[0] != b[0]
-                    if a[0] < b[0]:
-                        return -1
-                    else:
-                        return 1
-                HLA_counts = sorted(HLA_counts, cmp=HLA_count_cmp)
-                for count_i in range(len(HLA_counts)):
-                    count = HLA_counts[count_i]
-                    if simulation:
-                        found = False
-                        for test_HLA_name in test_HLA_names:
-                            if count[0] == test_HLA_name:
-                                print >> sys.stderr, "\t\t\t*** %d ranked %s (count: %d)" % (count_i + 1, test_HLA_name, count[1])
-                                found = True
-                                """
-                                if count_i > 0 and HLA_counts[0][1] > count[1]:
-                                    print >> sys.stderr, "Warning: %s ranked first (count: %d)" % (HLA_counts[0][0], HLA_counts[0][1])
-                                    assert False
-                                else:
-                                    test_passed += 1
-                                """
-                        if count_i < 5 and not found:
-                            print >> sys.stderr, "\t\t\t\t%d %s (count: %d)" % (count_i + 1, count[0], count[1])
-                    else:
-                        print >> sys.stderr, "\t\t\t\t%d %s (count: %d)" % (count_i + 1, count[0], count[1])
-                        if count_i >= 9:
-                            break
-                print >> sys.stderr
-
-                def normalize(prob):
-                    total = sum(prob.values())
-                    for allele, mass in prob.items():
-                        prob[allele] = mass / total
-
-                def normalize2(prob, length):
-                    total = 0
-                    for allele, mass in prob.items():
-                        assert allele in length
-                        total += (mass / length[allele])
-                    for allele, mass in prob.items():
-                        assert allele in length
-                        prob[allele] = mass / length[allele] / total
-
-                def prob_diff(prob1, prob2):
-                    diff = 0.0
-                    for allele in prob1.keys():
-                        if allele in prob2:
-                            diff += abs(prob1[allele] - prob2[allele])
-                        else:
-                            diff += prob1[allele]
-                    return diff
-
-                def HLA_prob_cmp(a, b):
-                    if a[1] != b[1]:
-                        if a[1] < b[1]:
-                            return 1
-                        else:
-                            return -1
-                    assert a[0] != b[0]
-                    if a[0] < b[0]:
-                        return -1
-                    else:
-                        return 1
-
-                HLA_prob, HLA_prob_next = {}, {}
-                for cmpt, count in HLA_cmpt.items():
-                    alleles = cmpt.split('-')
-                    for allele in alleles:
-                        if allele not in HLA_prob:
-                            HLA_prob[allele] = 0.0
-                        HLA_prob[allele] += (float(count) / len(alleles))
-
-                assert gene in HLA_lengths
-                HLA_length = HLA_lengths[gene]
-                # normalize2(HLA_prob, HLA_length)
-                normalize(HLA_prob)
-                def next_prob(HLA_cmpt, HLA_prob, HLA_length):
-                    HLA_prob_next = {}
-                    for cmpt, count in HLA_cmpt.items():
-                        alleles = cmpt.split('-')
-                        alleles_prob = 0.0
-                        for allele in alleles:
-                            assert allele in HLA_prob
-                            alleles_prob += HLA_prob[allele]
-                        for allele in alleles:
-                            if allele not in HLA_prob_next:
-                                HLA_prob_next[allele] = 0.0
-                            HLA_prob_next[allele] += (float(count) * HLA_prob[allele] / alleles_prob)
-                    # normalize2(HLA_prob_next, HLA_length)
-                    normalize(HLA_prob_next)
-                    return HLA_prob_next
-
-                diff, iter = 1.0, 0
-                while diff > 0.0001 and iter < 1000:
-                    HLA_prob_next = next_prob(HLA_cmpt, HLA_prob, HLA_length)
-                    diff = prob_diff(HLA_prob, HLA_prob_next)
-                    HLA_prob = HLA_prob_next
-                    iter += 1
-                for allele, prob in HLA_prob.items():
-                    allele_len = len(HLAs[gene][allele])
-                    HLA_prob[allele] /= float(allele_len)
-                normalize(HLA_prob)
-                HLA_prob = [[allele, prob] for allele, prob in HLA_prob.items()]
-
-                HLA_prob = sorted(HLA_prob, cmp=HLA_prob_cmp)
-                success = [False for i in range(len(test_HLA_names))]
-                found_list = [False for i in range(len(test_HLA_names))]
-                for prob_i in range(len(HLA_prob)):
-                    prob = HLA_prob[prob_i]
-                    found = False
-                    if simulation:
-                        for name_i in range(len(test_HLA_names)):
-                            test_HLA_name = test_HLA_names[name_i]
-                            if prob[0] == test_HLA_name:
-                                rank_i = prob_i
-                                while rank_i > 0:
-                                    if prob == HLA_prob[rank_i - 1][1]:
-                                        rank_i -= 1
-                                    else:
-                                        break
-                                print >> sys.stderr, "\t\t\t*** %d ranked %s (abundance: %.2f%%)" % (rank_i + 1, test_HLA_name, prob[1] * 100.0)
-                                if rank_i < len(success):
-                                    success[rank_i] = True
-                                found_list[name_i] = True
-                                found = True                        
-                        if not False in found_list:
-                            break
-                    if not found:
-                        print >> sys.stderr, "\t\t\t\t%d ranked %s (abundance: %.2f%%)" % (prob_i + 1, prob[0], prob[1] * 100.0)
-                        if best_alleles and prob_i < 2:
-                            print >> sys.stdout, "SingleModel %s (abundance: %.2f%%)" % (prob[0], prob[1] * 100.0)
-                    if not simulation and prob_i >= 9:
-                        break
-                print >> sys.stderr
-
-                if len(test_HLA_names) == 2 or not simulation:
-                    HLA_prob, HLA_prob_next = {}, {}
-                    for cmpt, count in HLA_cmpt.items():
-                        alleles = cmpt.split('-')
-                        for allele1 in alleles:
-                            for allele2 in HLA_names[gene]:
-                                if allele1 < allele2:
-                                    allele_pair = "%s-%s" % (allele1, allele2)
-                                else:
-                                    allele_pair = "%s-%s" % (allele2, allele1)
-                                if not allele_pair in HLA_prob:
-                                    HLA_prob[allele_pair] = 0.0
-                                HLA_prob[allele_pair] += (float(count) / len(alleles))
-
-                    if len(HLA_prob) <= 0:
-                        continue
-
-                    # Choose top allele pairs
-                    def choose_top_alleles(HLA_prob):
-                        HLA_prob_list = [[allele_pair, prob] for allele_pair, prob in HLA_prob.items()]
-                        HLA_prob_list = sorted(HLA_prob_list, cmp=HLA_prob_cmp)
-                        HLA_prob = {}
-                        best_prob = HLA_prob_list[0][1]
-                        for i in range(len(HLA_prob_list)):
-                            allele_pair, prob = HLA_prob_list[i]
-                            if prob * 2 <= best_prob:
-                                break                        
-                            HLA_prob[allele_pair] = prob
-                        normalize(HLA_prob)
-                        return HLA_prob
-                    HLA_prob = choose_top_alleles(HLA_prob)
-
-                    def next_prob(HLA_cmpt, HLA_prob):
-                        HLA_prob_next = {}
-                        for cmpt, count in HLA_cmpt.items():
-                            alleles = cmpt.split('-')
-                            prob = 0.0
-                            for allele in alleles:
-                                for allele_pair in HLA_prob.keys():
-                                    if allele in allele_pair:
-                                        prob += HLA_prob[allele_pair]
-                            for allele in alleles:
-                                for allele_pair in HLA_prob.keys():
-                                    if not allele in allele_pair:
-                                        continue
-                                    if allele_pair not in HLA_prob_next:
-                                        HLA_prob_next[allele_pair] = 0.0
-                                    HLA_prob_next[allele_pair] += (float(count) * HLA_prob[allele_pair] / prob)
-                        normalize(HLA_prob_next)
-                        return HLA_prob_next
-
-                    diff, iter = 1.0, 0
-                    while diff > 0.0001 and iter < 1000:
-                        HLA_prob_next = next_prob(HLA_cmpt, HLA_prob)
-                        diff = prob_diff(HLA_prob, HLA_prob_next)
-                        HLA_prob = HLA_prob_next
-                        HLA_prob = choose_top_alleles(HLA_prob)
-                        iter += 1
-
-                    HLA_prob = [[allele_pair, prob] for allele_pair, prob in HLA_prob.items()]
-                    HLA_prob = sorted(HLA_prob, cmp=HLA_prob_cmp)
-
-                    success = [False]
-                    for prob_i in range(len(HLA_prob)):
-                        allele_pair, prob = HLA_prob[prob_i]
-                        allele1, allele2 = allele_pair.split('-')
-                        if best_alleles and prob_i < 1:
-                            print >> sys.stdout, "PairModel %s (abundance: %.2f%%)" % (allele_pair, prob * 100.0)
-                        if simulation:
-                            if allele1 in test_HLA_names and allele2 in test_HLA_names:
-                                rank_i = prob_i
-                                while rank_i > 0:
-                                    if HLA_prob[rank_i-1][1] == prob:                                        
-                                        rank_i -= 1
-                                    else:
-                                        break
-                                print >> sys.stderr, "\t\t\t*** %d ranked %s (abundance: %.2f%%)" % (rank_i + 1, allele_pair, prob * 100.0)
-                                if rank_i == 0:
-                                    success[0] = True
-                                break
-                        print >> sys.stderr, "\t\t\t\t%d ranked %s (abundance: %.2f%%)" % (prob_i + 1, allele_pair, prob * 100.0)
-                        if not simulation and prob_i >= 9:
-                            break
-                    print >> sys.stderr
-                    
-                    # Li's method
-                    li_hla = os.path.join(ex_path, "li_hla/hla")
-                    if os.path.exists(li_hla):
-                        li_hla_cmd = [li_hla,
-                                      "hla",
-                                      "hla_input.bam",
-                                      "-b", "%s*BACKBONE" % gene]
-                        li_hla_proc = subprocess.Popen(li_hla_cmd,
-                                                       stdout=subprocess.PIPE,
-                                                       stderr=open("/dev/null", 'w'))
-
-                        # read in the result of Li's hla
-                        for line in li_hla_proc.stdout:
-                            allele1, allele2, score = line.strip().split()
-                            score = float(score)
-                            if simulation:
-                                if allele1 in test_HLA_names and allele2 in test_HLA_names:
-                                    print >> sys.stderr, "\t\t\t*** 1 ranked %s-%s (score: %.2f)" % (allele1, allele2, score)
-                                    success[0] = True
-                                else:
-                                    print >> sys.stderr, "\t\t\tLiModel fails"
-                            if best_alleles:
-                                print >> sys.stdout, "LiModel %s-%s (score: %.2f)" % (allele1, allele2, score)
-                        li_hla_proc.communicate()
-
-                if simulation and not False in success:
-                    aligner_type = "%s %s" % (aligner, index_type)
-                    if not aligner_type in test_passed:
-                        test_passed[aligner_type] = 1
-                    else:
-                        test_passed[aligner_type] += 1
+                    test_passed[aligner_type] = passed
 
-                if simulation:
-                    print >> sys.stderr, "\t\tPassed so far: %d/%d (abundance: %.2f%%)" % (test_passed[aligner_type], test_i + 1, (test_passed[aligner_type] * 100.0 / (test_i + 1)))
+                print >> sys.stderr, "\t\tPassed so far: %d/%d (%.2f%%)" % (test_passed[aligner_type], test_i + 1, (test_passed[aligner_type] * 100.0 / (test_i + 1)))
 
 
-    if simulation:
         for aligner_type, passed in test_passed.items():
             print >> sys.stderr, "%s:\t%d/%d passed (%.2f%%)" % (aligner_type, passed, len(test_list), passed * 100.0 / len(test_list))
     
+    else: # With real reads or BAMs
+        print >> sys.stderr, "\t", ' '.join(hla_list)
+        fastq = True
+        HLA_typing(ex_path,
+                   simulation,
+                   reference_type,
+                   hla_list,
+                   partial,
+                   partial_alleles,
+                   refHLAs,
+                   HLAs,                       
+                   HLA_names,
+                   HLA_lengths,
+                   refHLA_loci,
+                   Vars,
+                   Var_list,
+                   Links,
+                   HLAs_default,
+                   Vars_default,
+                   Var_list_default,
+                   Links_default,
+                   exclude_allele_list,
+                   aligners,
+                   num_mismatch,
+                   assembly,
+                   concordant_assembly,
+                   exonic_only,
+                   fastq,
+                   read_fname,
+                   alignment_fname,
+                   [],
+                   threads,
+                   enable_coverage,
+                   best_alleles,
+                   verbose)
+
         
 """
 """
 if __name__ == '__main__':
     parser = ArgumentParser(
         description='test HLA genotyping')
+    parser.add_argument("--base",
+                        dest="base_fname",
+                        type=str,
+                        default="",
+                        help="base filename for backbone HLA sequence, HLA variants, and HLA linking info")
+    parser.add_argument("--default-list",
+                        dest = "default_allele_list",
+                        type=str,
+                        default="",
+                        help="A comma-separated list of HLA alleles to be tested. Alleles are retrieved from default backbone data (all alleles included in backbone).")
     parser.add_argument("--reference-type",
                         dest="reference_type",
                         type=str,
@@ -1251,9 +2503,9 @@ if __name__ == '__main__':
                         type=str,
                         default="A,B,C,DQA1,DQB1,DRB1",
                         help="A comma-separated list of HLA genes (default: A,B,C,DQA1,DQB1,DRB1)")
-    parser.add_argument('--partial',
+    parser.add_argument('--no-partial',
                         dest='partial',
-                        action='store_true',
+                        action='store_false',
-                        help='Include partial alleles (e.g. A_nuc.fasta)')
+                        help='Do not include partial alleles (e.g. A_nuc.fasta)')
     parser.add_argument("--aligner-list",
                         dest="aligners",
@@ -1292,21 +2544,53 @@ if __name__ == '__main__':
                         dest="exclude_allele_list",
                         type=str,
                         default="",
-                        help="A comma-separated list of allleles to be excluded")
+                        help="A comma-separated list of alleles to be excluded. Enter a number N to randomly select N alleles for exclusion and N non-excluded alleles for testing (2N tested in total).")
+    parser.add_argument("--random-seed",
+                        dest="random_seed",
+                        type=int,
+                        default=0,
+                        help="A seeding number for randomness (default: 0)")
     parser.add_argument("--num-mismatch",
                         dest="num_mismatch",
                         type=int,
                         default=0,
                         help="Maximum number of mismatches per read alignment to be considered (default: 0)")
+    parser.add_argument("--perbase-errorrate",
+                        dest="perbase_errorrate",
+                        type=float,
+                        default=0.0,
+                        help="Per-base error rate when simulating reads (default: 0.0)")
     parser.add_argument('-v', '--verbose',
                         dest='verbose',
                         action='store_true',
                         help='also print some statistics to stderr')
-    parser.add_argument("--daehwan-debug",
-                        dest="daehwan_debug",
+    parser.add_argument('--verbose-level',
+                        dest='verbose_level',
+                        type=int,
+                        default=0,
+                        help='Verbosity level for statistics printed to stderr (default: 0)')
+    parser.add_argument("--debug",
+                        dest="debug",
                         type=str,
                         default="",
                         help="e.g., test_id:10,read_id:10000,basic_test")
+    parser.add_argument("--assembly",
+                        dest="assembly",
+                        action="store_true",
+                        help="Perform assembly")
+    parser.add_argument("--no-concordant-assembly",
+                        dest="concordant_assembly",
+                        action="store_false",
+                        help="Disable concordant assembly")
+    parser.add_argument("--exonic-only",
+                        dest="exonic_only",
+                        action="store_true",
+                        help="Consider exonic regions only")
+    parser.add_argument("--novel_allele_detection",
+                        dest="novel_allele_detection",
+                        action='store_true',
+                        help="Change test to detection of new alleles. Report sensitivity and specificity rates at the end.")
+
 
     args = parser.parse_args()
     if not args.reference_type in ["gene", "chromosome", "genome"]:
@@ -1327,18 +2611,73 @@ if __name__ == '__main__':
             not os.path.exists(args.alignment_fname):
         print >> sys.stderr, "Error: %s doesn't exist." % args.alignment_fname
         sys.exit(1)
-    args.exclude_allele_list = args.exclude_allele_list.split(',')
-    daehwan_debug = {}
-    if args.daehwan_debug != "":
-        for item in args.daehwan_debug.split(','):
+
+    if args.verbose and args.verbose_level == 0:
+        args.verbose_level = 1
+    
+    if len(args.default_allele_list) > 0:
+        args.default_allele_list = args.default_allele_list.split(',')
+        
+    if len(args.exclude_allele_list) > 0:
+        if args.exclude_allele_list.strip().isdigit():
+            num_alleles = int(args.exclude_allele_list)            
+            
+            if not os.path.exists("Default-HLA/hla_backbone.fa"):
+                curr_script = os.path.realpath(inspect.getsourcefile(test_HLA_genotyping))
+                ex_path = os.path.dirname(curr_script)
+                extract_hla_script = os.path.join(ex_path, "hisatgenotype_extract_vars.py")
+                extract_cmd = [extract_hla_script,
+                               "--reference-type", args.reference_type,
+                               "--hla-list", ','.join(args.hla_list),
+                               "--base", "Default-HLA/hla"]
+                if not args.partial:
+                    extract_cmd += ["--no-partial"]
+                extract_cmd += ["--inter-gap", "30",
+                                "--intra-gap", "50"]
+                if args.verbose_level >= 1:
+                    print >> sys.stderr, "\tRunning:", ' '.join(extract_cmd)
+                proc = subprocess.Popen(extract_cmd, stdout=open("/dev/null", 'w'), stderr=open("/dev/null", 'w'))
+                proc.communicate()
+                if not os.path.exists("Default-HLA/hla_backbone.fa"):
+                    print >> sys.stderr, "Error: hisatgenotype_extract_vars (Default) failed!"
+                    sys.exit(1)
+       
+            HLAs_default = {}
+            #read_HLA_alleles("Default-HLA/hla_backbone.fa", HLAs_default)
+            read_HLA_alleles("Default-HLA/hla_sequences.fa", HLAs_default)
+            
+            allele_names = list(HLAs_default['A'].keys())
+            random.seed(args.random_seed)
+            random.shuffle(allele_names)
+            args.exclude_allele_list = allele_names[0:num_alleles]
+            args.default_allele_list = allele_names[num_alleles:2*num_alleles]
+            
+            args.default_allele_list = args.default_allele_list + args.exclude_allele_list
+            
+            # DK - for debugging purposes
+            args.default_allele_list = args.exclude_allele_list
+        else:
+            args.exclude_allele_list = args.exclude_allele_list.split(',')
+
+        if args.num_mismatch == 0:
+            args.num_mismatch = 3
+        
+    debug = {}
+    if args.debug != "":
+        for item in args.debug.split(','):
             if ':' in item:
                 key, value = item.split(':')
-                daehwan_debug[key] = value
+                debug[key] = value
             else:
-                daehwan_debug[item] = 1
+                debug[item] = 1
+
+    if not args.partial:
+        print >> sys.stderr, "Error: --no-partial is not supported!"
+        sys.exit(1)
 
     random.seed(1)
-    test_HLA_genotyping(args.reference_type,
+    test_HLA_genotyping(args.base_fname,
+                        args.reference_type,
                         args.hla_list,
                         args.partial,
                         args.aligners,
@@ -1349,6 +2688,12 @@ if __name__ == '__main__':
                         args.coverage,
                         args.best_alleles,
                         args.exclude_allele_list,
+                        args.default_allele_list,
                         args.num_mismatch,
-                        args.verbose,
-                        daehwan_debug)
+                        args.perbase_errorrate,
+                        args.assembly,
+                        args.concordant_assembly,
+                        args.exonic_only,
+                        args.verbose_level,
+                        debug)
+
diff --git a/hisat_bp.cpp b/hisat_bp.cpp
index b178cf5..956bd62 100644
--- a/hisat_bp.cpp
+++ b/hisat_bp.cpp
@@ -828,10 +828,10 @@ static void printUsage(ostream& out) {
 	    << "  -p/--threads <int> number of alignment threads to launch (1)" << endl
 	    << "  --reorder          force SAM output order to match order of input reads" << endl
 #ifdef BOWTIE_MM
-	    << "  --mm               use memory-mapped I/O for index; many 'bowtie's can share" << endl
+	    << "  --mm               use memory-mapped I/O for index; many 'hisat2's can share" << endl
 #endif
 #ifdef BOWTIE_SHARED_MEM
-		//<< "  --shmem            use shared mem for index; many 'bowtie's can share" << endl
+		//<< "  --shmem            use shared mem for index; many 'hisat2's can share" << endl
 #endif
 		<< endl
 	    << " Other:" << endl
diff --git a/hisat2_extract_HLA_vars.py b/hisatgenotype_extract_vars.py
similarity index 60%
rename from hisat2_extract_HLA_vars.py
rename to hisatgenotype_extract_vars.py
index 9835fcc..17d2d80 100755
--- a/hisat2_extract_HLA_vars.py
+++ b/hisatgenotype_extract_vars.py
@@ -22,6 +22,7 @@
 
 import os, sys, subprocess, re
 import inspect
+import glob
 from argparse import ArgumentParser, FileType
 
 
@@ -44,101 +45,143 @@ def create_map(seq):
 
 """
 """
-def extract_HLA_vars(base_fname,
-                     reference_type,
-                     hla_list,
-                     partial,
-                     inter_gap,
-                     intra_gap,
-                     DRB1_REF,
-                     verbose):
+def extract_vars(base_fname,
+                 base_dname,
+                 reference_type,
+                 hla_list,
+                 partial,
+                 inter_gap,
+                 intra_gap,
+                 DRB1_REF,
+                 exclude_allele_list,
+                 leftshift,
+                 verbose):
     # Current script directory
-    curr_script = os.path.realpath(inspect.getsourcefile(extract_HLA_vars))
+    curr_script = os.path.realpath(inspect.getsourcefile(extract_vars))
     ex_path = os.path.dirname(curr_script)
 
+    base_fullpath_name = base_fname
+    if base_dname != "":
+        if not os.path.exists(base_dname):
+            os.mkdir(base_dname)
+        base_fullpath_name = "%s/%s" % (base_dname, base_fname)
+
     # Samples of HLA_MSA_file are found in
     #    ftp://ftp.ebi.ac.uk/pub/databases/ipd/imgt/hla/msf/
     #    git clone https://github.com/jrob119/IMGTHLA.git
 
     # Corresponding genomic loci found by HISAT2 (reference is GRCh38)
     #   e.g. hisat2 --no-unal --score-min C,0 -x grch38/genome -f IMGTHLA/fasta/A_gen.fasta
-    hla_ref_file = open("hla.ref", 'w')
-    HLA_genes, HLA_gene_strand = {}, {}
-    for gene in hla_list:
-        hisat2 = os.path.join(ex_path, "hisat2")
-        aligner_cmd = [hisat2,
-                       "--score-min", "C,0",
-                       "--no-unal",
-                       "-x", "grch38/genome",
-                       "-f", "IMGTHLA/fasta/%s_gen.fasta" % gene]
-        align_proc = subprocess.Popen(aligner_cmd,
-                                      stdout=subprocess.PIPE,
-                                      stderr=open("/dev/null", 'w'))
-        allele_id, strand = "", ''
-        for line in align_proc.stdout:
-            if line.startswith('@'):
-                continue
-            line = line.strip()
-            cols = line.split()
-            allele_id, flag = cols[:2]
-            flag = int(flag)
-            strand = '-' if flag & 0x10 else '+'
-            AS = ""
-            for i in range(11, len(cols)):
-                col = cols[i]
-                if col.startswith("AS"):
-                    AS = col[5:]
-            assert int(AS) == 0
-        align_proc.communicate()
-        assert allele_id != ""
-        allele_name = ""
-        for line in open("IMGTHLA/fasta/%s_gen.fasta" % gene):
-            line = line.strip()
-            if not line.startswith('>'):
-                continue
-            tmp_allele_id, tmp_allele_name = line[1:].split()[:2]
-            if allele_id == tmp_allele_id:
-                allele_name = tmp_allele_name
+    hla_ref_file = open(base_fullpath_name + ".ref", 'w')
+    if base_fname in ["hla"]:
+        HLA_genes, HLA_gene_strand = {}, {}
+        for gene in hla_list:
+            hisat2 = os.path.join(ex_path, "hisat2")
+            aligner_cmd = [hisat2,
+                           "--score-min", "C,0",
+                           "--no-unal",
+                           "-x", "grch38/genome",
+                           "-f", "IMGTHLA/fasta/%s_gen.fasta" % gene]
+            align_proc = subprocess.Popen(aligner_cmd,
+                                          stdout=subprocess.PIPE,
+                                          stderr=open("/dev/null", 'w'))
+            print aligner_cmd
+            allele_id, strand = "", ''
+            for line in align_proc.stdout:
+                if line.startswith('@'):
+                    continue
+                line = line.strip()
+                cols = line.split()
+                t_allele_id, flag = cols[:2]
+                # Avoid selection of excluded allele as backbone
+                if t_allele_id in exclude_allele_list:
+                    continue
+                allele_id = t_allele_id
+
+                flag = int(flag)
+                strand = '-' if flag & 0x10 else '+'
+                AS = ""
+                for i in range(11, len(cols)):
+                    col = cols[i]
+                    if col.startswith("AS"):
+                        AS = col[5:]
+                assert int(AS) == 0
+
+            align_proc.communicate()
+            assert allele_id != ""
+            allele_name = ""
+            for line in open("IMGTHLA/fasta/%s_gen.fasta" % gene):
+                line = line.strip()
+                if not line.startswith('>'):
+                    continue
+                tmp_allele_id, tmp_allele_name = line[1:].split()[:2]
+                if allele_id == tmp_allele_id:
+                    allele_name = tmp_allele_name
+                    break
+            assert allele_name != "" and strand != ''
+            HLA_genes[gene] = allele_name
+            HLA_gene_strand[gene] = strand
+            print "HLA-%s's backbone allele is %s on '%s' strand" % (gene, allele_name, strand)
+
+        # Extract exon information from hla.dat
+        HLA_gene_exons = {}
+        skip = False
+        for line in open("IMGTHLA/hla.dat"):
+            if line.startswith("DE"):
+                allele_name = line.split()[1][4:-1]
+                gene = allele_name.split('*')[0]
+                if line.find("partial") != -1 or \
+                        not gene in HLA_genes or \
+                        allele_name != HLA_genes[gene] or \
+                        allele_name in exclude_allele_list :
+                    skip = True
+                    continue
+                skip = False
+            elif not skip:
+                if not line.startswith("FT") or \
+                        line.find("exon") == -1:
+                    continue
+                exon_range = line.split()[2].split("..")
+                if not gene in HLA_gene_exons:
+                    HLA_gene_exons[gene] = []
+                HLA_gene_exons[gene].append([int(exon_range[0]) - 1, int(exon_range[1]) - 1])
+    else:
+        assert base_fname == "cyp"
+        
+        HLA_genes, HLA_gene_strand = {}, {}        
+        fasta_dirname = "hisat_genotype_db/%s/fasta" % base_fname.upper()
+        assert os.path.exists(fasta_dirname)
+        fasta_fnames = glob.glob("%s/*.fasta" % fasta_dirname)
+        for fasta_fname in fasta_fnames:
+            gene_name = fasta_fname.split('/')[-1]
+            gene_name = gene_name.split('_')[0]
+            ref_allele_name = ""
+            for line in open(fasta_fname):
+                assert line[0] == '>'
+                ref_allele_name = line.split(' ')[0][1:]
                 break
-        assert allele_name != "" and strand != ''
-        HLA_genes[gene] = allele_name
-        HLA_gene_strand[gene] = strand
-        print "HLA-%s's backbone allele is %s on '%s' strand" % (gene, allele_name, strand)
-
-    # Extract exon information from hla.data
-    HLA_gene_exons = {}
-    skip = False
-    for line in open("IMGTHLA/hla.dat"):
-        if line.startswith("DE"):
-            allele_name = line.split()[1][4:-1]
-            gene = allele_name.split('*')[0]
-            if line.find("partial") != -1 or \
-                    not gene in HLA_genes or \
-                    allele_name != HLA_genes[gene]:
-                skip = True
-                continue
-            skip = False
-        elif not skip:
-            if not line.startswith("FT") or \
-                    line.find("exon") == -1:
-                continue
-            exon_range = line.split()[2].split("..")
-            if not gene in HLA_gene_exons:
-                HLA_gene_exons[gene] = []
-            HLA_gene_exons[gene].append([int(exon_range[0]) - 1, int(exon_range[1]) - 1])
+
+            assert ref_allele_name != ""
+            assert gene_name not in HLA_genes
+            HLA_genes[gene_name] = ref_allele_name
+            # DK - temporary solution
+            HLA_gene_strand[gene_name] = '+'
+
+        HLA_gene_exons = {}
+        assert reference_type == "gene"
         
     # Write the backbone sequences into a fasta file
     if reference_type == "gene":
-        backbone_file = open(base_fname + "_backbone.fa", 'w')        
+        backbone_file = open(base_fullpath_name + "_backbone.fa", 'w')        
     # variants w.r.t the backbone sequences into a SNP file
-    var_file = open(base_fname + ".snp", 'w')
+    var_file = open(base_fullpath_name + ".snp", 'w')
     # haplotypes
-    haplotype_file = open(base_fname + ".haplotype", 'w')
+    haplotype_file = open(base_fullpath_name + ".haplotype", 'w')
     # pairs of a variant and the corresponding HLA alleles into a LINK file
-    link_file = open(base_fname + ".link", 'w')
+    link_file = open(base_fullpath_name + ".link", 'w')
     # Write all the sequences with dots removed into a file
-    input_file = open(base_fname + "_sequences.fa", 'w')
+    input_file = open(base_fullpath_name + "_sequences.fa", 'w')
     num_vars, num_haplotypes = 0, 0
+    HLA_full_alleles = {}
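+    # HLA_full_alleles maps each allele prefix (e.g. A*68:02 for A*68:02:02)
+    # to the full allele names sharing that prefix; filled while reading MSF files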
     for HLA_gene, HLA_ref_gene in HLA_genes.items():
         strand = HLA_gene_strand[HLA_gene]        
         def read_MSF_file(fname):
@@ -157,6 +200,9 @@ def extract_HLA_vars(base_fname,
                     try:
                         name = line.split('\t')[0]
                         name = name.split()[1]
+                        if name in exclude_allele_list:
+                            continue
+                        
                     except ValueError:
                         continue
 
@@ -177,45 +223,67 @@ def extract_HLA_vars(base_fname,
                         continue
 
                     if name not in HLA_names:
-                        print >> sys.stderr, "Warning: %s is not present in Names" % (name)
-                        continue
+                        HLA_names[name] = len(HLA_names)
 
                     id = HLA_names[name]
+                    if id >= len(HLA_seqs):
+                        assert id == len(HLA_seqs)
+                        HLA_seqs.append("")
+                        
                     HLA_seqs[id] += ''.join(fives)
+
+                    # Add sub-names of the allele
+                    sub_name = ""
+                    for group in name.split(':')[:-1]:
+                        if sub_name != "":
+                            sub_name += ":"
+                        sub_name += group
+                        if sub_name not in HLA_full_alleles:
+                            HLA_full_alleles[sub_name] = [name]
+                        else:
+                            HLA_full_alleles[sub_name].append(name)                        
+                    
             return HLA_names, HLA_seqs
 
-        HLA_MSA_fname = "IMGTHLA/msf/%s_gen.msf" % HLA_gene
+        if base_fname == "hla":
+            HLA_MSA_fname = "IMGTHLA/msf/%s_gen.msf" % HLA_gene
+        else:
+            HLA_MSA_fname = "hisatgenotype_db/%s/msf/%s_gen.msf" % (base_fname.upper(), HLA_gene)
+            
         if not os.path.exists(HLA_MSA_fname):
             print >> sys.stderr, "Warning: %s does not exist" % HLA_MSA_fname
             continue
+        
         HLA_names, HLA_seqs = read_MSF_file(HLA_MSA_fname)
 
         # Identify a consensus sequence
         assert len(HLA_seqs) > 0
 
         # Check sequences are of equal length
-        seq_lens = {}
-        for s in range(len(HLA_seqs)):
-            seq_len = len(HLA_seqs[s])
-            if seq_len not in seq_lens:
-                seq_lens[seq_len] = 1
-            else:
-                seq_lens[seq_len] += 1
-        max_seq_count = 0
-        for tmp_seq_len, tmp_seq_count in seq_lens.items():
-            if tmp_seq_count > max_seq_count:
-                seq_len = tmp_seq_len
-                max_seq_count = tmp_seq_count
-        
-        if reference_type == "gene" and \
-                (not DRB1_REF or HLA_gene != "DRB1"):
+        def find_seq_len(seqs):
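+            # Return the sequence length shared by the largest number of MSA entries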
+            seq_lens = {}
+            for s in range(len(seqs)):
+                seq_len = len(seqs[s])
+                if seq_len not in seq_lens:
+                    seq_lens[seq_len] = 1
+                else:
+                    seq_lens[seq_len] += 1
+
+            max_seq_count = 0
+            for tmp_seq_len, tmp_seq_count in seq_lens.items():
+                if tmp_seq_count > max_seq_count:
+                    seq_len = tmp_seq_len
+                    max_seq_count = tmp_seq_count
+            return seq_len
+
+        def create_consensus_seq(seqs, seq_len, partial):
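+            # Build a per-column majority-vote consensus; columns where no allele
+            # has a base are marked 'E' and, when partial alleles are not included,
+            # stripped from both the consensus and the input sequences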
             consensus_count = [[0, 0, 0, 0] for i in range(seq_len)]
-            for i in range(len(HLA_seqs)):                
-                HLA_seq = HLA_seqs[i]
-                if len(HLA_seq) != seq_len:
+            for i in range(len(seqs)):                
+                seq = seqs[i]
+                if len(seq) != seq_len:
                     continue                    
                 for j in range(seq_len):
-                    nt = HLA_seq[j]
+                    nt = seq[j]
                     if not nt in "ACGT":
                         continue
                     if nt == 'A':
@@ -227,13 +295,45 @@ def extract_HLA_vars(base_fname,
                     else:
                         assert nt == 'T'
                         consensus_count[j][3] += 1
-            backbone_name = "%s*BACKBONE" % HLA_gene
-            backbone_seq = ""
+            consensus_seq = ""
+            has_empty = False
             for count in consensus_count:
-                assert sum(consensus_count[j]) > 0
+                # No alleles have bases at this particular location
+                if sum(count) <= 0:
+                    has_empty = True
+                    consensus_seq += 'E'
+                    continue
                 idx = count.index(max(count))
                 assert idx < 4
-                backbone_seq += "ACGT"[idx]
+                consensus_seq += "ACGT"[idx]
+            consensus_seq = ''.join(consensus_seq)
+
+            # Remove dots (deletions)
+            if has_empty and not partial:
+                for seq_i in range(len(seqs)):
+                    seqs[seq_i] = list(seqs[seq_i])
+                for i in range(len(consensus_seq)):
+                    if consensus_seq[i] != 'E':
+                        continue
+                    for seq_i in range(len(seqs)):
+                        if i >= len(seqs[seq_i]):
+                            continue
+                        seqs[seq_i][i] = 'E'
+                for seq_i in range(len(seqs)):
+                    seqs[seq_i] = ''.join(seqs[seq_i])
+                    seqs[seq_i] = seqs[seq_i].replace('E', '')
+                consensus_seq = consensus_seq.replace('E', '')
+                
+            return consensus_seq
+
+        seq_len = find_seq_len(HLA_seqs)        
+        if reference_type == "gene" and \
+                (not DRB1_REF or HLA_gene != "DRB1"):
+            backbone_name = "%s*BACKBONE" % HLA_gene
+            backbone_seq = create_consensus_seq(HLA_seqs, seq_len, partial)
+            # Allele sequences can shrink, so readjust the sequence length
+            if not partial:
+                seq_len = find_seq_len(HLA_seqs)
         else:
             backbone_name = HLA_ref_gene
             backbone_id = HLA_names[backbone_name]
@@ -245,6 +345,29 @@ def extract_HLA_vars(base_fname,
                 print >> sys.stderr, "Warning: %s does not exist" % HLA_partial_MSA_fname
                 continue
             HLA_partial_names, HLA_partial_seqs = read_MSF_file(HLA_partial_MSA_fname)
+
+            # DK - for debugging purposes
+            # Partial alleles vs. Full alleles
+            """
+            counts = [0, 0, 0, 0]
+            for partial_name in HLA_partial_names.keys():
+                if partial_name in HLA_names:
+                    continue
+                name_group = partial_name.split(':')
+                for group_i in [3, 2, 1, 0]:
+                    if group_i == 0:
+                        counts[group_i] += 1
+                    if group_i > len(name_group):
+                        continue
+                    sub_name = ':'.join(name_group[:group_i])
+                    if sub_name in HLA_full_alleles:
+                        print partial_name, sub_name, HLA_full_alleles[sub_name][:5]
+                        counts[group_i] += 1
+                        break
+            print "DK: counts:", counts
+            sys.exit(1)
+            """
+                
             ref_seq = HLA_seqs[HLA_names[HLA_ref_gene]]
             ref_seq_map = create_map(ref_seq)
             ref_partial_seq = HLA_partial_seqs[HLA_partial_names[HLA_ref_gene]]
@@ -279,6 +402,68 @@ def extract_HLA_vars(base_fname,
                 HLA_names[HLA_name] = len(HLA_seqs)
                 HLA_seqs.append(new_seq)
 
+            backbone_seq = create_consensus_seq(HLA_seqs, seq_len, partial)
+
+        # Left-shift deletions if possible
+        def leftshift_deletions(backbone_seq, seq, debug = False):
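+            # Shift each run of '.' (deletion) leftwards: while the base just before
+            # the run matches the backbone base at the run's right end, move that base
+            # to the right end, so equivalent deletions get a canonical leftmost placement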
+            if len(seq) != len(backbone_seq):
+                return seq
+            seq = list(seq)
+            seq_len = len(seq)
+            bp_i = 0
+            # Skip the first deletion
+            while bp_i < seq_len:
+                if seq[bp_i] in "ACGT":
+                    break
+                bp_i += 1
+
+            while bp_i < seq_len:
+                bp = seq[bp_i]
+                if bp != '.':
+                    bp_i += 1
+                    continue
+                bp_j = bp_i + 1
+                while bp_j < seq_len:
+                    bp2 = seq[bp_j]
+                    if bp2 != '.':
+                        break
+                    else:
+                        bp_j += 1
+
+                if bp_j >= seq_len:
+                    bp_i = bp_j
+                    break
+                
+                # DK - for debugging purposes
+                if debug:
+                    print bp_i, bp_j, backbone_seq[bp_i-10:bp_i], backbone_seq[bp_i:bp_j], backbone_seq[bp_j:bp_j+10]
+                    print bp_i, bp_j, ''.join(seq[bp_i-10:bp_i]), ''.join(seq[bp_i:bp_j]), ''.join(seq[bp_j:bp_j+10])
+                prev_i, prev_j = bp_i, bp_j
+
+                while bp_i > 0 and seq[bp_i-1] in "ACGT":
+                    assert backbone_seq[bp_j-1] in "ACGT"
+                    if seq[bp_i-1] != backbone_seq[bp_j-1]:
+                        break
+                    seq[bp_j-1] = seq[bp_i-1]
+                    seq[bp_i-1] = '.'
+                    bp_i -= 1
+                    bp_j -= 1
+                bp_i = bp_j
+                while bp_i < seq_len:
+                    if seq[bp_i] in "ACGT":
+                        break
+                    bp_i += 1
+
+                # DK - for debugging purposes
+                if debug:
+                    print prev_i, prev_j, ''.join(seq[prev_i-10:prev_i]), ''.join(seq[prev_i:prev_j]), ''.join(seq[prev_j:prev_j+10])
+                  
+            return ''.join(seq)
+
+        if leftshift:
+            for seq_i in range(len(HLA_seqs)):
+                HLA_seqs[seq_i] = leftshift_deletions(backbone_seq, HLA_seqs[seq_i], seq_i == HLA_names["A*68:02:02"] and False)
+
         # Reverse complement MSF if this gene is on '-' strand
         if strand == '-':
             def reverse_complement(seq):
@@ -308,10 +493,16 @@ def extract_HLA_vars(base_fname,
                     (cmp_name, len(cmp_seq), seq_len)
                 continue
 
+            # DK - for debugging purposes
             """
-            for s in range(0, seq_len, 100):
-                print s, backbone_seq[s:s+100]
-                print s, cmp_seq[s:s+100]
+            if cmp_name == "A*03:01:07":
+                print cmp_name
+                cmp_seq2 = HLA_seqs[HLA_names["A*32:29"]]
+                for s in range(0, seq_len, 100):
+                    print s, backbone_seq[s:s+100]
+                    print s, cmp_seq2[s:s+100]
+                    print s, cmp_seq[s:s+100]
+                # sys.exit(1)
             """
 
             def insertVar(indel, type):
@@ -478,70 +669,73 @@ def extract_HLA_vars(base_fname,
 
         # Remap the backbone allele, which is sometimes slightly different from
         #   the IMGTHLA/fasta version
-        ref_backbone_id = HLA_names[HLA_ref_gene]
-        ref_backbone_seq = HLA_seqs[ref_backbone_id]
-        hisat2 = os.path.join(ex_path, "hisat2")
-        aligner_cmd = [hisat2,
-                       "--score-min", "C,0",
-                       "--no-unal",
-                       "-x", "grch38/genome",
-                       "-f", 
-                       "-c", "%s" % ref_backbone_seq.replace('.', '')]
-        align_proc = subprocess.Popen(aligner_cmd,
-                                      stdout=subprocess.PIPE,
-                                      stderr=open("/dev/null", 'w'))
-        left, right = 0, 0
-        for line in align_proc.stdout:
-            if line.startswith('@'):
-                continue
-            line = line.strip()
-            cols = line.split()
-            allele_id, flag, chr, left, mapQ, cigar_str = cols[:6]
-            flag = int(flag)
-            assert flag & 0x10 == 0
-            left = int(left) - 1
-            AS = ""
-            for i in range(11, len(cols)):
-                col = cols[i]
-                if col.startswith("AS"):
-                    AS = col[5:]
-            assert int(AS) == 0
-            cigar_re = re.compile('\d+\w')
-            right = left
-            cigars = cigar_re.findall(cigar_str)
-            cigars = [[cigar[-1], int(cigar[:-1])] for cigar in cigars]
-            assert len(cigars) == 1
-            for cigar_op, length in cigars:
-                assert cigar_op == 'M'
-                right += (length - 1)
-            break            
-        align_proc.communicate()
-        assert left < right
-
-        if reference_type == "gene":
-            base_locus = 0
-            backbone_seq_ = backbone_seq.replace('.', '')
-
-            ref_seq = HLA_seqs[HLA_names[HLA_ref_gene]]
-            ref_seq_map = create_map(ref_seq)
-            exons = HLA_gene_exons[HLA_gene]
-            exon_str = ""
-            for exon in exons:
-                if exon_str != "":
-                    exon_str += ','
-                exon_str += ("%d-%d" % (ref_seq_map[exon[0]], ref_seq_map[exon[1]]))
-                
-            print >> hla_ref_file, "%s\t6\t%d\t%d\t%d\t%s" % (backbone_name, left, right, len(backbone_seq_), exon_str)
+        if base_fname == "hla":
+            ref_backbone_id = HLA_names[HLA_ref_gene]
+            ref_backbone_seq = HLA_seqs[ref_backbone_id]
+            hisat2 = os.path.join(ex_path, "hisat2")
+            aligner_cmd = [hisat2,
+                           "--score-min", "C,0",
+                           "--no-unal",
+                           "-x", "grch38/genome",
+                           "-f", 
+                           "-c", "%s" % ref_backbone_seq.replace('.', '')]
+            align_proc = subprocess.Popen(aligner_cmd,
+                                          stdout=subprocess.PIPE,
+                                          stderr=open("/dev/null", 'w'))
+            left, right = 0, 0
+            for line in align_proc.stdout:
+                if line.startswith('@'):
+                    continue
+                line = line.strip()
+                cols = line.split()
+                allele_id, flag, chr, left, mapQ, cigar_str = cols[:6]
+                flag = int(flag)
+                assert flag & 0x10 == 0
+                left = int(left) - 1
+                AS = ""
+                for i in range(11, len(cols)):
+                    col = cols[i]
+                    if col.startswith("AS"):
+                        AS = col[5:]
+                assert int(AS) == 0
+                cigar_re = re.compile('\d+\w')
+                right = left
+                cigars = cigar_re.findall(cigar_str)
+                cigars = [[cigar[-1], int(cigar[:-1])] for cigar in cigars]
+                assert len(cigars) == 1
+                for cigar_op, length in cigars:
+                    assert cigar_op == 'M'
+                    right += (length - 1)
+                break            
+            align_proc.communicate()
+            assert left < right
+
+            if reference_type == "gene":
+                base_locus = 0
+                backbone_seq_ = backbone_seq.replace('.', '')
+
+                ref_seq = HLA_seqs[HLA_names[HLA_ref_gene]]
+                ref_seq_map = create_map(ref_seq)
+                exons = HLA_gene_exons[HLA_gene]
+                exon_str = ""
+                for exon in exons:
+                    if exon_str != "":
+                        exon_str += ','
+                    exon_str += ("%d-%d" % (ref_seq_map[exon[0]], ref_seq_map[exon[1]]))
+
+                print >> hla_ref_file, "%s\t6\t%d\t%d\t%d\t%s" % (backbone_name, left, right, len(backbone_seq_), exon_str)
+            else:
+                exons = HLA_gene_exons[HLA_gene]
+                exon_str = ""
+                for exon in exons:
+                    if exon_str != "":
+                        exon_str += ','
+                    exon_str += ("%d-%d" % (left + exon[0], left + exon[1]))
+
+                print >> hla_ref_file, "%s\t6\t%d\t%d\t%d\t%s" % (backbone_name, left, right, right - left + 1, exon_str)
+                base_locus = left
         else:
-            exons = HLA_gene_exons[HLA_gene]
-            exon_str = ""
-            for exon in exons:
-                if exon_str != "":
-                    exon_str += ','
-                exon_str += ("%d-%d" % (left + exon[0], left + exon[1]))
-
-            print >> hla_ref_file, "%s\t6\t%d\t%d\t%d\t%s" % (backbone_name, left, right, right - left + 1, exon_str)
-            base_locus = left
+            base_locus = 0
 
         # Write
         #       (1) variants w.r.t the backbone sequences into a SNP file
@@ -651,7 +845,7 @@ def extract_HLA_vars(base_fname,
             haplotypes = sorted(list(haplotypes), cmp=cmp_haplotype)
             haplotypes2 = sorted(list(haplotypes2), cmp=cmp_haplotype)
             
-            # daehwan - for debugging purposes
+            # DK - for debugging purposes
             """
             dis = prev_locus - locus
             print "\n[%d, %d]: %d haplotypes" % (i, j, len(haplotypes)), dis
@@ -678,7 +872,7 @@ def extract_HLA_vars(base_fname,
                 varIDs = []
                 for var in h:
                     varIDs.append("hv%s" % var2ID[var])
-                    # daehwan - for debugging purposes
+                    # DK - for debugging purposes
                     # varIDs.append(var)
                     sanity_vars.add(var2ID[var])
                 h_new_begin = h_begin
@@ -734,7 +928,7 @@ def extract_HLA_vars(base_fname,
 """
 if __name__ == '__main__':
     parser = ArgumentParser(
-        description="Extract HLA variants from HLA multiple sequence alignments")
+        description="Extract variants from multiple sequence alignments")
     parser.add_argument("-b", "--base",
                         dest="base_fname",
                         type=str,
@@ -750,9 +944,9 @@ if __name__ == '__main__':
                         type=str,
                         default="A,B,C,DQA1,DQB1,DRB1",
                         help="A comma-separated list of HLA genes")
-    parser.add_argument("--partial",
+    parser.add_argument("--no-partial",
                         dest="partial",
-                        action="store_true",
+                        action="store_false",
                         help="Include partial alleles (e.g. A_nuc.fasta)")
     parser.add_argument("--inter-gap",
                         dest="inter_gap",
@@ -768,6 +962,15 @@ if __name__ == '__main__':
                         dest="DRB1_REF",
                         action="store_true",
                         help="Some DRB1 alleles seem to include vector sequences, so use this option to avoid including them")
+    parser.add_argument("--exclude-allele-list",
+                        dest="exclude_allele_list",
+                        type=str,
+                        default="",
+                        help="A comma-separated list of alleles to be excluded")
+    parser.add_argument("--leftshift",
+                        dest="leftshift",
+                        action="store_true",
+                        help="Shift deletions to the leftmost")
     parser.add_argument("-v", "--verbose",
                         dest="verbose",
                         action="store_true",
@@ -780,12 +983,30 @@ if __name__ == '__main__':
         sys.exit(1)
     if not args.reference_type in ["gene", "chromosome", "genome"]:
         print >> sys.stderr, "Error: --reference-type (%s) must be one of gene, chromosome, and genome" % (args.reference_type)
-        sys.exit(1)        
-    extract_HLA_vars(args.base_fname,
-                     args.reference_type,
-                     args.hla_list,
-                     args.partial,
-                     args.inter_gap,
-                     args.intra_gap,
-                     args.DRB1_REF,
-                     args.verbose)
+        sys.exit(1)
+             
+    if len(args.exclude_allele_list) > 0:
+        args.exclude_allele_list = args.exclude_allele_list.split(',')
+    else:
+        args.exclude_allele_list = []
+
+    if args.base_fname.find('/') != -1:
+        elems = args.base_fname.split('/')
+        base_fname = elems[-1]
+        base_dname = '/'.join(elems[:-1])
+    else:
+        base_fname = args.base_fname
+        base_dname = ""
+        
+    # print args.exclude_allele_list
+    extract_vars(base_fname,
+                 base_dname,
+                 args.reference_type,
+                 args.hla_list,
+                 args.partial,
+                 args.inter_gap,
+                 args.intra_gap,
+                 args.DRB1_REF,
+                 args.exclude_allele_list,
+                 args.leftshift,
+                 args.verbose)
diff --git a/hisatgenotype_typing.py b/hisatgenotype_typing.py
new file mode 100755
index 0000000..b4603ff
--- /dev/null
+++ b/hisatgenotype_typing.py
@@ -0,0 +1,1688 @@
+#!/usr/bin/env python
+
+#
+# Copyright 2015, Daehwan Kim <infphilo at gmail.com>
+#
+# This file is part of HISAT 2.
+#
+# HISAT 2 is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# HISAT 2 is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with HISAT 2.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+
+import sys, os, subprocess, re
+import inspect, random
+import math
+from argparse import ArgumentParser, FileType
+
+"""
+"""
+def simulate_reads(HLAs,
+                   test_HLA_list,
+                   simulate_interval):
+    HLA_reads_1, HLA_reads_2 = [], []
+    for test_HLA_names in test_HLA_list:
+        gene = test_HLA_names[0].split('*')[0]
+        # ref_allele = refHLAs[gene]
+        # ref_seq = HLAs[gene][ref_allele]
+
+        # Simulate reads from two HLA alleles
+        def simulate_reads_impl(seq, simulate_interval = 1, frag_len = 250, read_len = 100):
+            comp_table = {'A':'T', 'C':'G', 'G':'C', 'T':'A'}
+            reads_1, reads_2 = [], []
+            for i in range(0, len(seq) - frag_len + 1, simulate_interval):
+                reads_1.append(seq[i:i+read_len])
+                tmp_read_2 = reversed(seq[i+frag_len-read_len:i+frag_len])
+                read_2 = ""
+                for s in tmp_read_2:
+                    if s in comp_table:
+                        read_2 += comp_table[s]
+                    else:
+                        read_2 += s
+                reads_2.append(read_2)
+            return reads_1, reads_2
+
+        for test_HLA_name in test_HLA_names:
+            HLA_seq = HLAs[gene][test_HLA_name]
+            tmp_reads_1, tmp_reads_2 = simulate_reads_impl(HLA_seq, simulate_interval)
+            HLA_reads_1 += tmp_reads_1
+            HLA_reads_2 += tmp_reads_2
+
+    # Write reads into a fasta read file
+    def write_reads(reads, idx):
+        read_file = open('hla_input_%d.fa' % idx, 'w')
+        for read_i in range(len(reads)):
+            print >> read_file, ">%d" % (read_i + 1)
+            print >> read_file, reads[read_i]
+        read_file.close()
+    write_reads(HLA_reads_1, 1)
+    write_reads(HLA_reads_2, 2)
+
+
+"""
+Align reads, and sort the alignments into a BAM file
+"""
+def align_reads(ex_path,
+                aligner,
+                index_type,
+                read_fname,
+                fastq,
+                threads,
+                verbose):
+    if aligner == "hisat2":
+        hisat2 = os.path.join(ex_path, "hisat2")
+        aligner_cmd = [hisat2,
+                       "--no-unal",
+                       "--mm"]
+        if index_type == "linear":
+            aligner_cmd += ["-k", "10"]
+        aligner_cmd += ["-x", "hla.%s" % index_type]
+    elif aligner == "bowtie2":
+        aligner_cmd = [aligner,
+                       "--no-unal",
+                       "-k", "10",
+                       "-x", "hla"]
+    else:
+        assert False
+    assert len(read_fname) in [1,2]
+    aligner_cmd += ["-p", str(threads)]
+    if not fastq:
+        aligner_cmd += ["-f"]
+    if len(read_fname) == 1:
+        aligner_cmd += ["-U", read_fname[0]]
+    else:
+        aligner_cmd += ["-1", "%s" % read_fname[0],
+                        "-2", "%s" % read_fname[1]]
+
+    if verbose:
+        print >> sys.stderr, ' '.join(aligner_cmd)
+    align_proc = subprocess.Popen(aligner_cmd,
+                                  stdout=subprocess.PIPE,
+                                  stderr=open("/dev/null", 'w'))
+
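+    # Convert the aligner's SAM output to an unsorted BAM file via samtools view.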
+    sambam_cmd = ["samtools",
+                  "view",
+                  "-bS",
+                  "-"]
+    sambam_proc = subprocess.Popen(sambam_cmd,
+                                   stdin=align_proc.stdout,
+                                   stdout=open("hla_input_unsorted.bam", 'w'),
+                                   stderr=open("/dev/null", 'w'))
+    sambam_proc.communicate()
+    if index_type == "graph":
+        bamsort_cmd = ["samtools",
+                       "sort",
+                       "hla_input_unsorted.bam",
+                       "-o", "hla_input.bam"]
+        bamsort_proc = subprocess.Popen(bamsort_cmd,
+                                        stderr=open("/dev/null", 'w'))
+        bamsort_proc.communicate()
+
+        bamindex_cmd = ["samtools",
+                        "index",
+                        "hla_input.bam"]
+        bamindex_proc = subprocess.Popen(bamindex_cmd,
+                                         stderr=open("/dev/null", 'w'))
+        bamindex_proc.communicate()
+
+        os.system("rm hla_input_unsorted.bam")            
+    else:
+        os.system("mv hla_input_unsorted.bam hla_input.bam")
+
+
+"""
+""" 
+def normalize(prob):
+    total = sum(prob.values())
+    for allele, mass in prob.items():
+        prob[allele] = mass / total
+
+        
+"""
+"""
+def prob_diff(prob1, prob2):
+    diff = 0.0
+    for allele in prob1.keys():
+        if allele in prob2:
+            diff += abs(prob1[allele] - prob2[allele])
+        else:
+            diff += prob1[allele]
+    return diff
+
+
+"""
+"""
+def HLA_prob_cmp(a, b):
+    if a[1] != b[1]:
+        if a[1] < b[1]:
+            return 1
+        else:
+            return -1
+    assert a[0] != b[0]
+    if a[0] < b[0]:
+        return -1
+    else:
+        return 1
+
+
+"""
+"""
+def single_abundance(HLA_cmpt,
+                     HLA_length):
+    def normalize2(prob, length):
+        total = 0
+        for allele, mass in prob.items():
+            assert allele in length
+            total += (mass / length[allele])
+        for allele, mass in prob.items():
+            assert allele in length
+            prob[allele] = mass / length[allele] / total
+
+    HLA_prob, HLA_prob_next = {}, {}
+    for cmpt, count in HLA_cmpt.items():
+        alleles = cmpt.split('-')
+        for allele in alleles:
+            if allele not in HLA_prob:
+                HLA_prob[allele] = 0.0
+            HLA_prob[allele] += (float(count) / len(alleles))
+
+    # normalize2(HLA_prob, HLA_length)
+    normalize(HLA_prob)
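+    # EM-like iteration: redistribute each compatibility group's read count
+    # among its alleles in proportion to the current abundance estimates,
+    # renormalize, and repeat until the estimates converge.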
+    def next_prob(HLA_cmpt, HLA_prob, HLA_length):
+        HLA_prob_next = {}
+        for cmpt, count in HLA_cmpt.items():
+            alleles = cmpt.split('-')
+            alleles_prob = 0.0
+            for allele in alleles:
+                assert allele in HLA_prob
+                alleles_prob += HLA_prob[allele]
+            for allele in alleles:
+                if allele not in HLA_prob_next:
+                    HLA_prob_next[allele] = 0.0
+                HLA_prob_next[allele] += (float(count) * HLA_prob[allele] / alleles_prob)
+        # normalize2(HLA_prob_next, HLA_length)
+        normalize(HLA_prob_next)
+        return HLA_prob_next
+
+    diff, iter = 1.0, 0
+    while diff > 0.0001 and iter < 1000:
+        HLA_prob_next = next_prob(HLA_cmpt, HLA_prob, HLA_length)
+        diff = prob_diff(HLA_prob, HLA_prob_next)
+        HLA_prob = HLA_prob_next
+        iter += 1
+    for allele, prob in HLA_prob.items():
+        allele_len = HLA_length[allele]
+        HLA_prob[allele] /= float(allele_len)
+    normalize(HLA_prob)
+    HLA_prob = [[allele, prob] for allele, prob in HLA_prob.items()]
+    HLA_prob = sorted(HLA_prob, cmp=HLA_prob_cmp)
+    return HLA_prob
+
+    
+"""
+"""
+def joint_abundance(HLA_cmpt,
+                    HLA_length):
+    allele_names = set()
+    for cmpt in HLA_cmpt.keys():
+        allele_names |= set(cmpt.split('-'))
+    
+    HLA_prob, HLA_prob_next = {}, {}
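+    # Initialize pair abundances: each compatibility group's count is split
+    # among its alleles and credited to every pair formed by combining each of
+    # those alleles with an observed allele.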
+    for cmpt, count in HLA_cmpt.items():
+        alleles = cmpt.split('-')
+        for allele1 in alleles:
+            for allele2 in allele_names:
+                if allele1 < allele2:
+                    allele_pair = "%s-%s" % (allele1, allele2)
+                else:
+                    allele_pair = "%s-%s" % (allele2, allele1)
+                if not allele_pair in HLA_prob:
+                    HLA_prob[allele_pair] = 0.0
+                HLA_prob[allele_pair] += (float(count) / len(alleles))
+
+    if len(HLA_prob) <= 0:
+        return HLA_prob
+
+    # Choose top allele pairs
+    def choose_top_alleles(HLA_prob):
+        HLA_prob_list = [[allele_pair, prob] for allele_pair, prob in HLA_prob.items()]
+        HLA_prob_list = sorted(HLA_prob_list, cmp=HLA_prob_cmp)
+        HLA_prob = {}
+        best_prob = HLA_prob_list[0][1]
+        for i in range(len(HLA_prob_list)):
+            allele_pair, prob = HLA_prob_list[i]
+            if prob * 2 <= best_prob:
+                break                        
+            HLA_prob[allele_pair] = prob
+        normalize(HLA_prob)
+        return HLA_prob
+    HLA_prob = choose_top_alleles(HLA_prob)
+
+    def next_prob(HLA_cmpt, HLA_prob):
+        HLA_prob_next = {}
+        for cmpt, count in HLA_cmpt.items():
+            alleles = cmpt.split('-')
+            prob = 0.0
+            for allele in alleles:
+                for allele_pair in HLA_prob.keys():
+                    if allele in allele_pair:
+                        prob += HLA_prob[allele_pair]
+            for allele in alleles:
+                for allele_pair in HLA_prob.keys():
+                    if not allele in allele_pair:
+                        continue
+                    if allele_pair not in HLA_prob_next:
+                        HLA_prob_next[allele_pair] = 0.0
+                    HLA_prob_next[allele_pair] += (float(count) * HLA_prob[allele_pair] / prob)
+        normalize(HLA_prob_next)
+        return HLA_prob_next
+
+    diff, iter = 1.0, 0
+    while diff > 0.0001 and iter < 1000:
+        HLA_prob_next = next_prob(HLA_cmpt, HLA_prob)
+        diff = prob_diff(HLA_prob, HLA_prob_next)
+        HLA_prob = HLA_prob_next
+        HLA_prob = choose_top_alleles(HLA_prob)
+        iter += 1
+
+    HLA_prob = [[allele_pair, prob] for allele_pair, prob in HLA_prob.items()]
+    HLA_prob = sorted(HLA_prob, cmp=HLA_prob_cmp)
+    return HLA_prob
+
+
+"""
+"""
+def HLA_typing(ex_path,
+               simulation,
+               reference_type,
+               hla_list,
+               partial,
+               refHLAs,
+               HLAs,
+               HLA_names,
+               HLA_lengths,
+               refHLA_loci,
+               Vars,
+               Var_list,
+               Links,
+               exclude_allele_list,
+               aligners,
+               num_mismatch,
+               fastq,
+               read_fname,
+               alignment_fname,
+               threads,
+               enable_coverage,
+               best_alleles,
+               verbose):
+
+    def lower_bound(Var_list, pos):
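+        # Binary search: return the index of the first entry in Var_list
+        # (sorted by position) whose position is >= pos.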
+        low, high = 0, len(Var_list)
+        while low < high:
+            m = (low + high) / 2
+            m_pos = Var_list[m][0]
+            if m_pos < pos:
+                low = m + 1
+            elif m_pos > pos:
+                high = m
+            else:
+                assert m_pos == pos
+                while m > 0:
+                    if Var_list[m-1][0] < pos:
+                        break
+                    m -= 1
+                return m
+        return low        
+            
+    if simulation:
+        test_passed = {}
+    for aligner, index_type in aligners:
+        if index_type == "graph":
+            print >> sys.stderr, "\n\t\t%s %s on %s" % (aligner, index_type, reference_type)
+        else:
+            print >> sys.stderr, "\n\t\t%s %s" % (aligner, index_type)
+
+        if alignment_fname == "":
+            # Align reads, and sort the alignments into a BAM file
+            align_reads(ex_path,
+                        aligner,
+                        index_type,
+                        read_fname,
+                        fastq,
+                        threads,
+                        verbose)
+            
+        for test_HLA_names in hla_list:
+            if simulation:
+                gene = test_HLA_names[0].split('*')[0]
+            else:
+                gene = test_HLA_names
+            ref_allele = refHLAs[gene]
+            ref_seq = HLAs[gene][ref_allele]
+            ref_exons = refHLA_loci[gene][-1]
+
+            # Read alignments
+            alignview_cmd = ["samtools",
+                             "view"]
+            if alignment_fname == "":
+                alignview_cmd += ["hla_input.bam"]
+            else:
+                if not os.path.exists(alignment_fname + ".bai"):
+                    os.system("samtools index %s" % alignment_fname)
+                alignview_cmd += [alignment_fname]
+            base_locus = 0
+            if index_type == "graph":
+                if reference_type == "gene":
+                    alignview_cmd += ["%s" % ref_allele]
+                else:
+                    assert reference_type in ["chromosome", "genome"]
+                    _, chr, left, right, _ = refHLA_loci[gene]
+                    base_locus = left
+                    alignview_cmd += ["%s:%d-%d" % (chr, left + 1, right + 1)]
+
+                bamview_proc = subprocess.Popen(alignview_cmd,
+                                                stdout=subprocess.PIPE,
+                                                stderr=open("/dev/null", 'w'))
+
+                sort_read_cmd = ["sort", "-k", "1", "-n"]
+                alignview_proc = subprocess.Popen(sort_read_cmd,
+                                                  stdin=bamview_proc.stdout,
+                                                  stdout=subprocess.PIPE,
+                                                  stderr=open("/dev/null", 'w'))
+            else:
+                alignview_proc = subprocess.Popen(alignview_cmd,
+                                             stdout=subprocess.PIPE,
+                                             stderr=open("/dev/null", 'w'))
+
+            # Count alleles
+            HLA_counts, HLA_cmpt = {}, {}
+            coverage = [0 for i in range(len(ref_seq) + 1)]
+            num_reads, total_read_len = 0, 0
+            prev_read_id = None
+            prev_exon = False
+            if index_type == "graph":
+                # Cigar regular expression
+                cigar_re = re.compile('\d+\w')
+                for line in alignview_proc.stdout:
+                    cols = line.strip().split()
+                    read_id, flag, chr, pos, mapQ, cigar_str = cols[:6]
+                    read_seq, qual = cols[9], cols[10]
+                    num_reads += 1
+                    total_read_len += len(read_seq)
+                    flag, pos = int(flag), int(pos)
+                    pos -= (base_locus + 1)
+                    if pos < 0:
+                        continue
+
+                    if flag & 0x4 != 0:
+                        continue
+
+                    NM, Zs, MD = "", "", ""
+                    for i in range(11, len(cols)):
+                        col = cols[i]
+                        if col.startswith("Zs"):
+                            Zs = col[5:]
+                        elif col.startswith("MD"):
+                            MD = col[5:]
+                        elif col.startswith("NM"):
+                            NM = int(col[5:])
+
+                    if NM > num_mismatch:
+                        continue
+
+                    # daehwan - for debugging purposes
+                    debug = False
+                    if read_id in ["2339"] and False:
+                        debug = True
+                        print "read_id: %s)" % read_id, pos, cigar_str, "NM:", NM, MD, Zs
+                        print "            ", read_seq
+
+                    vars = []
+                    if Zs:
+                        vars = Zs.split(',')
+
+                    assert MD != ""
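+                    # Walk the CIGAR and MD strings together to build cmp_list:
+                    # [op, ref_pos, length] segments (match, mismatch, insertion,
+                    # deletion, soft, intron) in backbone coordinates.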
+                    MD_str_pos, MD_len = 0, 0
+                    read_pos, left_pos = 0, pos
+                    right_pos = left_pos
+                    cigars = cigar_re.findall(cigar_str)
+                    cigars = [[cigar[-1], int(cigar[:-1])] for cigar in cigars]
+                    cmp_list = []
+                    for i in range(len(cigars)):
+                        cigar_op, length = cigars[i]
+                        if cigar_op == 'M':
+                            # Update coverage
+                            if enable_coverage:
+                                if right_pos + length < len(coverage):
+                                    coverage[right_pos] += 1
+                                    coverage[right_pos + length] -= 1
+                                elif right_pos < len(coverage):
+                                    coverage[right_pos] += 1
+                                    coverage[-1] -= 1
+
+                            first = True
+                            MD_len_used = 0
+                            while True:
+                                if not first or MD_len == 0:
+                                    if MD[MD_str_pos].isdigit():
+                                        num = int(MD[MD_str_pos])
+                                        MD_str_pos += 1
+                                        while MD_str_pos < len(MD):
+                                            if MD[MD_str_pos].isdigit():
+                                                num = num * 10 + int(MD[MD_str_pos])
+                                                MD_str_pos += 1
+                                            else:
+                                                break
+                                        MD_len += num
+                                # Insertion or full match followed
+                                if MD_len >= length:
+                                    MD_len -= length
+                                    cmp_list.append(["match", right_pos + MD_len_used, length - MD_len_used])
+                                    break
+                                first = False
+                                read_base = read_seq[read_pos + MD_len]
+                                MD_ref_base = MD[MD_str_pos]
+                                MD_str_pos += 1
+                                assert MD_ref_base in "ACGT"
+                                cmp_list.append(["match", right_pos + MD_len_used, MD_len - MD_len_used])
+                                cmp_list.append(["mismatch", right_pos + MD_len, 1])
+                                MD_len_used = MD_len + 1
+                                MD_len += 1
+                                # Full match
+                                if MD_len == length:
+                                    MD_len = 0
+                                    break
+                        elif cigar_op == 'I':
+                            cmp_list.append(["insertion", right_pos, length])
+                        elif cigar_op == 'D':
+                            if MD[MD_str_pos] == '0':
+                                MD_str_pos += 1
+                            assert MD[MD_str_pos] == '^'
+                            MD_str_pos += 1
+                            while MD_str_pos < len(MD):
+                                if not MD[MD_str_pos] in "ACGT":
+                                    break
+                                MD_str_pos += 1
+                            cmp_list.append(["deletion", right_pos, length])
+                        elif cigar_op == 'S':
+                            cmp_list.append(["soft", right_pos, length])
+                        else:                    
+                            assert cigar_op == 'N'
+                            cmp_list.append(["intron", right_pos, length])
+
+                        if cigar_op in "MND":
+                            right_pos += length
+
+                        if cigar_op in "MIS":
+                            read_pos += length
+
+                    # Check whether this alignment overlaps any exon of the gene.
+                    exon = False
+                    for exon_left, exon_right in ref_exons:
+                        if right_pos <= exon_left or pos > exon_right:
+                            continue
+                        else:
+                            exon = True
+                            break
+
+                    if right_pos > len(ref_seq):
+                        continue
+
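+                    # For the read just processed, keep the alleles with the
+                    # highest per-read score, bump their individual counts, and
+                    # record them together as one '-'-joined compatibility group
+                    # (down-weighted when partial alleles are included and the
+                    # read falls outside exons).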
+                    def add_stat(HLA_cmpt, HLA_counts, HLA_count_per_read, exon = True):
+                        max_count = max(HLA_count_per_read.values())
+                        cur_cmpt = set()
+                        for allele, count in HLA_count_per_read.items():
+                            if count < max_count:
+                                continue
+                            if allele in exclude_allele_list:
+                                continue                                
+                            cur_cmpt.add(allele)                    
+                            if not allele in HLA_counts:
+                                HLA_counts[allele] = 1
+                            else:
+                                HLA_counts[allele] += 1
+
+                        if len(cur_cmpt) == 0:
+                            return
+
+                        # daehwan - for debugging purposes                            
+                        alleles = ["", ""]
+                        # alleles = ["B*40:304", "B*40:02:01"]
+                        allele1_found, allele2_found = False, False
+                        for allele, count in HLA_count_per_read.items():
+                            if count < max_count:
+                                continue
+                            if allele == alleles[0]:
+                                allele1_found = True
+                            elif allele == alleles[1]:
+                                allele2_found = True
+                        if allele1_found != allele2_found:
+                            print alleles[0], HLA_count_per_read[alleles[0]]
+                            print alleles[1], HLA_count_per_read[alleles[1]]
+                            if allele1_found:
+                                print ("%s\tread_id %s - %d vs. %d]" % (alleles[0], prev_read_id, max_count, HLA_count_per_read[alleles[1]]))
+                            else:
+                                print ("%s\tread_id %s - %d vs. %d]" % (alleles[1], prev_read_id, max_count, HLA_count_per_read[alleles[0]]))
+                            print read_seq
+
+                        cur_cmpt = sorted(list(cur_cmpt))
+                        cur_cmpt = '-'.join(cur_cmpt)
+                        add = 1
+                        if partial and not exon:
+                            add *= 0.2
+                        if not cur_cmpt in HLA_cmpt:
+                            HLA_cmpt[cur_cmpt] = add
+                        else:
+                            HLA_cmpt[cur_cmpt] += add
+
+                    if read_id != prev_read_id:
+                        if prev_read_id != None:
+                            add_stat(HLA_cmpt, HLA_counts, HLA_count_per_read, prev_exon)
+
+                        HLA_count_per_read = {}
+                        for HLA_name in HLA_names[gene]:
+                            if HLA_name.find("BACKBONE") != -1:
+                                continue
+                            HLA_count_per_read[HLA_name] = 0
+
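+                    # Adjust the per-read score of every allele that carries the
+                    # given variant (looked up through the Links table).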
+                    def add_count(var_id, add):
+                        assert var_id in Links
+                        alleles = Links[var_id]
+                        for allele in alleles:
+                            if allele.find("BACKBONE") != -1:
+                                continue
+                            HLA_count_per_read[allele] += add
+                            # daehwan - for debugging purposes
+                            if debug:
+                                if allele in ["DQA1*05:05:01:01", "DQA1*05:05:01:02"]:
+                                    print allele, add, var_id
+
+                    # Decide which allele(s) a read most likely came from
+                    # also sanity check - read length, cigar string, and MD string
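+                    # Pre-pass: a read lying entirely within a known deletion
+                    # cannot come from alleles carrying that deletion, so those
+                    # alleles are penalized here.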
+                    for var_id, data in Vars[gene].items():
+                        var_type, var_pos, var_data = data
+                        if var_type != "deletion":
+                            continue
+                        if left_pos >= var_pos and right_pos <= var_pos + int(var_data):
+                            add_count(var_id, -1)                            
+                    ref_pos, read_pos, cmp_cigar_str, cmp_MD = left_pos, 0, "", ""
+                    cigar_match_len, MD_match_len = 0, 0            
+                    for cmp in cmp_list:
+                        type = cmp[0]
+                        length = cmp[2]
+                        if type == "match":
+                            var_idx = lower_bound(Var_list[gene], ref_pos)
+                            while var_idx < len(Var_list[gene]):
+                                var_pos, var_id = Var_list[gene][var_idx]
+                                if ref_pos + length <= var_pos:
+                                    break
+                                if ref_pos <= var_pos:
+                                    var_type, _, var_data = Vars[gene][var_id]
+                                    if var_type == "insertion":
+                                        if ref_pos < var_pos and ref_pos + length > var_pos + len(var_data):
+                                            add_count(var_id, -1)
+                                            # daehwan - for debugging purposes
+                                            if debug:
+                                                print cmp, var_id, Links[var_id]
+                                    elif var_type == "deletion":
+                                        del_len = int(var_data)
+                                        if ref_pos < var_pos and ref_pos + length > var_pos + del_len:
+                                            # daehwan - for debugging purposes
+                                            if debug:
+                                                print cmp, var_id, Links[var_id], -1, Vars[gene][var_id]
+                                            # Check if this might be one of the two tandem repeats (the same left coordinate)
+                                            cmp_left, cmp_right = cmp[1], cmp[1] + cmp[2]
+                                            test1_seq1 = ref_seq[cmp_left:cmp_right]
+                                            test1_seq2 = ref_seq[cmp_left:var_pos] + ref_seq[var_pos + del_len:cmp_right + del_len]
+                                            # Check if this happens due to small repeats (the same right coordinate - e.g. 19 times of TTTC in DQA1*05:05:01:02)
+                                            cmp_left -= read_pos
+                                            cmp_right += (len(read_seq) - read_pos - cmp[2])
+                                            test2_seq1 = ref_seq[cmp_left+int(var_data):cmp_right]
+                                            test2_seq2 = ref_seq[cmp_left:var_pos] + ref_seq[var_pos+int(var_data):cmp_right]
+                                            if test1_seq1 != test1_seq2 and test2_seq1 != test2_seq2:
+                                                add_count(var_id, -1)
+                                    else:
+                                        if debug:
+                                            print cmp, var_id, Links[var_id], -1
+                                        add_count(var_id, -1)
+                                var_idx += 1
+
+                            read_pos += length
+                            ref_pos += length
+                            cigar_match_len += length
+                            MD_match_len += length
+                        elif type == "mismatch":
+                            read_base = read_seq[read_pos]
+                            var_idx = lower_bound(Var_list[gene], ref_pos)
+                            while var_idx < len(Var_list[gene]):
+                                var_pos, var_id = Var_list[gene][var_idx]
+                                if ref_pos < var_pos:
+                                    break
+                                if ref_pos == var_pos:
+                                    var_type, _, var_data = Vars[gene][var_id]
+                                    if var_type == "single":
+                                        if var_data == read_base:
+                                            # daehwan - for debugging purposes
+                                            if debug:
+                                                print cmp, var_id, 1, var_data, read_base, Links[var_id]
+
+                                            # daehwan - for debugging purposes
+                                            if False:
+                                                read_qual = ord(qual[read_pos])
+                                                add_count(var_id, (read_qual - 60) / 60.0)
+                                            else:
+                                                add_count(var_id, 1)
+                                        # daehwan - check out if this routine is appropriate
+                                        # else:
+                                        #    add_count(var_id, -1)
+                                var_idx += 1
+
+                            cmp_MD += ("%d%s" % (MD_match_len, ref_seq[ref_pos]))
+                            MD_match_len = 0
+                            cigar_match_len += 1
+                            read_pos += 1
+                            ref_pos += 1
+                        elif type == "insertion":
+                            ins_seq = read_seq[read_pos:read_pos+length]
+                            var_idx = lower_bound(Var_list[gene], ref_pos)
+                            # daehwan - for debugging purposes
+                            if debug:
+                                print left_pos, cigar_str, MD, vars
+                                print ref_pos, ins_seq, Var_list[gene][var_idx], Vars[gene][Var_list[gene][var_idx][1]]
+                                # sys.exit(1)
+                            while var_idx < len(Var_list[gene]):
+                                var_pos, var_id = Var_list[gene][var_idx]
+                                if ref_pos < var_pos:
+                                    break
+                                if ref_pos == var_pos:
+                                    var_type, _, var_data = Vars[gene][var_id]
+                                    if var_type == "insertion":                                
+                                        if var_data == ins_seq:
+                                            # daehwan - for debugging purposes
+                                            if debug:
+                                                print cmp, var_id, 1, Links[var_id]
+                                            add_count(var_id, 1)
+                                var_idx += 1
+
+                            if cigar_match_len > 0:
+                                cmp_cigar_str += ("%dM" % cigar_match_len)
+                                cigar_match_len = 0
+                            read_pos += length
+                            cmp_cigar_str += ("%dI" % length)
+                        elif type == "deletion":
+                            del_len = length
+                            # Deletions can be shifted bidirectionally
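+                            # Left-shift the deletion as far as possible, then scan
+                            # forward through equivalent placements when matching
+                            # against recorded deletion variants.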
+                            temp_ref_pos = ref_pos
+                            while temp_ref_pos > 0:
+                                last_bp = ref_seq[temp_ref_pos + del_len - 1]
+                                prev_bp = ref_seq[temp_ref_pos - 1]
+                                if last_bp != prev_bp:
+                                    break
+                                temp_ref_pos -= 1
+                            var_idx = lower_bound(Var_list[gene], temp_ref_pos)
+                            while var_idx < len(Var_list[gene]):
+                                var_pos, var_id = Var_list[gene][var_idx]
+                                if temp_ref_pos < var_pos:
+                                    first_bp = ref_seq[temp_ref_pos]
+                                    next_bp = ref_seq[temp_ref_pos + del_len]
+                                    if first_bp == next_bp:
+                                        temp_ref_pos += 1
+                                        continue
+                                    else:
+                                        break
+                                if temp_ref_pos == var_pos:
+                                    var_type, _, var_data = Vars[gene][var_id]
+                                    if var_type == "deletion":
+                                        var_len = int(var_data)
+                                        if var_len == length:
+                                            if debug:
+                                                print cmp, var_id, 1, Links[var_id]
+                                                print ref_seq[var_pos - 10:var_pos], ref_seq[var_pos:var_pos+int(var_data)], ref_seq[var_pos+int(var_data):var_pos+int(var_data)+10]
+                                            add_count(var_id, 1)
+                                var_idx += 1
+
+                            if cigar_match_len > 0:
+                                cmp_cigar_str += ("%dM" % cigar_match_len)
+                                cigar_match_len = 0
+                            cmp_MD += ("%d" % MD_match_len)
+                            MD_match_len = 0
+                            cmp_cigar_str += ("%dD" % length)
+                            cmp_MD += ("^%s" % ref_seq[ref_pos:ref_pos+length])
+                            ref_pos += length
+                        elif type == "soft":
+                            if cigar_match_len > 0:
+                                cmp_cigar_str += ("%dM" % cigar_match_len)
+                                cigar_match_len = 0
+                            read_pos += length
+                            cmp_cigar_str += ("%dS" % length)
+                        else:
+                            assert type == "intron"
+                            if cigar_match_len > 0:
+                                cmp_cigar_str += ("%dM" % cigar_match_len)
+                                cigar_match_len = 0
+                            cmp_cigar_str += ("%dN" % length)
+                            ref_pos += length                    
+                    if cigar_match_len > 0:
+                        cmp_cigar_str += ("%dM" % cigar_match_len)
+                    cmp_MD += ("%d" % MD_match_len)
+                    if read_pos != len(read_seq) or \
+                            cmp_cigar_str != cigar_str or \
+                            cmp_MD != MD:
+                        print >> sys.stderr, "Error:", cigar_str, MD
+                        print >> sys.stderr, "\tcomputed:", cmp_cigar_str, cmp_MD
+                        print >> sys.stderr, "\tcmp list:", cmp_list
+                        assert False            
+
+                    prev_read_id = read_id
+                    prev_exon = exon
+
+                if num_reads <= 0:
+                    continue
+
+                if prev_read_id != None:
+                    add_stat(HLA_cmpt, HLA_counts, HLA_count_per_read, prev_exon)
+
+                # Coverage-based adjustment (not used by default)
+                if enable_coverage:
+                    assert num_reads > 0
+                    read_len = int(total_read_len / float(num_reads))
+                    coverage_sum = 0
+                    for i in range(len(coverage)):
+                        if i > 0:
+                            coverage[i] += coverage[i-1]
+                        coverage_sum += coverage[i]
+                    coverage_avg = coverage_sum / float(len(coverage))
+                    assert len(ref_seq) < len(coverage)
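+                    # Expect proportionally lower coverage within one read length
+                    # of either end of the reference sequence.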
+                    for i in range(len(ref_seq)):
+                        coverage_threshold = 1.0 * coverage_avg
+                        if i < read_len:
+                            coverage_threshold *= ((i+1) / float(read_len))
+                        elif i + read_len > len(ref_seq):
+                            coverage_threshold *= ((len(ref_seq) - i) / float(read_len))
+                        if coverage[i] >= coverage_threshold:
+                            continue
+                        pseudo_num_reads = (coverage_threshold - coverage[i]) / read_len
+                        var_idx = lower_bound(Var_list[gene], i + 1)
+                        if var_idx >= len(Var_list[gene]):
+                            var_idx = len(Var_list[gene]) - 1
+                        cur_cmpt = set()
+                        while var_idx >= 0:
+                            var_pos, var_id = Var_list[gene][var_idx]
+                            var_type, _, var_data = Vars[gene][var_id]
+                            if var_type == "deletion":
+                                del_len = int(var_data)
+                                if i < var_pos:
+                                    break
+                                if i + read_len < var_pos + int(var_data):
+                                    assert var_id in Links
+                                    cur_cmpt = cur_cmpt.union(set(Links[var_id]))
+                            var_idx -= 1
+                        if cur_cmpt:
+                            cur_cmpt = '-'.join(list(cur_cmpt))
+                            if not cur_cmpt in HLA_cmpt:
+                                HLA_cmpt[cur_cmpt] = 0
+                            HLA_cmpt[cur_cmpt] += pseudo_num_reads
+            else:
+                assert index_type == "linear"
+                def add_alleles(alleles):
+                    # Count every allele in the compatibility set.
+                    for allele in alleles:
+                        if not allele in HLA_counts:
+                            HLA_counts[allele] = 1
+                        else:
+                            HLA_counts[allele] += 1
+
+                    cur_cmpt = sorted(list(alleles))
+                    cur_cmpt = '-'.join(cur_cmpt)
+                    if not cur_cmpt in HLA_cmpt:
+                        HLA_cmpt[cur_cmpt] = 1
+                    else:
+                        HLA_cmpt[cur_cmpt] += 1
+
+                prev_read_id, prev_AS = None, None
+                alleles = set()
+                for line in alignview_proc.stdout:
+                    cols = line[:-1].split()
+                    read_id, flag, allele = cols[:3]
+                    flag = int(flag)
+                    if flag & 0x4 != 0:
+                        continue
+                    if not allele.startswith(gene):
+                        continue
+                    if allele.find("BACKBONE") != -1:
+                        continue
+
+                    AS = None
+                    for i in range(11, len(cols)):
+                        col = cols[i]
+                        if col.startswith("AS"):
+                            AS = int(col[5:])
+                    assert AS != None
+                    if read_id != prev_read_id:
+                        if alleles:
+                            if aligner == "hisat2" or \
+                                    (aligner == "bowtie2" and len(alleles) < 10):
+                                add_alleles(alleles)
+                            alleles = set()
+                        prev_AS = None
+                    if prev_AS != None and AS < prev_AS:
+                        continue
+                    prev_read_id = read_id
+                    prev_AS = AS
+                    alleles.add(allele)
+
+                if alleles:
+                    add_alleles(alleles)
+
+            HLA_counts = [[allele, count] for allele, count in HLA_counts.items()]
+            def HLA_count_cmp(a, b):
+                if a[1] != b[1]:
+                    return b[1] - a[1]
+                assert a[0] != b[0]
+                if a[0] < b[0]:
+                    return -1
+                else:
+                    return 1
+            HLA_counts = sorted(HLA_counts, cmp=HLA_count_cmp)
+            for count_i in range(len(HLA_counts)):
+                count = HLA_counts[count_i]
+                if simulation:
+                    found = False
+                    for test_HLA_name in test_HLA_names:
+                        if count[0] == test_HLA_name:
+                            print >> sys.stderr, "\t\t\t*** %d ranked %s (count: %d)" % (count_i + 1, test_HLA_name, count[1])
+                            found = True
+                            """
+                            if count_i > 0 and HLA_counts[0][1] > count[1]:
+                                print >> sys.stderr, "Warning: %s ranked first (count: %d)" % (HLA_counts[0][0], HLA_counts[0][1])
+                                assert False
+                            else:
+                                test_passed += 1
+                            """
+                    if count_i < 5 and not found:
+                        print >> sys.stderr, "\t\t\t\t%d %s (count: %d)" % (count_i + 1, count[0], count[1])
+                else:
+                    print >> sys.stderr, "\t\t\t\t%d %s (count: %d)" % (count_i + 1, count[0], count[1])
+                    if count_i >= 9:
+                        break
+            print >> sys.stderr
+
+            HLA_prob = single_abundance(HLA_cmpt, HLA_lengths[gene])
+
+            success = [False for i in range(len(test_HLA_names))]
+            found_list = [False for i in range(len(test_HLA_names))]
+            for prob_i in range(len(HLA_prob)):
+                prob = HLA_prob[prob_i]
+                found = False
+                if simulation:
+                    for name_i in range(len(test_HLA_names)):
+                        test_HLA_name = test_HLA_names[name_i]
+                        if prob[0] == test_HLA_name:
+                            rank_i = prob_i
+                            while rank_i > 0:
+                                if prob[1] == HLA_prob[rank_i - 1][1]:
+                                    rank_i -= 1
+                                else:
+                                    break
+                            print >> sys.stderr, "\t\t\t*** %d ranked %s (abundance: %.2f%%)" % (rank_i + 1, test_HLA_name, prob[1] * 100.0)
+                            if rank_i < len(success):
+                                success[rank_i] = True
+                            found_list[name_i] = True
+                            found = True                        
+                    if not False in found_list:
+                        break
+                if not found:
+                    print >> sys.stderr, "\t\t\t\t%d ranked %s (abundance: %.2f%%)" % (prob_i + 1, prob[0], prob[1] * 100.0)
+                    if best_alleles and prob_i < 2:
+                        print >> sys.stdout, "SingleModel %s (abundance: %.2f%%)" % (prob[0], prob[1] * 100.0)
+                if not simulation and prob_i >= 9:
+                    break
+            print >> sys.stderr
+
+            if len(test_HLA_names) == 2 or not simulation:
+                HLA_prob = joint_abundance(HLA_cmpt, HLA_lengths[gene])
+                if len(HLA_prob) <= 0:
+                    continue
+                success = [False]
+                for prob_i in range(len(HLA_prob)):
+                    allele_pair, prob = HLA_prob[prob_i]
+                    allele1, allele2 = allele_pair.split('-')
+                    if best_alleles and prob_i < 1:
+                        print >> sys.stdout, "PairModel %s (abundance: %.2f%%)" % (allele_pair, prob * 100.0)
+                    if simulation:
+                        if allele1 in test_HLA_names and allele2 in test_HLA_names:
+                            rank_i = prob_i
+                            while rank_i > 0:
+                                if HLA_prob[rank_i-1][1] == prob:                                        
+                                    rank_i -= 1
+                                else:
+                                    break
+                            print >> sys.stderr, "\t\t\t*** %d ranked %s (abundance: %.2f%%)" % (rank_i + 1, allele_pair, prob * 100.0)
+                            if rank_i == 0:
+                                success[0] = True
+                            break
+                    print >> sys.stderr, "\t\t\t\t%d ranked %s (abundance: %.2f%%)" % (prob_i + 1, allele_pair, prob * 100.0)
+                    if not simulation and prob_i >= 9:
+                        break
+                print >> sys.stderr
+
+                # Li's method
+                """
+                li_hla = os.path.join(ex_path, "li_hla/hla")
+                if os.path.exists(li_hla):
+                    li_hla_cmd = [li_hla,
+                                  "hla",
+                                  "hla_input.bam",
+                                  "-b", "%s*BACKBONE" % gene]
+                    li_hla_proc = subprocess.Popen(li_hla_cmd,
+                                                   stdout=subprocess.PIPE,
+                                                   stderr=open("/dev/null", 'w'))
+
+                    # read in the result of Li's hla
+                    for line in li_hla_proc.stdout:
+                        allele1, allele2, score = line.strip().split()
+                        score = float(score)
+                        if simulation:
+                            if allele1 in test_HLA_names and allele2 in test_HLA_names:
+                                print >> sys.stderr, "\t\t\t*** 1 ranked %s-%s (score: %.2f)" % (allele1, allele2, score)
+                                success[0] = True
+                            else:
+                                print >> sys.stderr, "\t\t\tLiModel fails"
+                        if best_alleles:
+                            print >> sys.stdout, "LiModel %s-%s (score: %.2f)" % (allele1, allele2, score)
+                    li_hla_proc.communicate()
+                """
+
+            if simulation and not False in success:
+                aligner_type = "%s %s" % (aligner, index_type)
+                if not aligner_type in test_passed:
+                    test_passed[aligner_type] = 1
+                else:
+                    test_passed[aligner_type] += 1
+
+    if simulation:
+        return test_passed
+
+
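+"""
+Read allele sequences from a FASTA file into HLAs[gene][allele_name]
+"""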
+def read_HLA_alleles(fname, HLAs):
+    for line in open(fname):
+        if line.startswith(">"):
+            HLA_name = line.strip().split()[0][1:]
+            HLA_gene = HLA_name.split('*')[0]
+            if not HLA_gene in HLAs:
+                HLAs[HLA_gene] = {}
+            if not HLA_name in HLAs[HLA_gene]:
+                HLAs[HLA_gene][HLA_name] = ""
+        else:
+            HLAs[HLA_gene][HLA_name] += line.strip()
+    return HLAs
+
+"""
+"""
+def genotyping(base_fname,
+               reference_type,
+               hla_list,
+               partial,
+               aligners,
+               read_fname,
+               alignment_fname,
+               threads,
+               simulate_interval,
+               enable_coverage,
+               best_alleles,
+               exclude_allele_list,
+               default_allele_list,
+               num_mismatch,
+               verbose,
+               daehwan_debug):
+    # Current script directory
+    curr_script = os.path.realpath(inspect.getsourcefile(genotyping))
+    ex_path = os.path.dirname(curr_script)
+
+    # Clone a git repository, IMGTHLA
+    if not os.path.exists("IMGTHLA"):
+        os.system("git clone https://github.com/jrob119/IMGTHLA.git")
+
+    # Clone hisat2 genotype database, hisat_genotype_db
+    if not os.path.exists("hisat_genotype_db"):
+        os.system("git clone https://github.com/infphilo/hisat_genotype_db.git")
+
+    simulation = (read_fname == [] and alignment_fname == "")
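+    # Simulation mode: enabled when neither reads nor an alignment file are provided.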
+
+    def check_files(fnames):
+        for fname in fnames:
+            if not os.path.exists(fname):
+                return False
+        return True
+
+    # Download HISAT2 index
+    HISAT2_fnames = ["grch38",
+                     "genome.fa",
+                     "genome.fa.fai"]
+    if not check_files(HISAT2_fnames):
+        os.system("wget ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/data/grch38.tar.gz; tar xvzf grch38.tar.gz; rm grch38.tar.gz")
+        hisat2_inspect = os.path.join(ex_path, "hisat2-inspect")
+        os.system("%s grch38/genome > genome.fa" % hisat2_inspect)
+        os.system("samtools faidx genome.fa")
+
+    # Check if the pre-existing files (hla*) are compatible with the current parameter setting
+    if os.path.exists("hla.ref"):
+        left = 0
+        HLA_genes = set()
+        BACKBONE = False
+        for line in open("hla.ref"):
+            HLA_name = line.strip().split()[0]
+            if HLA_name.find("BACKBONE") != -1:
+                BACKBONE = True
+            HLA_gene = HLA_name.split('*')[0]
+            HLA_genes.add(HLA_gene)
+        delete_hla_files = False
+        if reference_type == "gene":
+            if not BACKBONE:
+                delete_hla_files = True
+        elif reference_type in ["chromosome", "genome"]:
+            if BACKBONE:
+                delete_hla_files = True
+        else:
+            assert False
+        if not set(hla_list).issubset(HLA_genes):
+            delete_hla_files = True
+        if delete_hla_files:
+            os.system("rm hla*")
+    
+    # Extract HLA variants, backbone sequence, and other sequences
+    if len(base_fname) > 0:
+        base_fname = "_" + base_fname
+    base_fname = "hla" + base_fname
+    
+    HLA_fnames = [base_fname+"_backbone.fa",
+                  base_fname+"_sequences.fa",
+                  base_fname+".ref",
+                  base_fname+".snp",
+                  base_fname+".haplotype",
+                  base_fname+".link",
+                  base_fname+"_alleles_excluded.txt"]
+
+    
+    # Check if excluded alleles in current files match
+    excluded_alleles_match = False
+    if(os.path.exists(HLA_fnames[6])):
+        afile = open(HLA_fnames[6],'r')
+        afile.readline()
+        lines = afile.read().split()
+        excluded_alleles_match = (set(exclude_allele_list) == set(lines))
+        afile.close()
+    elif len(exclude_allele_list) == 0:
+        excluded_alleles_match = True
+        try:
+            temp_name = HLA_fnames[6]
+            HLA_fnames.remove(HLA_fnames[6])
+            os.remove(temp_name)
+        except OSError:
+            pass
+        
+    if not excluded_alleles_match:
+        print("Creating Allele Exclusion File.\n")
+        afile = open(HLA_fnames[6],'w')
+        afile.write("Alleles excluded:\n")
+        afile.write("\n".join(exclude_allele_list))
+        afile.close()
+        
+    print HLA_fnames
+    
+    if (not check_files(HLA_fnames)) or (not excluded_alleles_match):
+        extract_hla_script = os.path.join(ex_path, "hisatgenotype_extract_vars.py")
+        extract_cmd = [extract_hla_script,
+                       "--reference-type", reference_type,
+                       "--hla-list", ','.join(hla_list)]
+
+        if len(exclude_allele_list) > 0:
+            print exclude_allele_list
+            extract_cmd += ["--exclude-allele-list", ",".join(exclude_allele_list)]
+
+        if len(base_fname) > 3:
+            extract_cmd += ["--base", base_fname]
+
+        if partial:
+            extract_cmd += ["--partial"]
+        extract_cmd += ["--inter-gap", "30",
+                        "--intra-gap", "50"]
+        if verbose:
+            print >> sys.stderr, "\tRunning:", ' '.join(extract_cmd)
+        proc = subprocess.Popen(extract_cmd, stdout=open("/dev/null", 'w'), stderr=open("/dev/null", 'w'))
+        proc.communicate()
+        
+        if not check_files(HLA_fnames):
+            print >> sys.stderr, "Error: extract_HLA_vars failed!"
+            sys.exit(1)
+            
+    print "Base files built\n"
+
+    # Build HISAT2 graph indexes based on the above information
+    HLA_hisat2_graph_index_fnames = ["hla.graph.%d.ht2" % (i+1) for i in range(8)]
+    if not check_files(HLA_hisat2_graph_index_fnames) or (not excluded_alleles_match):
+        hisat2_build = os.path.join(ex_path, "hisat2-build")
+        build_cmd = [hisat2_build,
+                     "-p", str(threads),
+                     "--snp", HLA_fnames[3],
+                     "--haplotype", HLA_fnames[4] ,
+                     HLA_fnames[0],
+                     "hla.graph"]
+        if verbose:
+            print >> sys.stderr, "\tRunning:", ' '.join(build_cmd)
+        proc = subprocess.Popen(build_cmd, stdout=open("/dev/null", 'w'), stderr=open("/dev/null", 'w'))
+        proc.communicate()        
+        if not check_files(HLA_hisat2_graph_index_fnames):
+            print >> sys.stderr, "Error: indexing HLA failed! Perhaps you forgot to build the hisat2 executables?"
+            sys.exit(1)
+    print "Step 1 Complete\n"
+    # Build HISAT2 linear indexes based on the above information
+    HLA_hisat2_linear_index_fnames = ["hla.linear.%d.ht2" % (i+1) for i in range(8)]
+    if reference_type == "gene" and (not check_files(HLA_hisat2_linear_index_fnames) or (not excluded_alleles_match)):
+        hisat2_build = os.path.join(ex_path, "hisat2-build")
+        build_cmd = [hisat2_build,
+                     "%s,%s"%(HLA_fnames[0],HLA_fnames[1]),
+                     "hla.linear"]
+        proc = subprocess.Popen(build_cmd, stdout=open("/dev/null", 'w'), stderr=open("/dev/null", 'w'))
+        proc.communicate()        
+        if not check_files(HLA_hisat2_linear_index_fnames):
+            print >> sys.stderr, "Error: indexing HLA failed!"
+            sys.exit(1)
+            
+    print "Step 2 Complete\n"
+    # Build Bowtie2 indexes based on the above information
+    HLA_bowtie2_index_fnames = ["hla.%d.bt2" % (i+1) for i in range(4)]
+    HLA_bowtie2_index_fnames += ["hla.rev.%d.bt2" % (i+1) for i in range(2)]
+    if reference_type == "gene" and (not check_files(HLA_bowtie2_index_fnames) or (not excluded_alleles_match)):
+        build_cmd = ["bowtie2-build",
+                     "%s,%s"%(HLA_fnames[0],HLA_fnames[1]),
+                     "hla"]
+        proc = subprocess.Popen(build_cmd, stdout=open("/dev/null", 'w'))
+        proc.communicate()        
+        if not check_files(HLA_bowtie2_index_fnames):
+            print >> sys.stderr, "Error: indexing HLA failed!"
+            sys.exit(1)
+
+    print "Step 3 Complete\n"
+    # Read partial alleles from IMGTHLA/hla.dat (temporary)
+    partial_alleles = set()
+    for line in open("IMGTHLA/hla.dat"):
+        if not line.startswith("DE"):
+            continue
+        allele_name = line.split()[1][4:-1]
+        gene = allele_name.split('*')[0]
+        if line.find("partial") != -1:
+            partial_alleles.add(allele_name)
+
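+    # When --default-list is given, a separate set of backbone files is kept under
+    # ./Default-HLA so that reads can still be simulated from alleles that were
+    # excluded from the main index.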
+    if len(default_allele_list) != 0:
+        if not os.path.exists("./Default-HLA/hla_backbone.fa"):
+            try:
+                os.mkdir("./Default-HLA")
+            except OSError:
+                pass
+            
+            extract_hla_script = os.path.join(ex_path, "hisatgenotype_extract_vars.py")
+            extract_cmd = [extract_hla_script,
+                           "--reference-type", reference_type,
+                           "--hla-list", ','.join(hla_list),
+                           "--base", "./Default-HLA/hla"]
+
+            if partial:
+                extract_cmd += ["--partial"]
+            extract_cmd += ["--inter-gap", "30",
+                            "--intra-gap", "50"]
+            if verbose:
+                print >> sys.stderr, "\tRunning:", ' '.join(extract_cmd)
+            proc = subprocess.Popen(extract_cmd, stdout=open("/dev/null", 'w'), stderr=open("/dev/null", 'w'))
+            proc.communicate()
+            
+            if not os.path.exists("./Default-HLA/hla_backbone.fa"):
+                print >> sys.stderr, "Error: extract_HLA_vars (Default) failed!"
+                sys.exit(1)
+    
+    # Read HLA alleles (names and sequences)
+    refHLAs, refHLA_loci = {}, {}
+    for line in open("hla.ref"):
+        HLA_name, chr, left, right, length, exon_str = line.strip().split()
+        HLA_gene = HLA_name.split('*')[0]
+        assert not HLA_gene in refHLAs
+        refHLAs[HLA_gene] = HLA_name
+        left, right = int(left), int(right)
+        exons = []
+        for exon in exon_str.split(','):
+            exon_left, exon_right = exon.split('-')
+            exons.append([int(exon_left), int(exon_right)])
+        refHLA_loci[HLA_gene] = [HLA_name, chr, left, right, exons]
+    HLAs = {}
+
+    if reference_type == "gene":
+        read_HLA_alleles(HLA_fnames[0], HLAs)
+    read_HLA_alleles(HLA_fnames[1], HLAs)
+    
+    # HLA gene alleles
+    HLA_names = {}
+    for HLA_gene, data in HLAs.items():
+        HLA_names[HLA_gene] = list(data.keys())
+
+    # HLA gene allele lengths
+    HLA_lengths = {}
+    for HLA_gene, HLA_alleles in HLAs.items():
+        HLA_lengths[HLA_gene] = {}
+        for allele_name, seq in HLA_alleles.items():
+            HLA_lengths[HLA_gene][allele_name] = len(seq)
+
+    # Custom allele lists: record allele lengths from the default backbone data
+    custom_allele_check = False
+    if len(default_allele_list) > 0:
+        custom_allele_check = True
+        HLAs_default = {}
+        read_HLA_alleles("./Default-HLA/hla_backbone.fa",HLAs_default)
+        read_HLA_alleles("./Default-HLA/hla_sequences.fa",HLAs_default)
+        for HLA_gene, HLA_alleles in HLAs_default.items():
+            for allele_name, seq in HLA_alleles.items():
+                if allele_name in default_allele_list:
+                    HLA_lengths[HLA_gene][allele_name] = len(seq)
+
+    # Read HLA variants, and link information
+    Vars, Var_list = {}, {}
+    for line in open(HLA_fnames[3]):
+        var_id, var_type, allele, pos, data = line.strip().split('\t')
+        pos = int(pos)
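+        # For chromosome/genome references, assign the variant to the nearest
+        # HLA gene locus and convert its position to gene-relative coordinates.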
+        if reference_type != "gene":
+            allele, dist = None, 0
+            for tmp_gene, values in refHLA_loci.items():
+                allele_name, chr, left, right, exons = values
+                if allele is None or dist > abs(pos - left):
+                    allele = allele_name
+                    dist = abs(pos - left)
+            
+        gene = allele.split('*')[0]
+        if not gene in Vars:
+            Vars[gene] = {}
+            assert not gene in Var_list
+            Var_list[gene] = []
+            
+        assert not var_id in Vars[gene]
+        left = 0
+        if reference_type != "gene":
+            _, _, left, _, _ = refHLA_loci[gene]
+        Vars[gene][var_id] = [var_type, pos - left, data]
+        Var_list[gene].append([pos - left, var_id])
+        
+    for gene, in_var_list in Var_list.items():
+        Var_list[gene] = sorted(in_var_list)
+        
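+    # Links maps each variant ID to the alleles that carry it; the .link file has
+    # one tab-separated "<var_id>\t<allele> <allele> ..." line per variant.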
+    Links = {}
+    for line in open(HLA_fnames[5]):
+        var_id, alleles = line.strip().split('\t')
+        alleles = alleles.split()
+        assert not var_id in Links
+        Links[var_id] = alleles
+
+    # Scoring schemes from Sangtae Kim (Illumina)'s implementation
+    # Currently not used.
+    """
+    max_qual_value = 100
+    match_score, mismatch_score = [0] * max_qual_value, [0] * max_qual_value
+    for qual in range(max_qual_value):
+        error_rate = 0.1 ** (qual / 10.0)
+        match_score[qual] = math.log(1.000000000001 - error_rate);
+        mismatch_score[qual] = math.log(error_rate / 3.0);
+    """
+    # Test HLA typing
+    test_list = []
+    if simulation:
+        basic_test, pair_test = True, False
+        if daehwan_debug:
+            if "basic_test" in daehwan_debug:
+                basic_test, pair_test = True, False
+            else:
+                basic_test, pair_test = False, True
+
+        test_passed = {}
+        test_list = []
+        genes = list(set(hla_list) & set(HLA_names.keys()))
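+        # basic_test simulates reads from each allele individually;
+        # pair_test simulates reads from random pairs of alleles per gene.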
+        if basic_test:
+            for gene in genes:
+                HLA_gene_alleles = HLA_names[gene]
+                for HLA_name in HLA_gene_alleles:
+                    if HLA_name.find("BACKBONE") != -1:
+                        continue
+                    test_list.append([[HLA_name]])
+        if pair_test:
+            test_size = 500
+            allele_count = 2
+            for test_i in range(test_size):
+                test_pairs = []
+                for gene in genes:
+                    HLA_gene_alleles = []
+                    
+                    for allele in HLA_names[gene]:
+                        if allele.find("BACKBONE") != -1:
+                            continue
+                        HLA_gene_alleles.append(allele)
+                    nums = [i for i in range(len(HLA_gene_alleles))]
+                    random.shuffle(nums)
+                    test_pairs.append(sorted([HLA_gene_alleles[nums[i]] for i in range(allele_count)]))
+                test_list.append(test_pairs)
+
+        if verbose:
+            print >> sys.stderr, test_list
+        if custom_allele_check:
+            test_list = []
+            if basic_test:
+                for allele in default_allele_list:
+                    test_list.append([[allele]])
+        if verbose:
+            print >> sys.stderr, test_list
+        
+        for test_i in range(len(test_list)):
+            if "test_id" in daehwan_debug:
+                daehwan_test_ids = daehwan_debug["test_id"].split('-')
+                if str(test_i + 1) not in daehwan_test_ids:
+                    continue
+
+            print >> sys.stderr, "Test %d" % (test_i + 1)
+            test_HLA_list = test_list[test_i]
+           
+            # daehwan - for debugging purposes
+            # test_HLA_list = [["A*11:50Q", "A*11:01:01:01", "A*01:01:01:01"]]
+            for test_HLA_names in test_HLA_list:
+                for test_HLA_name in test_HLA_names:
+                    gene = test_HLA_name.split('*')[0]
+                    if custom_allele_check:
+                        test_HLA_seq = HLAs_default[gene][test_HLA_name]
+                    else:
+                        test_HLA_seq = HLAs[gene][test_HLA_name]
+                    seq_type = "partial" if test_HLA_name in partial_alleles else "full"
+                    print >> sys.stderr, "\t%s - %d bp (%s sequence)" % (test_HLA_name, len(test_HLA_seq), seq_type)
+            if custom_allele_check:
+                simulate_reads(HLAs_default, test_HLA_list, simulate_interval)
+            else:
+                simulate_reads(HLAs, test_HLA_list, simulate_interval)
+
+            if "test_id" in daehwan_debug:
+                read_fname = ["hla_input_1.fa"]
+            else:
+                read_fname = ["hla_input_1.fa", "hla_input_2.fa"]
+
+            fastq = False
+            tmp_test_passed = HLA_typing(ex_path,
+                                         simulation,
+                                         reference_type,
+                                         test_HLA_list,
+                                         partial,
+                                         refHLAs,
+                                         HLAs,                       
+                                         HLA_names,
+                                         HLA_lengths,
+                                         refHLA_loci,
+                                         Vars,
+                                         Var_list,
+                                         Links,
+                                         exclude_allele_list,
+                                         aligners,
+                                         num_mismatch,
+                                         fastq,
+                                         read_fname,
+                                         alignment_fname,
+                                         threads,
+                                         enable_coverage,
+                                         best_alleles,
+                                         verbose)
+
+            for aligner_type, passed in tmp_test_passed.items():
+                if aligner_type in test_passed:
+                    test_passed[aligner_type] += passed
+                else:
+                    test_passed[aligner_type] = passed
+
+                print >> sys.stderr, "\t\tPassed so far: %d/%d (%.2f%%)" % (test_passed[aligner_type], test_i + 1, (test_passed[aligner_type] * 100.0 / (test_i + 1)))
+
+
+        for aligner_type, passed in test_passed.items():
+            print >> sys.stderr, "%s:\t%d/%d passed (%.2f%%)" % (aligner_type, passed, len(test_list), passed * 100.0 / len(test_list))
+    
+    else: # With real reads or BAMs
+        print >> sys.stderr, "\t", ' '.join(hla_list)
+        fastq = True
+        HLA_typing(ex_path,
+                   simulation,
+                   reference_type,
+                   hla_list,
+                   partial,
+                   refHLAs,
+                   HLAs,                       
+                   HLA_names,
+                   HLA_lengths,
+                   refHLA_loci,
+                   Vars,
+                   Var_list,
+                   Links,
+                   exclude_allele_list,
+                   aligners,
+                   num_mismatch,
+                   fastq,
+                   read_fname,
+                   alignment_fname,
+                   threads,
+                   enable_coverage,
+                   best_alleles,
+                   verbose)
+
+        
+"""
+"""
+if __name__ == '__main__':
+    parser = ArgumentParser(
+        description='test HLA genotyping with HISAT2')
+    parser.add_argument("--base",
+                        dest="base_fname",
+                        type=str,
+                        default="",
+                        help="base filename for backbone HLA sequence, HLA variants, and HLA linking info")
+    parser.add_argument("--default-list",
+                        dest = "default_allele_list",
+                        type=str,
+                        default="",
+                        help="A comma-separated list of HLA alleles to be tested. Alleles are retrieved from default backbone data (all alleles included in backbone).")
+    parser.add_argument("--reference-type",
+                        dest="reference_type",
+                        type=str,
+                        default="gene",
+                        help="Reference type: gene, chromosome, and genome (default: gene)")
+    parser.add_argument("--hla-list",
+                        dest="hla_list",
+                        type=str,
+                        default="A,B,C,DQA1,DQB1,DRB1",
+                        help="A comma-separated list of HLA genes (default: A,B,C,DQA1,DQB1,DRB1)")
+    parser.add_argument('--partial',
+                        dest='partial',
+                        action='store_true',
+                        help='Include partial alleles (e.g. A_nuc.fasta)')
+    parser.add_argument("--aligner-list",
+                        dest="aligners",
+                        type=str,
+                        default="hisat2.graph,hisat2.linear,bowtie2.linear",
+                        help="A comma-separated list of aligners (default: hisat2.graph,hisat2.linear,bowtie2.linear)")
+    parser.add_argument("--reads",
+                        dest="read_fname",
+                        type=str,
+                        default="",
+                        help="Fastq read file name")
+    parser.add_argument("--alignment",
+                        dest="alignment_fname",
+                        type=str,
+                        default="",
+                        help="BAM file name")
+    parser.add_argument("-p", "--threads",
+                        dest="threads",
+                        type=int,
+                        default=1,
+                        help="Number of threads")
+    parser.add_argument("--simulate-interval",
+                        dest="simulate_interval",
+                        type=int,
+                        default=1,
+                        help="Reads simulated at every these base pairs (default: 1)")
+    parser.add_argument("--coverage",
+                        dest="coverage",
+                        action='store_true',
+                        help="Experimental purpose (assign reads based on coverage)")
+    parser.add_argument("--best-alleles",
+                        dest="best_alleles",
+                        action='store_true',
+                        help="")
+    parser.add_argument("--exclude-allele-list",
+                        dest="exclude_allele_list",
+                        type=str,
+                        default="",
+                        help="A comma-separated list of alleles to be excluded. Enter a number N to randomly select N alleles for exclusion and N non-excluded alleles for testing (2N tested in total).")
+    parser.add_argument("--num-mismatch",
+                        dest="num_mismatch",
+                        type=int,
+                        default=0,
+                        help="Maximum number of mismatches per read alignment to be considered (default: 0)")
+    parser.add_argument('-v', '--verbose',
+                        dest='verbose',
+                        action='store_true',
+                        help='also print some statistics to stderr')
+    parser.add_argument("--debug",
+                        dest="debug",
+                        type=str,
+                        default="",
+                        help="e.g., test_id:10,read_id:10000,basic_test")
+    parser.add_argument("--novel_allele_detection",
+                        dest="novel_allele_detection",
+                        action='store_true',
+                        help="Change test to detection of new alleles. Report sensitivity and specificity rate at the end.")
+
+
+    args = parser.parse_args()
+    if args.reference_type not in ["gene", "chromosome", "genome"]:
+        print >> sys.stderr, "Error: --reference-type (%s) must be one of gene, chromosome, or genome." % (args.reference_type)
+        sys.exit(1)
+    args.hla_list = args.hla_list.split(',')
+    if args.aligners == "":
+        print >> sys.stderr, "Error: --aligners must be non-empty."
+        sys.exit(1)    
+    args.aligners = args.aligners.split(',')
+    for i in range(len(args.aligners)):
+        args.aligners[i] = args.aligners[i].split('.')
+    if args.read_fname:
+        args.read_fname = args.read_fname.split(',')
+    else:
+        args.read_fname = []
+    if args.alignment_fname != "" and \
+            not os.path.exists(args.alignment_fname):
+        print >> sys.stderr, "Error: %s doesn't exist." % args.alignment_fname
+        sys.exit(1)
+    
+    if len(args.default_allele_list) > 0:
+        args.default_allele_list = args.default_allele_list.split(',')
+        
+    if len(args.exclude_allele_list) > 0:
+        if args.exclude_allele_list.strip().isdigit():
+            num_alleles = int(args.exclude_allele_list)
+            
+            
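+            # Numeric mode: build the default backbone data if needed, then randomly
+            # pick N alleles to exclude from the index and another N to test, so 2N
+            # alleles are tested in total (see --exclude-allele-list help).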
+            if not os.path.exists("./Default-HLA/hla_backbone.fa"):
+                try:
+                    os.mkdir("./Default-HLA")
+                except OSError:
+                    pass
+
+                extract_hla_script = os.path.join(ex_path, "hisatgenotype_extract_vars.py")
+                extract_cmd = [extract_hla_script,
+                               "--reference-type", args.reference_type,
+                               "--hla-list", ','.join(args.hla_list),
+                               "--base", "./Default-HLA/hla"]
+                if args.partial:
+                    extract_cmd += ["--partial"]
+                extract_cmd += ["--inter-gap", "30",
+                                "--intra-gap", "50"]
+                if args.verbose:
+                    print >> sys.stderr, "\tRunning:", ' '.join(extract_cmd)
+                proc = subprocess.Popen(extract_cmd, stdout=open("/dev/null", 'w'), stderr=open("/dev/null", 'w'))
+                proc.communicate()
+                if not os.path.exists("./Default-HLA/hla_backbone.fa"):
+                    print >> sys.stderr, "Error: extract_HLA_vars (Default) failed!"
+                    sys.exit(1)
+       
+            HLAs_default = {}
+            #read_HLA_alleles("./Default-HLA/hla_backbone.fa",HLAs_default)
+            read_HLA_alleles("./Default-HLA/hla_sequences.fa",HLAs_default)
+            
+
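+            # NOTE: exclusion candidates are currently drawn only from gene A
+            # (HLAs_default['A']); other genes in --hla-list are not sampled here.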
+            allele_names = list(HLAs_default['A'].keys())
+            random.shuffle(allele_names)
+            args.exclude_allele_list = allele_names[0:num_alleles]
+            args.default_allele_list = allele_names[num_alleles:2*num_alleles]
+            
+            args.default_allele_list = args.default_allele_list + args.exclude_allele_list
+        else:
+            args.exclude_allele_list = args.exclude_allele_list.split(',')
+        
+    debug = {}
+    if args.debug != "":
+        for item in args.debug.split(','):
+            if ':' in item:
+                key, value = item.split(':')
+                debug[key] = value
+            else:
+                debug[item] = 1
+
+    random.seed(1)
+    genotyping(args.base_fname,
+               args.reference_type,
+               args.hla_list,
+               args.partial,
+               args.aligners,
+               args.read_fname,
+               args.alignment_fname,
+               args.threads,
+               args.simulate_interval,
+               args.coverage,
+               args.best_alleles,
+               args.exclude_allele_list,
+               args.default_allele_list,
+               args.num_mismatch,
+               args.verbose,
+               debug)
diff --git a/hisat2_test_HLA_genotyping.py b/old_hisat2_test_HLA_genotyping.py
similarity index 96%
copy from hisat2_test_HLA_genotyping.py
copy to old_hisat2_test_HLA_genotyping.py
index 3baa823..3bdd6cf 100755
--- a/hisat2_test_HLA_genotyping.py
+++ b/old_hisat2_test_HLA_genotyping.py
@@ -102,14 +102,14 @@ def test_HLA_genotyping(reference_type,
                   "hla.link"]
 
     if not check_files(HLA_fnames):
-        extract_hla_script = os.path.join(ex_path, "hisat2_extract_HLA_vars.py")
+        extract_hla_script = os.path.join(ex_path, "hisatgenotype_extract_vars.py")
         extract_cmd = [extract_hla_script,
                        "--reference-type", reference_type,
                        "--hla-list", ','.join(hla_list)]
         if partial:
             extract_cmd += ["--partial"]
-        extract_cmd += ["--gap", "30",
-                        "--split", "50"]
+        extract_cmd += ["--inter-gap", "30",
+                        "--intra-gap", "50"]
         if verbose:
             print >> sys.stderr, "\tRunning:", ' '.join(extract_cmd)
         proc = subprocess.Popen(extract_cmd, stdout=open("/dev/null", 'w'), stderr=open("/dev/null", 'w'))
@@ -136,32 +136,6 @@ def test_HLA_genotyping(reference_type,
             print >> sys.stderr, "Error: indexing HLA failed!  Perhaps, you may have forgotten to build hisat2 executables?"
             sys.exit(1)
 
-    # Build HISAT2 linear indexes based on the above information
-    HLA_hisat2_linear_index_fnames = ["hla.linear.%d.ht2" % (i+1) for i in range(8)]
-    if reference_type == "gene" and not check_files(HLA_hisat2_linear_index_fnames):
-        hisat2_build = os.path.join(ex_path, "hisat2-build")
-        build_cmd = [hisat2_build,
-                     "hla_backbone.fa,hla_sequences.fa",
-                     "hla.linear"]
-        proc = subprocess.Popen(build_cmd, stdout=open("/dev/null", 'w'), stderr=open("/dev/null", 'w'))
-        proc.communicate()        
-        if not check_files(HLA_hisat2_graph_index_fnames):
-            print >> sys.stderr, "Error: indexing HLA failed!"
-            sys.exit(1)
-
-    # Build Bowtie2 indexes based on the above information
-    HLA_bowtie2_index_fnames = ["hla.%d.bt2" % (i+1) for i in range(4)]
-    HLA_bowtie2_index_fnames += ["hla.rev.%d.bt2" % (i+1) for i in range(2)]
-    if reference_type == "gene" and not check_files(HLA_bowtie2_index_fnames):
-        build_cmd = ["bowtie2-build",
-                     "hla_backbone.fa,hla_sequences.fa",
-                     "hla"]
-        proc = subprocess.Popen(build_cmd, stdout=open("/dev/null", 'w'))
-        proc.communicate()        
-        if not check_files(HLA_bowtie2_index_fnames):
-            print >> sys.stderr, "Error: indexing HLA failed!"
-            sys.exit(1)
-
     # Read partial alleles from hla.data (temporary)
     partial_alleles = set()
     for line in open("IMGTHLA/hla.dat"):
@@ -412,11 +386,13 @@ def test_HLA_genotyping(reference_type,
                     assert len(read_fname) in [1,2]
                     aligner_cmd += ["-p", str(threads)]
                     if len(read_fname) == 1:
-                        aligner_cmd += [read_fname[0]]
+                        aligner_cmd += ["-U", read_fname[0]]
                     else:
                         aligner_cmd += ["-1", "%s" % read_fname[0],
                                         "-2", "%s" % read_fname[1]]
 
+                if verbose:
+                    print >> sys.stderr, ' '.join(aligner_cmd)
                 align_proc = subprocess.Popen(aligner_cmd,
                                               stdout=subprocess.PIPE,
                                               stderr=open("/dev/null", 'w'))
@@ -425,30 +401,38 @@ def test_HLA_genotyping(reference_type,
                               "view",
                               "-bS",
                               "-"]
+                if simulation:
+                    output_fname_base = "hla_input"
+                else:
+                    output_fname_base = read_fname[0].split('/')[1]
+                    output_fname_base = output_fname_base.split('.')[0]                    
+                    
                 sambam_proc = subprocess.Popen(sambam_cmd,
                                                stdin=align_proc.stdout,
-                                               stdout=open("hla_input_unsorted.bam", 'w'),
+                                               stdout=open("%s_unsorted.bam" % output_fname_base, 'w'),
                                                stderr=open("/dev/null", 'w'))
                 sambam_proc.communicate()
                 if index_type == "graph":
                     bamsort_cmd = ["samtools",
                                    "sort",
-                                   "hla_input_unsorted.bam",
-                                   "hla_input"]
+                                   "%s_unsorted.bam" % output_fname_base,
+                                   "-o", "%s.bam" % output_fname_base]
                     bamsort_proc = subprocess.Popen(bamsort_cmd,
                                                     stderr=open("/dev/null", 'w'))
                     bamsort_proc.communicate()
 
                     bamindex_cmd = ["samtools",
                                     "index",
-                                    "hla_input.bam"]
+                                    "%s.bam" % output_fname_base]
                     bamindex_proc = subprocess.Popen(bamindex_cmd,
                                                      stderr=open("/dev/null", 'w'))
                     bamindex_proc.communicate()
 
-                    os.system("rm hla_input_unsorted.bam")            
+                    os.system("rm %s_unsorted.bam" % output_fname_base)            
                 else:
-                    os.system("mv hla_input_unsorted.bam hla_input.bam")
+                    os.system("mv %s.bam %s.bam" % (output_fname_base, output_fname_base))
+
+                alignment_fname = "%s.bam" % output_fname_base
 
             for test_HLA_names in test_HLA_list:
                 if simulation:
@@ -463,7 +447,7 @@ def test_HLA_genotyping(reference_type,
                 alignview_cmd = ["samtools",
                                  "view"]
                 if alignment_fname == "":
-                    alignview_cmd += ["hla_input.bam"]
+                    alignview_cmd += [alignment_fname]
                 else:
                     if not os.path.exists(alignment_fname + ".bai"):
                         os.system("samtools index %s" % alignment_fname)
@@ -581,7 +565,7 @@ def test_HLA_genotyping(reference_type,
                                     read_base = read_seq[read_pos + MD_len]
                                     MD_ref_base = MD[MD_str_pos]
                                     MD_str_pos += 1
-                                    assert MD_ref_base in "ACGT"
+                                    # assert MD_ref_base in "ACGT"
                                     cmp_list.append(["match", right_pos + MD_len_used, MD_len - MD_len_used])
                                     cmp_list.append(["mismatch", right_pos + MD_len, 1])
                                     MD_len_used = MD_len + 1
@@ -1110,7 +1094,8 @@ def test_HLA_genotyping(reference_type,
                         break
                 print >> sys.stderr
 
-                if len(test_HLA_names) == 2 or not simulation:
+                # daehwan - for debugging purposes
+                if False and (len(test_HLA_names) == 2 or not simulation):
                     HLA_prob, HLA_prob_next = {}, {}
                     for cmpt, count in HLA_cmpt.items():
                         alleles = cmpt.split('-')
@@ -1200,7 +1185,7 @@ def test_HLA_genotyping(reference_type,
                     if os.path.exists(li_hla):
                         li_hla_cmd = [li_hla,
                                       "hla",
-                                      "hla_input.bam",
+                                      alignment_fname,
                                       "-b", "%s*BACKBONE" % gene]
                         li_hla_proc = subprocess.Popen(li_hla_cmd,
                                                        stdout=subprocess.PIPE,
@@ -1229,6 +1214,8 @@ def test_HLA_genotyping(reference_type,
 
                 if simulation:
                     print >> sys.stderr, "\t\tPassed so far: %d/%d (abundance: %.2f%%)" % (test_passed[aligner_type], test_i + 1, (test_passed[aligner_type] * 100.0 / (test_i + 1)))
+            os.system("rm %s %s.bai" % (alignment_fname, alignment_fname))
+            alignment_fname = ""
 
 
     if simulation:
@@ -1302,8 +1289,8 @@ if __name__ == '__main__':
                         dest='verbose',
                         action='store_true',
                         help='also print some statistics to stderr')
-    parser.add_argument("--daehwan-debug",
-                        dest="daehwan_debug",
+    parser.add_argument("--debug",
+                        dest="debug",
                         type=str,
                         default="",
                         help="e.g., test_id:10,read_id:10000,basic_test")
@@ -1328,14 +1315,14 @@ if __name__ == '__main__':
         print >> sys.stderr, "Error: %s doesn't exist." % args.alignment_fname
         sys.exit(1)
     args.exclude_allele_list = args.exclude_allele_list.split(',')
-    daehwan_debug = {}
-    if args.daehwan_debug != "":
-        for item in args.daehwan_debug.split(','):
+    debug = {}
+    if args.debug != "":
+        for item in args.debug.split(','):
             if ':' in item:
                 key, value = item.split(':')
-                daehwan_debug[key] = value
+                debug[key] = value
             else:
-                daehwan_debug[item] = 1
+                debug[item] = 1
 
     random.seed(1)
     test_HLA_genotyping(args.reference_type,
@@ -1351,4 +1338,4 @@ if __name__ == '__main__':
                         args.exclude_allele_list,
                         args.num_mismatch,
                         args.verbose,
-                        daehwan_debug)
+                        debug)
diff --git a/opts.h b/opts.h
index e5e7e5f..afdd5a3 100644
--- a/opts.h
+++ b/opts.h
@@ -83,6 +83,7 @@ enum {
 	ARG_SCORE_MA,               // --ma
 	ARG_SCORE_MMP,              // --mp
     ARG_SCORE_SCP,              // --sp
+    ARG_NO_SOFTCLIP,            // --no-softclip
 	ARG_SCORE_NP,               // --nm
 	ARG_SCORE_RDG,              // --rdg
 	ARG_SCORE_RFG,              // --rfg
@@ -125,6 +126,7 @@ enum {
 	ARG_VERSION,                // --version
 	ARG_SEED_OFF,               // --seed-off
 	ARG_SEED_BOOST_THRESH,      // --seed-boost
+    ARG_MAX_SEEDS,
 	ARG_READ_TIMES,             // --read-times
 	ARG_EXTEND_ITERS,           // --extends
 	ARG_DP_MATE_STREAK_THRESH,  // --db-mate-streak
@@ -175,6 +177,7 @@ enum {
 #endif
     ARG_REMOVE_CHRNAME,
     ARG_ADD_CHRNAME,
+    ARG_MAX_ALTSTRIED,
 };
 
 #endif
diff --git a/pat.cpp b/pat.cpp
index e8b8e38..15ea652 100644
--- a/pat.cpp
+++ b/pat.cpp
@@ -1752,6 +1752,22 @@ void SRAPatternSource::open() {
         try {            
             // open requested accession using SRA implementation of the API
             sra_run_ = new ngs::ReadCollection(ncbi::NGS::openReadCollection(sra_accs_[sra_acc_cur_]));
+            
+#if 0
+            string run_name = sra_run_->getName();
+            cerr << " ReadGroups for " << run_name << endl;
+            
+            ngs::ReadGroupIterator it = sra_run_->getReadGroups();
+            do {
+                ngs::Statistics s = it.getStatistics();
+                cerr << "Statistics for group <" << it.getName() << ">" << endl;
+
+                // for(string p = s.nextPath(""); p != ""; p = s.nextPath(p)){
+                //    System.out.println("\t"+p+": "+s.getAsString(p));
+            } while(it.nextReadGroup());
+            exit(1);
+#endif
+            
             // compute window to iterate through
             size_t MAX_ROW = sra_run_->getReadCount();
             sra_it_ = new ngs::ReadIterator(sra_run_->getReadRange(1, MAX_ROW, ngs::Read::all));
diff --git a/ref_read.cpp b/ref_read.cpp
index 40dd454..dc2fb8a 100644
--- a/ref_read.cpp
+++ b/ref_read.cpp
@@ -319,6 +319,20 @@ fastaRefReadSizes(
 		assert(!in[i]->eof());
 #endif
 	}
+    
+    // Remove empty reference sequences
+    for(int64_t i = 0; (size_t)i < recs.size(); i++) {
+        const RefRecord& rec = recs[i];
+        if(rec.first && rec.len == 0) {
+            if(i + 1 >= recs.size() || recs[i+1].first) {
+                bothTot -= rec.len;
+                bothTot -= rec.off;
+                recs.erase(i);
+                i -= 1;
+            }
+        }
+    }
+    
 	assert_geq(bothTot, 0);
 	assert_geq(unambigTot, 0);
 	return make_pair(
diff --git a/sam.h b/sam.h
index edccde1..e926d9c 100644
--- a/sam.h
+++ b/sam.h
@@ -244,7 +244,7 @@ public:
 			namelen = 255;
 		}
 		for(size_t i = 0; i < namelen; i++) {
-			if(isspace(name[i])) {
+			if(truncQname_ && isspace(name[i])) {
 				return;
 			}
 			o.append(name[i]);
diff --git a/spliced_aligner.h b/spliced_aligner.h
index 24ac033..21ef2ff 100644
--- a/spliced_aligner.h
+++ b/spliced_aligner.h
@@ -35,18 +35,12 @@ public:
 	 */
 	SplicedAligner(
                    const GFM<index_t>& gfm,
-                   const TranscriptomePolicy& tpol,
                    bool anchorStop,
-                   size_t minIntronLen,
-                   size_t maxIntronLen,
                    bool secondary = false,
                    bool local = false,
                    uint64_t threads_rids_mindist = 0) :
     HI_Aligner<index_t, local_index_t>(gfm,
-                                       tpol,
                                        anchorStop,
-                                       minIntronLen,
-                                       maxIntronLen,
                                        secondary,
                                        local,
                                        threads_rids_mindist)
@@ -64,6 +58,9 @@ public:
     virtual
     void hybridSearch(
                       const Scoring&                     sc,
+                      const PairedEndPolicy&             pepol, // paired-end policy
+                      const TranscriptomePolicy&         tpol,
+                      const GraphPolicy&                 gpol,
                       const GFM<index_t>&                gfm,
                       const ALTDB<index_t>&              altdb,
                       const BitPairReference&            ref,
@@ -86,6 +83,9 @@ public:
     virtual
     int64_t hybridSearch_recur(
                                const Scoring&                   sc,
+                               const PairedEndPolicy&           pepol, // paired-end policy
+                               const TranscriptomePolicy&       tpol,
+                               const GraphPolicy&               gpol,
                                const GFM<index_t>&              gfm,
                                const ALTDB<index_t>&            altdb,
                                const BitPairReference&          ref,
@@ -112,6 +112,9 @@ public:
 template <typename index_t, typename local_index_t>
 void SplicedAligner<index_t, local_index_t>::hybridSearch(
                                                           const Scoring&                 sc,
+                                                          const PairedEndPolicy&         pepol, // paired-end policy
+                                                          const TranscriptomePolicy&     tpol,
+                                                          const GraphPolicy&             gpol,
                                                           const GFM<index_t>&            gfm,
                                                           const ALTDB<index_t>&          altdb,
                                                           const BitPairReference&        ref,
@@ -149,10 +152,11 @@ void SplicedAligner<index_t, local_index_t>::hybridSearch(
                          this->_minsc[rdi],
                          rnd,
                          INDEX_MAX,
-                         (index_t)this->_minIntronLen,
-                         (index_t)this->_maxIntronLen,
-                         this->_minAnchorLen,
-                         this->_minAnchorLen_noncan,
+                         (index_t)tpol.minIntronLen(),
+                         (index_t)tpol.maxIntronLen(),
+                         tpol.minAnchorLen(),
+                         tpol.minAnchorLen_noncan(),
+                         gpol.maxAltsTried(),
                          leftext,
                          rightext);
     }
@@ -181,6 +185,9 @@ void SplicedAligner<index_t, local_index_t>::hybridSearch(
         GenomeHit<index_t>& genomeHit = this->_genomeHits[hj];
         hybridSearch_recur(
                            sc,
+                           pepol,
+                           tpol,
+                           gpol,
                            gfm,
                            altdb,
                            ref,
@@ -209,6 +216,9 @@ void SplicedAligner<index_t, local_index_t>::hybridSearch(
 template <typename index_t, typename local_index_t>
 int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                                                                    const Scoring&                   sc,
+                                                                   const PairedEndPolicy&           pepol, // paired-end policy
+                                                                   const TranscriptomePolicy&       tpol,
+                                                                   const GraphPolicy&               gpol,
                                                                    const GFM<index_t>&              gfm,
                                                                    const ALTDB<index_t>&            altdb,
                                                                    const BitPairReference&          ref,
@@ -290,7 +300,7 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                 if(fraglen >= minMatchLen &&
                    left >= minMatchLen &&
                    hit.trim5() == 0 &&
-                   !this->_tpol.no_spliced_alignment()) {
+                   !tpol.no_spliced_alignment()) {
                     spliceSites.clear();
                     ssdb.getLeftSpliceSites(hit.ref(), left + minMatchLen, minMatchLen, spliceSites);
                     for(size_t si = 0; si < spliceSites.size(); si++) {
@@ -342,19 +352,20 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                                        this->_minsc[rdi],
                                        rnd,
                                        (index_t)this->_minK_local,
-                                       (index_t)this->_minIntronLen,
-                                       (index_t)this->_maxIntronLen,
-                                       this->_minAnchorLen,
-                                       this->_minAnchorLen_noncan,
+                                       (index_t)tpol.minIntronLen(),
+                                       (index_t)tpol.maxIntronLen(),
+                                       tpol.minAnchorLen(),
+                                       tpol.minAnchorLen_noncan(),
+                                       gpol.maxAltsTried(),
                                        leftext,
                                        rightext);
                         if(tempHit.len() <= 0)
                             continue;
                         if(!tempHit.compatibleWith(
                                                    hit,
-                                                   (index_t)this->_minIntronLen,
-                                                   (index_t)this->_maxIntronLen,
-                                                   this->_tpol.no_spliced_alignment()))
+                                                   (index_t)tpol.minIntronLen(),
+                                                   (index_t)tpol.maxIntronLen(),
+                                                   tpol.no_spliced_alignment()))
                             continue;
                         int64_t minsc = max<int64_t>(this->_minsc[rdi], best_score);
                         bool combined = tempHit.combineWith(
@@ -370,11 +381,13 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                                                             minsc,
                                                             rnd,
                                                             (index_t)this->_minK_local,
-                                                            (index_t)this->_minIntronLen,
-                                                            (index_t)this->_maxIntronLen,
+                                                            (index_t)tpol.minIntronLen(),
+                                                            (index_t)tpol.maxIntronLen(),
                                                             1,
                                                             1,
-                                                            &ss);
+                                                            gpol.maxAltsTried(),
+                                                            &ss,
+                                                            tpol.no_spliced_alignment());
                         if(rdi == 0) minsc = max(minsc, sink.bestUnp1());
                         else         minsc = max(minsc, sink.bestUnp2());
                         index_t leftAnchorLen = 0, nedits = 0;
@@ -406,7 +419,7 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                     // make use of a list of known or novel splice sites to further align the read
                     if(fraglen >= minMatchLen &&
                        local_genomeHits[i].trim3() == 0 &&
-                       !this->_tpol.no_spliced_alignment()) {
+                       !tpol.no_spliced_alignment()) {
                         spliceSites.clear();
                         assert_gt(fraglen, 0);
                         ssdb.getRightSpliceSites(local_genomeHits[i].ref(), right + fraglen - minMatchLen, minMatchLen, spliceSites);
@@ -459,15 +472,16 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                                            this->_minsc[rdi],
                                            rnd,
                                            (index_t)this->_minK_local,
-                                           (index_t)this->_minIntronLen,
-                                           (index_t)this->_maxIntronLen,
-                                           this->_minAnchorLen,
-                                           this->_minAnchorLen_noncan,
+                                           (index_t)tpol.minIntronLen(),
+                                           (index_t)tpol.maxIntronLen(),
+                                           tpol.minAnchorLen(),
+                                           tpol.minAnchorLen_noncan(),
+                                           gpol.maxAltsTried(),
                                            leftext,
                                            rightext);
                             if(tempHit.len() <= 0)
                                 continue;
-                            if(!canHit.compatibleWith(tempHit, (index_t)this->_minIntronLen, (index_t)this->_maxIntronLen, this->_tpol.no_spliced_alignment())) continue;
+                            if(!canHit.compatibleWith(tempHit, (index_t)tpol.minIntronLen(), (index_t)tpol.maxIntronLen(), tpol.no_spliced_alignment())) continue;
                             GenomeHit<index_t> combinedHit = canHit;
                             int64_t minsc = max<int64_t>(this->_minsc[rdi], best_score);
                             bool combined = combinedHit.combineWith(
@@ -483,11 +497,13 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                                                                     minsc,
                                                                     rnd,
                                                                     (index_t)this->_minK_local,
-                                                                    (index_t)this->_minIntronLen,
-                                                                    (index_t)this->_maxIntronLen,
+                                                                    (index_t)tpol.minIntronLen(),
+                                                                    (index_t)tpol.maxIntronLen(),
                                                                     1,
                                                                     1,
-                                                                    &ss);
+                                                                    gpol.maxAltsTried(),
+                                                                    &ss,
+                                                                    tpol.no_spliced_alignment());
                             if(rdi == 0) minsc = max(minsc, sink.bestUnp1());
                             else         minsc = max(minsc, sink.bestUnp2());
                             index_t rightAnchorLen = 0, nedits = 0;
@@ -528,13 +544,13 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
 		      this->addSearched(canHit, rdi);
 		    }
                     if(!this->redundant(sink, rdi, canHit)) {
-                        this->reportHit(sc, gfm, altdb, ref, ssdb, sink, rdi, canHit);
+                        this->reportHit(sc, pepol, tpol, gpol, gfm, altdb, ref, ssdb, sink, rdi, canHit);
                         maxsc = max<int64_t>(maxsc, canHit.score());
                     }
                 }
             }
             else {
-                this->reportHit(sc, gfm, altdb, ref, ssdb, sink, rdi, hit);
+                this->reportHit(sc, pepol, tpol, gpol, gfm, altdb, ref, ssdb, sink, rdi, hit);
                 maxsc = max<int64_t>(maxsc, hit.score());
             }
             return maxsc;
@@ -547,7 +563,7 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
             hit.getLeft(fragoff, fraglen, left);
             const index_t minMatchLen = (index_t)this->_minK_local;
             // make use of a list of known or novel splice sites to further align the read
-            if(fraglen >= minMatchLen && left >= minMatchLen && !this->_tpol.no_spliced_alignment()) {
+            if(fraglen >= minMatchLen && left >= minMatchLen && !tpol.no_spliced_alignment()) {
                 spliceSites.clear();
                 ssdb.getLeftSpliceSites(hit.ref(), left + minMatchLen, minMatchLen + min<index_t>(minMatchLen, fragoff), spliceSites);
                 for(size_t si = 0; si < spliceSites.size(); si++) {
@@ -597,15 +613,16 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                                    this->_minsc[rdi],
                                    rnd,
                                    (index_t)this->_minK_local,
-                                   (index_t)this->_minIntronLen,
-                                   (index_t)this->_maxIntronLen,
-                                   this->_minAnchorLen,
-                                   this->_minAnchorLen_noncan,
+                                   (index_t)tpol.minIntronLen(),
+                                   (index_t)tpol.maxIntronLen(),
+                                   tpol.minAnchorLen(),
+                                   tpol.minAnchorLen_noncan(),
+                                   gpol.maxAltsTried(),
                                    leftext,
                                    rightext);
                     if(tempHit.len() <= 0)
                         continue;
-                    if(!tempHit.compatibleWith(hit, (index_t)this->_minIntronLen, (index_t)this->_maxIntronLen, this->_tpol.no_spliced_alignment())) continue;
+                    if(!tempHit.compatibleWith(hit, (index_t)tpol.minIntronLen(), (index_t)tpol.maxIntronLen(), tpol.no_spliced_alignment())) continue;
                     int64_t minsc = this->_minsc[rdi];
                     bool combined = tempHit.combineWith(
                                                         hit,
@@ -620,11 +637,13 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                                                         minsc,
                                                         rnd,
                                                         (index_t)this->_minK_local,
-                                                        (index_t)this->_minIntronLen,
-                                                        (index_t)this->_maxIntronLen,
+                                                        (index_t)tpol.minIntronLen(),
+                                                        (index_t)tpol.maxIntronLen(),
                                                         1,
                                                         1,
-                                                        &ss);
+                                                        gpol.maxAltsTried(),
+                                                        &ss,
+                                                        tpol.no_spliced_alignment());
                     if(!this->_secondary) {
                         if(rdi == 0) minsc = max(minsc, sink.bestUnp1());
                         else         minsc = max(minsc, sink.bestUnp2());
@@ -637,6 +656,9 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                         assert_leq(tempHit.rdoff() + tempHit.len() + tempHit.trim3(), rdlen);
                         int64_t tmp_maxsc = hybridSearch_recur(
                                                                sc,
+                                                               pepol,
+                                                               tpol,
+                                                               gpol,
                                                                gfm,
                                                                altdb,
                                                                ref,
@@ -676,10 +698,11 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                            this->_minsc[rdi],
                            rnd,
                            (index_t)this->_minK_local,
-                           (index_t)this->_minIntronLen,
-                           (index_t)this->_maxIntronLen,
-                           this->_minAnchorLen,
-                           this->_minAnchorLen_noncan,
+                           (index_t)tpol.minIntronLen(),
+                           (index_t)tpol.maxIntronLen(),
+                           tpol.minAnchorLen(),
+                           tpol.minAnchorLen_noncan(),
+                           gpol.maxAltsTried(),
                            leftext,
                            rightext,
                            1);
@@ -712,8 +735,8 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
             local_index_t node_top = (local_index_t)INDEX_MAX, node_bot = (local_index_t)INDEX_MAX;
             index_t extoff = hitoff - 1;
             if(extoff > 0) extoff -= 1;
-            if(extoff < this->_minAnchorLen) {
-                extoff = this->_minAnchorLen;
+            if(extoff < tpol.minAnchorLen()) {
+                extoff = tpol.minAnchorLen();
             }
             index_t nelt = (index_t)INDEX_MAX;
             index_t max_nelt = std::max<index_t>(5, extlen);
@@ -752,7 +775,7 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
             assert_leq(extlen, extoff + 1);
             if(nelt > 0 &&
                nelt <= max_nelt &&
-               extlen >= this->_minAnchorLen &&
+               extlen >= tpol.minAnchorLen() &&
                !no_extension) {
                 assert_leq(nelt, max_nelt);
                 coords.clear();
@@ -791,9 +814,9 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                                  (index_t)coord.off(),
                                  (index_t)coord.joinedOff(),
                                  this->_sharedVars);
-                    if(!tempHit.adjustWithALT(*this->_rds[rdi], gfm, altdb, ref)) continue;
+                    if(!tempHit.adjustWithALT(*this->_rds[rdi], gfm, altdb, ref, gpol.maxAltsTried())) continue;
                     // check if the partial alignment is compatible with the new alignment using the local index
-                    if(!tempHit.compatibleWith(hit, (index_t)this->_minIntronLen, (index_t)this->_maxIntronLen, this->_tpol.no_spliced_alignment())) {
+                    if(!tempHit.compatibleWith(hit, (index_t)tpol.minIntronLen(), (index_t)tpol.maxIntronLen(), tpol.no_spliced_alignment())) {
                         if(count == 1) continue;
                         else break;
                     }
@@ -813,10 +836,11 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                                        this->_minsc[rdi],
                                        rnd,
                                        (index_t)this->_minK_local,
-                                       (index_t)this->_minIntronLen,
-                                       (index_t)this->_maxIntronLen,
-                                       this->_minAnchorLen,
-                                       this->_minAnchorLen_noncan,
+                                       (index_t)tpol.minIntronLen(),
+                                       (index_t)tpol.maxIntronLen(),
+                                       tpol.minAnchorLen(),
+                                       tpol.minAnchorLen_noncan(),
+                                       gpol.maxAltsTried(),
                                        leftext,
                                        rightext);
                     }
@@ -835,10 +859,13 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                                                         minsc,
                                                         rnd,
                                                         (index_t)this->_minK_local,
-                                                        (index_t)this->_minIntronLen,
-                                                        (index_t)this->_maxIntronLen,
-                                                        this->_minAnchorLen,
-                                                        this->_minAnchorLen_noncan);
+                                                        (index_t)tpol.minIntronLen(),
+                                                        (index_t)tpol.maxIntronLen(),
+                                                        tpol.minAnchorLen(),
+                                                        tpol.minAnchorLen_noncan(),
+                                                        gpol.maxAltsTried(),
+                                                        NULL, // splice sites
+                                                        tpol.no_spliced_alignment());
                     if(!this->_secondary) {
                         if(rdi == 0) minsc = max(minsc, sink.bestUnp1());
                         else         minsc = max(minsc, sink.bestUnp2());
@@ -850,6 +877,9 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                             // extend the new partial alignment recursively
                             int64_t tmp_maxsc = hybridSearch_recur(
                                                                    sc,
+                                                                   pepol,
+                                                                   tpol,
+                                                                   gpol,
                                                                    gfm,
                                                                    altdb,
                                                                    ref,
@@ -886,6 +916,9 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                     if(tempHit.score() >= minsc) {
                         int64_t tmp_maxsc = hybridSearch_recur(
                                                                sc,
+                                                               pepol,
+                                                               tpol,
+                                                               gpol,
                                                                gfm,
                                                                altdb,
                                                                ref,
@@ -970,8 +1003,8 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                                      (index_t)coord.off(),
                                      (index_t)coord.joinedOff(),
                                      this->_sharedVars);
-                        if(!tempHit.adjustWithALT(*this->_rds[rdi], gfm, altdb, ref)) continue;
-                        if(!tempHit.compatibleWith(hit, (index_t)this->_minIntronLen, (index_t)this->_maxIntronLen, this->_tpol.no_spliced_alignment())) continue;
+                        if(!tempHit.adjustWithALT(*this->_rds[rdi], gfm, altdb, ref, gpol.maxAltsTried())) continue;
+                        if(!tempHit.compatibleWith(hit, (index_t)tpol.minIntronLen(), (index_t)tpol.maxIntronLen(), tpol.no_spliced_alignment())) continue;
                         if(uniqueStop) {
                             assert_eq(coords.size(), 1);
                             index_t leftext = (index_t)INDEX_MAX, rightext = (index_t)0;
@@ -988,10 +1021,11 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                                            this->_minsc[rdi],
                                            rnd,
                                            (index_t)this->_minK_local,
-                                           (index_t)this->_minIntronLen,
-                                           (index_t)this->_maxIntronLen,
-                                           this->_minAnchorLen,
-                                           this->_minAnchorLen_noncan,
+                                           (index_t)tpol.minIntronLen(),
+                                           (index_t)tpol.maxIntronLen(),
+                                           tpol.minAnchorLen(),
+                                           tpol.minAnchorLen_noncan(),
+                                           gpol.maxAltsTried(),
                                            leftext,
                                            rightext);
                         }
@@ -1009,10 +1043,13 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                                                             minsc,
                                                             rnd,
                                                             (index_t)this->_minK_local,
-                                                            (index_t)this->_minIntronLen,
-                                                            (index_t)this->_maxIntronLen,
-                                                            this->_minAnchorLen,
-                                                            this->_minAnchorLen_noncan);
+                                                            (index_t)tpol.minIntronLen(),
+                                                            (index_t)tpol.maxIntronLen(),
+                                                            tpol.minAnchorLen(),
+                                                            tpol.minAnchorLen_noncan(),
+                                                            gpol.maxAltsTried(),
+                                                            NULL, // splice sites
+                                                            tpol.no_spliced_alignment());
                         if(!this->_secondary) {
                             if(rdi == 0) minsc = max(minsc, sink.bestUnp1());
                             else         minsc = max(minsc, sink.bestUnp2());
@@ -1022,6 +1059,9 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                             assert_leq(tempHit.rdoff() + tempHit.len() + tempHit.trim3(), rdlen);
                             int64_t tmp_maxsc = hybridSearch_recur(
                                                                    sc,
+                                                                   pepol,
+                                                                   tpol,
+                                                                   gpol,
                                                                    gfm,
                                                                    altdb,
                                                                    ref,
@@ -1053,16 +1093,19 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                                 ssdb,
                                 sc,
                                 (index_t)this->_minK_local,
-                                (index_t)this->_minIntronLen,
-                                (index_t)this->_maxIntronLen,
-                                this->_minAnchorLen,
-                                this->_minAnchorLen_noncan,
+                                (index_t)tpol.minIntronLen(),
+                                (index_t)tpol.maxIntronLen(),
+                                tpol.minAnchorLen(),
+                                tpol.minAnchorLen_noncan(),
                                 ref);
                 assert_leq(trimedHit.len() + trimedHit.trim5() + trimedHit.trim3(), rdlen);
                 int64_t tmp_score = trimedHit.score();
-                if(tmp_score > max<int64_t>(maxsc, this->_minsc[rdi])) {
+                if(tmp_score > maxsc && tmp_score >= this->_minsc[rdi]) {
                     int64_t tmp_maxsc = hybridSearch_recur(
                                                            sc,
+                                                           pepol,
+                                                           tpol,
+                                                           gpol,
                                                            gfm,
                                                            altdb,
                                                            ref,
@@ -1107,10 +1150,11 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                            this->_minsc[rdi],
                            rnd,
                            (index_t)this->_minK_local,
-                           (index_t)this->_minIntronLen,
-                           (index_t)this->_maxIntronLen,
-                           this->_minAnchorLen,
-                           this->_minAnchorLen_noncan,
+                           (index_t)tpol.minIntronLen(),
+                           (index_t)tpol.maxIntronLen(),
+                           tpol.minAnchorLen(),
+                           tpol.minAnchorLen_noncan(),
+                           gpol.maxAltsTried(),
                            leftext,
                            rightext,
                            num_mismatch_allowed);
@@ -1123,6 +1167,9 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                 assert_leq(tempHit.rdoff() + tempHit.len() + tempHit.trim3(), rdlen);
                 int64_t tmp_maxsc = hybridSearch_recur(
                                                        sc,
+                                                       pepol,
+                                                       tpol,
+                                                       gpol,
                                                        gfm,
                                                        altdb,
                                                        ref,
@@ -1151,6 +1198,9 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                     assert_leq(hitoff + hitlen, rdlen);
                     int64_t tmp_maxsc = hybridSearch_recur(
                                                            sc,
+                                                           pepol,
+                                                           tpol,
+                                                           gpol,
                                                            gfm,
                                                            altdb,
                                                            ref,
@@ -1179,7 +1229,7 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
             hit.getRight(fragoff, fraglen, right);
             const index_t minMatchLen = (index_t)this->_minK_local;
             // make use of a list of known or novel splice sites to further align the read
-            if(fraglen >= minMatchLen && !this->_tpol.no_spliced_alignment()) {
+            if(fraglen >= minMatchLen && !tpol.no_spliced_alignment()) {
                 spliceSites.clear();
                 assert_gt(fraglen, 0);
                 assert_leq(fragoff + fraglen, rdlen);
@@ -1234,15 +1284,16 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                                    this->_minsc[rdi],
                                    rnd,
                                    (index_t)this->_minK_local,
-                                   (index_t)this->_minIntronLen,
-                                   (index_t)this->_maxIntronLen,
-                                   this->_minAnchorLen,
-                                   this->_minAnchorLen_noncan,
+                                   (index_t)tpol.minIntronLen(),
+                                   (index_t)tpol.maxIntronLen(),
+                                   tpol.minAnchorLen(),
+                                   tpol.minAnchorLen_noncan(),
+                                   gpol.maxAltsTried(),
                                    leftext,
                                    rightext);
                     if(tempHit.len() <= 0)
                         continue;
-                    if(!hit.compatibleWith(tempHit, (index_t)this->_minIntronLen, (index_t)this->_maxIntronLen, this->_tpol.no_spliced_alignment())) continue;
+                    if(!hit.compatibleWith(tempHit, (index_t)tpol.minIntronLen(), (index_t)tpol.maxIntronLen(), tpol.no_spliced_alignment())) continue;
                     GenomeHit<index_t> combinedHit = hit;
                     int64_t minsc = this->_minsc[rdi];
                     bool combined = combinedHit.combineWith(
@@ -1258,11 +1309,13 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                                                             minsc,
                                                             rnd,
                                                             (index_t)this->_minK_local,
-                                                            (index_t)this->_minIntronLen,
-                                                            (index_t)this->_maxIntronLen,
+                                                            (index_t)tpol.minIntronLen(),
+                                                            (index_t)tpol.maxIntronLen(),
                                                             1,
                                                             1,
-                                                            &ss);
+                                                            gpol.maxAltsTried(),
+                                                            &ss,
+                                                            tpol.no_spliced_alignment());
                     if(!this->_secondary) {
                         if(rdi == 0) minsc = max(minsc, sink.bestUnp1());
                         else         minsc = max(minsc, sink.bestUnp2());
@@ -1273,6 +1326,9 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                         assert_leq(combinedHit.trim5(), combinedHit.rdoff());
                         int64_t tmp_maxsc = hybridSearch_recur(
                                                                sc,
+                                                               pepol,
+                                                               tpol,
+                                                               gpol,
                                                                gfm,
                                                                altdb,
                                                                ref,
@@ -1312,10 +1368,11 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                            this->_minsc[rdi],
                            rnd,
                            (index_t)this->_minK_local,
-                           (index_t)this->_minIntronLen,
-                           (index_t)this->_maxIntronLen,
-                           this->_minAnchorLen,
-                           this->_minAnchorLen_noncan,
+                           (index_t)tpol.minIntronLen(),
+                           (index_t)tpol.maxIntronLen(),
+                           tpol.minAnchorLen(),
+                           tpol.minAnchorLen_noncan(),
+                           gpol.maxAltsTried(),
                            leftext,
                            rightext,
                            1);
@@ -1395,7 +1452,7 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
             assert_leq(extoff, rdlen);
             if(nelt > 0 &&
                nelt <= max_nelt &&
-               extlen >= this->_minAnchorLen &&
+               extlen >= tpol.minAnchorLen() &&
                !no_extension) {
                 assert_leq(nelt, max_nelt);
                 coords.clear();
@@ -1434,9 +1491,9 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                                  (index_t)coord.off(),
                                  (index_t)coord.joinedOff(),
                                  this->_sharedVars);
-                    if(!tempHit.adjustWithALT(*this->_rds[rdi], gfm, altdb, ref)) continue;
+                    if(!tempHit.adjustWithALT(*this->_rds[rdi], gfm, altdb, ref, gpol.maxAltsTried())) continue;
                     // check if the partial alignment is compatible with the new alignment using the local index
-                    if(!hit.compatibleWith(tempHit, (index_t)this->_minIntronLen, (index_t)this->_maxIntronLen, this->_tpol.no_spliced_alignment())) {
+                    if(!hit.compatibleWith(tempHit, (index_t)tpol.minIntronLen(), (index_t)tpol.maxIntronLen(), tpol.no_spliced_alignment())) {
                         if(count == 1) continue;
                         else break;
                     }
@@ -1454,10 +1511,11 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                                    this->_minsc[rdi],
                                    rnd,
                                    (index_t)this->_minK_local,
-                                   (index_t)this->_minIntronLen,
-                                   (index_t)this->_maxIntronLen,
-                                   this->_minAnchorLen,
-                                   this->_minAnchorLen_noncan,
+                                   (index_t)tpol.minIntronLen(),
+                                   (index_t)tpol.maxIntronLen(),
+                                   tpol.minAnchorLen(),
+                                   tpol.minAnchorLen_noncan(),
+                                   gpol.maxAltsTried(),
                                    leftext,
                                    rightext);
                     GenomeHit<index_t> combinedHit = hit;
@@ -1476,10 +1534,13 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                                                             minsc,
                                                             rnd,
                                                             (index_t)this->_minK_local,
-                                                            (index_t)this->_minIntronLen,
-                                                            (index_t)this->_maxIntronLen,
-                                                            this->_minAnchorLen,
-                                                            this->_minAnchorLen_noncan);
+                                                            (index_t)tpol.minIntronLen(),
+                                                            (index_t)tpol.maxIntronLen(),
+                                                            tpol.minAnchorLen(),
+                                                            tpol.minAnchorLen_noncan(),
+                                                            gpol.maxAltsTried(),
+                                                            NULL, // splice sites
+                                                            tpol.no_spliced_alignment());
                     if(!this->_secondary) {
                         if(rdi == 0) minsc = max(minsc, sink.bestUnp1());
                         else         minsc = max(minsc, sink.bestUnp2());
@@ -1490,6 +1551,9 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                             // extend the new partial alignment recursively
                             int64_t tmp_maxsc = hybridSearch_recur(
                                                                    sc,
+                                                                   pepol,
+                                                                   tpol,
+                                                                   gpol,
                                                                    gfm,
                                                                    altdb,
                                                                    ref,
@@ -1527,6 +1591,9 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                     if(tempHit.score() >= minsc) {
                         int64_t tmp_maxsc = hybridSearch_recur(
                                                                sc,
+                                                               pepol,
+                                                               tpol,
+                                                               gpol,
                                                                gfm,
                                                                altdb,
                                                                ref,
@@ -1611,8 +1678,8 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                                      (index_t)coord.off(),
                                      (index_t)coord.joinedOff(),
                                      this->_sharedVars);
-                        if(!tempHit.adjustWithALT(*this->_rds[rdi], gfm, altdb, ref)) continue;
-                        if(!hit.compatibleWith(tempHit, (index_t)this->_minIntronLen, (index_t)this->_maxIntronLen, this->_tpol.no_spliced_alignment())) continue;
+                        if(!tempHit.adjustWithALT(*this->_rds[rdi], gfm, altdb, ref, gpol.maxAltsTried())) continue;
+                        if(!hit.compatibleWith(tempHit, (index_t)tpol.minIntronLen(), (index_t)tpol.maxIntronLen(), tpol.no_spliced_alignment())) continue;
                         index_t leftext = (index_t)0, rightext = (index_t)INDEX_MAX;
                         tempHit.extend(
                                        rd,
@@ -1627,10 +1694,11 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                                        this->_minsc[rdi],
                                        rnd,
                                        (index_t)this->_minK_local,
-                                       (index_t)this->_minIntronLen,
-                                       (index_t)this->_maxIntronLen,
-                                       this->_minAnchorLen,
-                                       this->_minAnchorLen_noncan,
+                                       (index_t)tpol.minIntronLen(),
+                                       (index_t)tpol.maxIntronLen(),
+                                       tpol.minAnchorLen(),
+                                       tpol.minAnchorLen_noncan(),
+                                       gpol.maxAltsTried(),
                                        leftext,
                                        rightext);
                         GenomeHit<index_t> combinedHit = hit;
@@ -1648,10 +1716,13 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                                                                 minsc,
                                                                 rnd,
                                                                 (index_t)this->_minK_local,
-                                                                (index_t)this->_minIntronLen,
-                                                                (index_t)this->_maxIntronLen,
-                                                                this->_minAnchorLen,
-                                                                this->_minAnchorLen_noncan);
+                                                                (index_t)tpol.minIntronLen(),
+                                                                (index_t)tpol.maxIntronLen(),
+                                                                tpol.minAnchorLen(),
+                                                                tpol.minAnchorLen_noncan(),
+                                                                gpol.maxAltsTried(),
+                                                                NULL, // splice sites
+                                                                tpol.no_spliced_alignment());
                         if(!this->_secondary) {
                             if(rdi == 0) minsc = max(minsc, sink.bestUnp1());
                             else         minsc = max(minsc, sink.bestUnp2());
@@ -1660,6 +1731,9 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                             assert_leq(combinedHit.trim5(), combinedHit.rdoff());
                             int64_t tmp_maxsc = hybridSearch_recur(
                                                                    sc,
+                                                                   pepol,
+                                                                   tpol,
+                                                                   gpol,
                                                                    gfm,
                                                                    altdb,
                                                                    ref,
@@ -1693,17 +1767,20 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                                 ssdb,
                                 sc,
                                 (index_t)this->_minK_local,
-                                (index_t)this->_minIntronLen,
-                                (index_t)this->_maxIntronLen,
-                                this->_minAnchorLen,
-                                this->_minAnchorLen_noncan,
+                                (index_t)tpol.minIntronLen(),
+                                (index_t)tpol.maxIntronLen(),
+                                tpol.minAnchorLen(),
+                                tpol.minAnchorLen_noncan(),
                                 ref);
                 assert_leq(trimedHit.trim5(), trimedHit.rdoff());
                 assert_leq(trimedHit.len() + trimedHit.trim5() + trimedHit.trim3(), rdlen);
                 int64_t tmp_score = trimedHit.score();
-                if(tmp_score > max<int64_t>(maxsc, this->_minsc[rdi])) {
+                if(tmp_score > maxsc && tmp_score >= this->_minsc[rdi]) {
                     int64_t tmp_maxsc = hybridSearch_recur(
                                                            sc,
+                                                           pepol,
+                                                           tpol,
+                                                           gpol,
                                                            gfm,
                                                            altdb,
                                                            ref,
@@ -1748,10 +1825,11 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                            this->_minsc[rdi],
                            rnd,
                            (index_t)this->_minK_local,
-                           (index_t)this->_minIntronLen,
-                           (index_t)this->_maxIntronLen,
-                           this->_minAnchorLen,
-                           this->_minAnchorLen_noncan,
+                           (index_t)tpol.minIntronLen(),
+                           (index_t)tpol.maxIntronLen(),
+                           tpol.minAnchorLen(),
+                           tpol.minAnchorLen_noncan(),
+                           gpol.maxAltsTried(),
                            leftext,
                            rightext,
                            num_mismatch_allowed);
@@ -1764,6 +1842,9 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                 assert_leq(tempHit.trim5(), tempHit.rdoff());
                 int64_t tmp_maxsc = hybridSearch_recur(
                                                        sc,
+                                                       pepol,
+                                                       tpol,
+                                                       gpol,
                                                        gfm,
                                                        altdb,
                                                        ref,
@@ -1791,6 +1872,9 @@ int64_t SplicedAligner<index_t, local_index_t>::hybridSearch_recur(
                     assert_eq(hit.trim3(), 0);
                     int64_t tmp_maxsc = hybridSearch_recur(
                                                            sc,
+                                                           pepol,
+                                                           tpol,
+                                                           gpol,
                                                            gfm,
                                                            altdb,
                                                            ref,
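
The spliced_aligner.h hunks above all apply one refactoring: intron-length and anchor-length settings, previously stored on the aligner as _minIntronLen, _maxIntronLen, _minAnchorLen and _minAnchorLen_noncan, are now read from the tpol (TranscriptomePolicy) and gpol policy objects that hybridSearch_recur receives as parameters and forwards into each recursive call. A minimal sketch of that pattern follows; the names Policy and search_recur are invented stand-ins, not HISAT2 code.

    #include <cstddef>
    #include <cstdint>
    #include <iostream>

    // Hypothetical stand-in for TranscriptomePolicy; only the fields used here.
    struct Policy {
        std::size_t minIntronLen;
        std::size_t maxIntronLen;
    };

    // Before this commit the recursion read the settings from aligner members;
    // afterwards every recursive call receives the policy explicitly and
    // simply forwards it, so one aligner can serve different policies.
    std::int64_t search_recur(const Policy& pol, std::size_t depth) {
        if (depth == 0) return 0;
        // ... pol.minIntronLen / pol.maxIntronLen would gate splicing here ...
        return search_recur(pol, depth - 1);   // policy forwarded unchanged
    }

    int main() {
        Policy pol{20, 500000};                // illustrative values only
        std::cout << search_recur(pol, 3) << std::endl;
        return 0;
    }
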
diff --git a/tp.h b/tp.h
index f050f58..0219ba2 100644
--- a/tp.h
+++ b/tp.h
@@ -32,50 +32,77 @@
  * Encapsulates alignment policy for transcriptome
  */
 class TranscriptomePolicy {
-
+    
 public:
-
-	TranscriptomePolicy() { reset(); }
-	
-	TranscriptomePolicy(
+    
+    TranscriptomePolicy() { reset(); }
+    
+    TranscriptomePolicy(
+                        size_t minIntronLen,
+                        size_t maxIntronLen,
+                        uint32_t minAnchorLen = 7,
+                        uint32_t minAnchorLen_noncan = 14,
                         bool no_spliced_alignment = false,
                         bool transcriptome_mapping_only = false,
                         bool transcriptome_assembly = false,
                         bool xs_only = false)
-	{
-		init(
+    {
+        init(minIntronLen,
+             maxIntronLen,
+             minAnchorLen,
+             minAnchorLen_noncan,
              no_spliced_alignment,
              transcriptome_mapping_only,
              transcriptome_assembly,
              xs_only);
-	}
-
-	/** 
-	 */
-	void reset() {
-		init(false, false, false);
-	}
-
-	/**
-	 */
-	void init(
+    }
+    
+    /**
+     */
+    void reset() {
+        init(false, false, false);
+    }
+    
+    /**
+     */
+    void init(
+              size_t minIntronLen,
+              size_t maxIntronLen,
+              uint32_t minAnchorLen = 7,
+              uint32_t minAnchorLen_noncan = 14,
               bool no_spliced_alignment = false,
               bool transcriptome_mapping_only = false,
               bool transcriptome_assembly = false,
               bool xs_only = false)
-	{
+    {
+        minIntronLen_ = minIntronLen;
+        maxIntronLen_ = maxIntronLen;
+        minAnchorLen_ = minAnchorLen;
+        minAnchorLen_noncan_ = minAnchorLen_noncan;
         no_spliced_alignment_ = no_spliced_alignment;
         transcriptome_mapping_only_ = transcriptome_mapping_only;
         transcriptome_assembly_ = transcriptome_assembly;
         xs_only_ = xs_only;
-	}
+    }
     
+    size_t minIntronLen() const { return minIntronLen_; }
+    size_t maxIntronLen() const { return maxIntronLen_; }
+    uint32_t minAnchorLen() const { return minAnchorLen_; }
+    uint32_t minAnchorLen_noncan() const { return minAnchorLen_noncan_; }
     bool no_spliced_alignment() const { return no_spliced_alignment_; }
     bool transcriptome_mapping_only() const { return transcriptome_mapping_only_; }
     bool transcriptome_assembly() const { return transcriptome_assembly_; }
     bool xs_only() const { return xs_only_; }
-
+    
 private:
+    size_t   minIntronLen_;
+    size_t   maxIntronLen_;
+    
+    // Minimum anchor length required for canonical splice sites
+    uint32_t minAnchorLen_;
+    // Minimum anchor length required for non-canonical splice sites
+    uint32_t minAnchorLen_noncan_;
+    
     bool no_spliced_alignment_;
     bool transcriptome_mapping_only_;
     bool transcriptome_assembly_;
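
For reference, a minimal usage sketch of the reworked TranscriptomePolicy as declared above. The numbers are illustrative only, the sketch assumes the patched tp.h is on the include path, and it is not taken from HISAT2's actual call sites.

    #include <iostream>
    #include "tp.h"   // the header as patched in this commit

    int main() {
        // Illustrative values; HISAT2 sets its real defaults elsewhere.
        TranscriptomePolicy tpol(
            20,       // minIntronLen
            500000,   // maxIntronLen
            7,        // minAnchorLen (canonical splice sites)
            14);      // minAnchorLen_noncan (non-canonical splice sites)

        // Callers such as hybridSearch_recur now read these values through
        // accessors on the policy object instead of SplicedAligner members.
        std::cout << tpol.minIntronLen()        << ' '
                  << tpol.maxIntronLen()        << ' '
                  << tpol.minAnchorLen()        << ' '
                  << tpol.minAnchorLen_noncan() << '\n';
        return 0;
    }
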

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/hisat2.git