[med-svn] [Git][med-team/igdiscover][master] 4 commits: New upstream version 0.12.2

Steffen Möller gitlab at salsa.debian.org
Sat Feb 29 19:19:02 GMT 2020



Steffen Möller pushed to branch master at Debian Med / igdiscover


Commits:
e643e65d by Steffen Moeller at 2020-02-29T19:41:33+01:00
New upstream version 0.12.2
- - - - -
093d29a9 by Steffen Moeller at 2020-02-29T19:41:34+01:00
Update upstream source from tag 'upstream/0.12.2'

Update to upstream version '0.12.2'
with Debian dir 0ffb7863b4d1188a90edc8aa97138f4126e759a7
- - - - -
93835c00 by Steffen Moeller at 2020-02-29T20:08:47+01:00
Interim round of adjustments

- - - - -
f69cd675 by Steffen Moeller at 2020-02-29T20:18:15+01:00
Functional, except for missing igblast binary

- - - - -


30 changed files:

- .travis.yml
- CHANGES.rst
- debian/changelog
- debian/control
- debian/patches/adjusting_tests_for_python3.patch
- + debian/patches/avoidTroubleWithPkgResources.patch
- debian/patches/series
- − debian/patches/versioneerversion.patch
- debian/rules
- doc/conf.py
- doc/guide.rst
- doc/installation.rst
- environment.lock.yml
- environment.yml
- src/igdiscover/Snakefile
- src/igdiscover/__main__.py
- src/igdiscover/cli/__init__.py
- src/igdiscover/cli/clonoquery.py
- src/igdiscover/cli/clonotypes.py
- src/igdiscover/cli/config.py
- src/igdiscover/cli/init.py
- src/igdiscover/cli/run.py
- src/igdiscover/readlenhistogram.py
- − tests/data/H1/README.md
- − tests/data/H1/V.fasta
- − tests/data/H1/candidates.tab.gz
- − tests/data/H1/expected.tab
- − tests/data/H1/test.sh
- − tests/run.sh
- tests/test_commands.py


Changes:

=====================================
.travis.yml
=====================================
@@ -1,11 +1,11 @@
 language: python
+
+os: linux
+
 cache:
   directories:
     - $HOME/.cache/igdiscover
 
-python:
-  - "3.6"
-
 before_install:
   - wget http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
   - bash miniconda.sh -b -p $HOME/miniconda
@@ -16,6 +16,7 @@ before_install:
   - conda info -a
   - wget https://bitbucket.org/igdiscover/testdata/downloads/igdiscover-testdata-0.6.tar.gz
   - tar xvf igdiscover-testdata-0.6.tar.gz
+  - ln -s igdiscover-testdata testdata
   - "echo 'use_cache: true' > $HOME/.config/igdiscover.conf"
 
 install:
@@ -24,4 +25,23 @@ install:
   - source activate testenv
   - pip install .
 
-script: tests/run.sh
+script: pytest
+
+env:
+  global:
+    - TWINE_USERNAME=__token__
+
+jobs:
+  include:
+    - python: "3.7"
+
+    - stage: deploy
+      python: "3.7"
+      install: python3 -m pip install twine
+      if: tag IS present
+      script:
+        - |
+          python3 setup.py sdist
+          python3 -m pip wheel -w dist/ .
+          ls -l dist/
+          python3 -m twine upload dist/igdiscover-*


=====================================
CHANGES.rst
=====================================
@@ -11,7 +11,7 @@ v0.12 (2020-01-20)
   reason for filters that compare candidates to each other. Now each filter
   criterion can be distinguished.
 * The somewhat vague “too similar sequence” germline filter criterion
-  incorrectly removed some candidates that have a mutation close to the 5' end.
+  incorrectly removed some candidates that have a mutation close to the 3’ end.
   This was replaced with a simpler filter that only ensures that there are no
   two candidates with the same sequence.
 * Use IgBLAST 1.10


=====================================
debian/changelog
=====================================
@@ -1,3 +1,10 @@
+igdiscover (0.12.2-1) UNRELEASED; urgency=medium
+
+  * New upstream version
+  BLOCKER: Still needs igblast to test
+
+ -- Steffen Moeller <moeller at debian.org>  Sun, 26 Jan 2020 01:47:18 +0100
+
 igdiscover (0.12-1) UNRELEASED; urgency=medium
 
   * New upstream version


=====================================
debian/control
=====================================
@@ -15,6 +15,7 @@ Build-Depends: debhelper-compat (= 12),
                python3-cutadapt,
                python3-scipy (>=0.16.1),
                python3-xopen (>=0.3.2),
+               python3-pkg-resources,
                python3-ruamel.yaml,
                python3-xopen,
                python3-tinyalign,


=====================================
debian/patches/adjusting_tests_for_python3.patch
=====================================
@@ -1,21 +1,8 @@
-Index: igdiscover/tests/run.sh
-===================================================================
---- igdiscover.orig/tests/run.sh
-+++ igdiscover/tests/run.sh
-@@ -4,7 +4,7 @@ set -euo pipefail
- set -x
- unset DISPLAY
- 
--pytest
-+pytest-3
- 
- rm -rf testrun
- mkdir testrun
 Index: igdiscover/src/igdiscover/__main__.py
 ===================================================================
 --- igdiscover.orig/src/igdiscover/__main__.py
 +++ igdiscover/src/igdiscover/__main__.py
-@@ -59,7 +59,7 @@ def main(arguments=None):
+@@ -60,7 +60,7 @@ def main(arguments=None):
      parser = HelpfulArgumentParser(description=__doc__, prog='igdiscover')
      parser.add_argument('--profile', default=False, action='store_true',
          help='Save profiling information to igdiscover.prof')


=====================================
debian/patches/avoidTroubleWithPkgResources.patch
=====================================
@@ -0,0 +1,23 @@
+Index: igdiscover/doc/conf.py
+===================================================================
+--- igdiscover.orig/doc/conf.py
++++ igdiscover/doc/conf.py
+@@ -42,16 +42,8 @@ copyright = u'2015-2017, ' + authors
+ #
+ 
+ from pkg_resources import get_distribution
+-release = get_distribution('igdiscover').version
+-# Read The Docs modifies the conf.py script and we therefore get
+-# version numbers like 0.12+0.g27d0d31
+-if os.environ.get('READTHEDOCS') == 'True':
+-    version = '.'.join(release.split('.')[:2])
+-else:
+-    version = release
+-
+-# The full version, including alpha/beta/rc tags.
+-release = version
++release = "0.12"
++version = "0.12"
+ 
+ suppress_warnings = ['image.nonlocal_uri']
+ 


=====================================
debian/patches/series
=====================================
@@ -1,3 +1,3 @@
 dontusetagsforversion.patch
-versioneerversion.patch
 adjusting_tests_for_python3.patch
+avoidTroubleWithPkgResources.patch


=====================================
debian/patches/versioneerversion.patch deleted
=====================================
@@ -1,13 +0,0 @@
-Index: igdiscover/doc/conf.py
-===================================================================
---- igdiscover.orig/doc/conf.py
-+++ igdiscover/doc/conf.py
-@@ -47,7 +47,7 @@ copyright = u'2015-2017, ' + authors
- # will not work), so the following is what we need to do.
- import subprocess
- version = subprocess.check_output(
--	[sys.executable, '-c', 'import versioneer; print(versioneer.get_version())'],
-+	[sys.executable, '-c', 'import versioneer; print(versioneer.sys.version)'],
- 	cwd='..').decode().strip()
- 
- # Read The Docs modifies the conf.py script and we therefore get 'dirty'


=====================================
debian/rules
=====================================
@@ -22,6 +22,7 @@ override_dh_auto_clean:
 	dh_auto_clean
 	rm -rf .pytest_cache debian/python3-igdiscover
 	rm -f src/igdiscover/__version__.py
+	#rm -rf src/igdiscover.egg-info/
 
 override_dh_auto_test:
 	echo "**** NOT TESTING ******"


=====================================
doc/conf.py
=====================================
@@ -1,4 +1,5 @@
 # IgDiscover documentation build configuration file
+import time
 import sys
 import os
 
@@ -40,22 +41,14 @@ copyright = u'2015-2017, ' + authors
 # built documents.
 #
 
-# When generating the documentation, we currently do not require the
-# dependencies to be installed. We therefore cannot 'import' anything from
-# igdiscover since that may fail. The versioneer module can only be imported
-# from the project root (it has extra checks such that even changing sys.path
-# will not work), so the following is what we need to do.
-import subprocess
-version = subprocess.check_output(
-	[sys.executable, '-c', 'import versioneer; print(versioneer.get_version())'],
-	cwd='..').decode().strip()
-
-# Read The Docs modifies the conf.py script and we therefore get 'dirty'
-# version numbers like 0.12+0.g27d0d31.dirty from versioneer.
-if version.endswith('.dirty') and os.environ.get('READTHEDOCS') == 'True':
-	version, _, rest = version.partition('+')
-	if not rest.startswith('0.'):
-		version = version + '+' + rest[:-6]
+from pkg_resources import get_distribution
+release = get_distribution('igdiscover').version
+# Read The Docs modifies the conf.py script and we therefore get
+# version numbers like 0.12+0.g27d0d31
+if os.environ.get('READTHEDOCS') == 'True':
+    version = '.'.join(release.split('.')[:2])
+else:
+    version = release
 
 # The full version, including alpha/beta/rc tags.
 release = version
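
The version handling added to doc/conf.py above can be tried in isolation. This is a minimal sketch: the helper name `short_version` is hypothetical and only wraps the `'.'.join(release.split('.')[:2])` expression from the diff. It shows what that expression yields for a plain release number and for a version string carrying a local suffix.

```python
def short_version(release: str) -> str:
    """Keep only the first two dot-separated components of a version string.

    Hypothetical helper; it mirrors the expression used in doc/conf.py.
    """
    return ".".join(release.split(".")[:2])


# A plain release number is shortened to major.minor.
print(short_version("0.12.2"))

# A local suffix is only cut at its first dot, so part of it survives.
print(short_version("0.12+0.g27d0d31"))
```

Because the split is purely on dots, a suffix such as `+0.g27d0d31` is truncated rather than removed; the second call returns `0.12+0`.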


=====================================
doc/guide.rst
=====================================
@@ -41,7 +41,7 @@ To run an analysis, proceed as follows.
    Next, choose the directory with your database.
    The directory must contain the three files ``V.fasta``, ``D.fasta``, ``J.fasta``.
    These files contain the V, D, J gene sequences, respectively.
-   Even if have have only light chains in your data, a ``D.fasta`` file needs to be provided,
+   Even if you have only light chains in your data, a ``D.fasta`` file needs to be provided;
    just use one with the heavy chain D gene sequences.
 
    If you do not want a graphical user interface, use the two command-line
@@ -57,9 +57,7 @@ To run an analysis, proceed as follows.
 2. Adjust the configuration file
 
    The previous step created a configuration file named ``myexperiment/igdiscover.yaml``, which
-   you may :ref:`need to adjust <configuration>`. In particular, the number of discovery rounds
-   is set to 3 by default, which takes a long time. Reducing this to 2 or even 1 often works just
-   as well.
+   you may :ref:`need to adjust <configuration>`.
 
 3. Run the analysis
 
@@ -67,9 +65,9 @@ To run an analysis, proceed as follows.
 
        igdiscover run
 
-   Depending on the size of your library, your computer, and the number of iterations, this will
-   now take from a few hours to a day. See the :ref:`running IgDiscover <running>` section for
-   more fine-grained control over what to run and how to resume the process if something failed.
+   Depending on the size of your library, this will usually take a couple of hours. See the
+   :ref:`running IgDiscover <running>` section for more fine-grained control over what to run and
+   how to resume the process if something failed.
 
 
 .. _obtaining-database:
@@ -84,13 +82,13 @@ For discovering new VH genes, for example, you need to get the IGHV, IGHD and IG
 As IgDiscover uses this only as a starting point, using a similar species will also work.
 
 When using an IMGT database, it is very important to change the long IMGT sequence headers to
-short headers as IgBLAST does not accept the long headers. We recommend using the program
+short headers as IgBLAST does not accept the long headers. You can use the program
 ``edit_imgt_file.pl``. If you installed IgDiscover from Conda, the script is already installed and
 you can run it by typing the name. It is also
 `available on the IgBlast FTP site <ftp://ftp.ncbi.nih.gov/blast/executables/igblast/release/>`_.
 
-Run it for all three downloaded files, and then rename files appropritely to make sure that they
-named ``V.fasta``, ``D.fasta`` and ``J.fasta``.
+Run it for all three downloaded files, and then rename files appropriately to make sure that they
+are named ``V.fasta``, ``D.fasta`` and ``J.fasta``.
 
 You always need a file with D genes even if you analyze light chains.
 
@@ -131,16 +129,15 @@ that could not be merged are discarded. Single-end reads and merged paired-end r
 to follow this structure (from 5' to 3'):
 
 * The forward primer sequence. This is optional.
-* A random barcode (molecular identifier). This is optional. Set the
-  configuration option ``barcode_length_5p`` to 0 if you don’t have random barcodes
-  or if you don’t want the program to use them.
+* A UMI (random barcode). This is optional. Set the configuration option ``barcode_length_5p`` to 0
+  if you don’t have random barcodes or if you don’t want the program to use them.
 * Optionally, a run of G nucleotides. This is an artifact of the RACE protocol (Rapid
   amplification of cDNA ends). If you have this, set ``race_g`` to ``true`` in the configuration file.
 * 5' UTR
 * Leader
 * Re-arranged V, D and J gene sequences for heavy chains; only V and J for light chains
-* An optional random barcode. Set the configuration option ``barcode_length_3p`` to the length of
-  this barcode. You can currently not have both a 5' and a 3' barcode.
+* An optional UMI (random barcode). Set the configuration option ``barcode_length_3p`` to the
+  length of this UMI. You can currently not have both a 5' and a 3' UMI.
 * The reverse primer. This is optional.
 
 We use IgBLAST to detect the location of the V, D, J genes through the
@@ -180,19 +177,27 @@ A few rules that may be good to know are the following ones:
 
 To find out what the configuration options achieve, see the explanations in the configuration file itself.
 
-The main parameters parameters that may require adjusting are the following.
+The main parameters that may require adjusting are the following.
 
 The ``iterations`` option sets the number of rounds of V gene discovery
-that will be performed. By default, three iterations are run. Even with a very restricted
-starting V database (for example with only a single V gene sequence),
-this is usually sufficient to identify most novel germline sequences.
+that will be performed. By default, one iteration is run. In each
+iteration, all the sequences will be mapped with IgBLAST, which is the
+most time-consuming part of running the pipeline. Thus, when you go from 1 to 2
+iterations, you almost double the runtime requirements.
 
-When the starting database is more complete, for example, when analyzing
-a human IgM library with the current IMGT heavy chain database, a single
-iteration may be sufficient to produce an individualized database.
+In previous IgDiscover versions, more iterations than one were necessary,
+but we have improved sensitivity since then, so you should not need to increase
+this.
 
-If you do not want to discover any new genes and only want to produce an
-expression profile, for example, then use ``iterations: 0``.
+Especially for nearly complete starting databases, for example when
+analyzing a human IgM library with the current IMGT heavy chain database,
+a single iteration is totally sufficient to produce an individualized database.
+
+If you start with a very small V database (for example with only a single
+V gene sequence), you may get better results when you increase this to 2.
+
+If you do not want to discover any new genes, then use ``iterations: 0``.
+This may be useful to only produce an expression profile, for example.
 
 The ``ignore_j`` option should be set to ``true`` when producing a V gene
 database for a species where J sequences are unknown::
@@ -208,12 +213,27 @@ if you do not specify any primer sequences.
 Pregermline and germline filter criteria
 ----------------------------------------
 
-This provides IgDiscover with stringency requirements for V gene discovery
-that enable the program to filter out false positives. Usually the ”pregermline
-filter” can be used in the default mode since all these sequences will be
-subsequently passed to the higher stringency ”germline filter” where the
-criteria are set to maximize stringency. Here is how it looks in the configuration
-file::
+IgDiscover V gene discovery works in two stages: The program first generates
+a list of *candidate* V gene sequences. This list includes many false
+positives. In the subsequent *germline filtering* step, the list is
+therefore trimmed rigorously in order to produce the final list of germline
+sequences.
+
+The stringency requirements for the germline filter can be set in the
+configuration file in the ``germline_filter`` and ``pre_germline_filter``
+sections.
+
+The ``pre_germline_filter`` section is used in all but the last iteration.
+That is, it is ignored if you use the default of running only a single
+iteration.
+
+The idea behind the pregermline filter is to initially use less stringent
+criteria, which allows the starting database to grow more quickly, but at
+the risk of adding some false positives. The last iteration, in which the
+more stringent ``germline_filter`` settings are used, will then remove those
+remaining false positives.
+
+Here is how it looks in the configuration file::
 
    pre_germline_filter:
      unique_cdr3s: 2      # Minimum number of unique CDR3s (within exact matches)
@@ -255,6 +275,18 @@ about the :ref:`germline filters <germline-filters>`..
    The ``differences`` configuration setting was removed.
 
 
+.. _jdiscovery:
+
+IgDiscover will also try to discover which J genes are used in the input sample. J discovery
+is configured in the ``j_discovery`` section in the configuration file. It looks like this::
+
+    j_discovery:
+      allele_ratio: 0.2         # Required minimum ratio between alleles of a single gene
+      cross_mapping_ratio: 0.1  # Threshold for removal of cross-mapping artifacts.
+      propagate: true           # Use J genes discovered in iteration 1 in subsequent ones
+
+
+
 .. _running:
 
 Running IgDiscover
@@ -376,8 +408,7 @@ Final results are found in the ``final/`` subdirectory of the analysis directory
 
 final/database/(V,D,J).fasta
     These three files represent the final, individualized V/D/J database found by IgDiscover.
-    The D and J files are copies of the original starting database;
-    they are not updated by IgDiscover.
+    The D file is a copy of the original starting database; it is not updated by IgDiscover.
 
 final/dendrogram_(V,D,J).pdf
     These three PDF files contain dendrograms of the V, D and J sequences in the individualized
@@ -441,6 +472,9 @@ iteration-xx/new_V_germline.fasta, iteration-xx/new_V_pregermline.fasta
 iteration-xx/annotated_V_germline.tab, iteration-xx/annotated_V_pregermline.tab
     A version of the ``candidates.tab`` file that is annotated with extra columns that describe why a candidate was filtered out. See :ref:`the description of this file <annotated_v_tab>`.
 
+iteration-xx/new_J.tab, iteration-xx/new_J.fasta
+    The discovered list of J genes for this iteration.
+
 
 Other files
 -----------
@@ -800,6 +834,9 @@ rename
 union
     Compute union of sequences in multiple FASTA files
 
+:ref:`clonotypes <clonotypes>`
+    List the clonotypes (unique V, J, CDR3 combinations) present in a sample
+
 
 The following subcommands are used internally, and listed here for completeness.
 
@@ -826,6 +863,60 @@ errorplot
     Plot histograms of differences to reference V gene
 
 
+.. _clonotypes:
+
+``igdiscover clonotypes``
+-------------------------
+
+The ``igdiscover clonotypes`` command lists the clonotypes present in a sample.
+The only required parameter is the name of a file with assigned sequences.
+Normally, this will be a ``filtered.tab.gz`` file.
+
+Two sequences are considered to be of the same clonotype if
+
+- their V and J assignments are the same
+- the length of their CDR3 is identical
+- their CDR3 sequences are similar (see below for what this means)
+
+That is, clonotypes are found by clustering the input sequences by V gene,
+J gene and CDR3 similarity (using single-linkage clustering).
+
+For each cluster, a representative row (assignment) is chosen and
+considered to be the clonotype. The output is a table with one row
+per clonotype. It is written to standard output.
+
+By default, the output table is sorted by V/D/J gene names.
+Use ``--sort`` to sort by group size (largest first).
+
+Similarity
+~~~~~~~~~~
+
+To determine whether two CDR3s are similar, the Hamming distance
+between the CDR3 nucleotide sequences (``CDR3_nt`` column) must be
+at most 1. To allow more differences, use ``--mismatches``. To
+compare amino acid sequences (``CDR3_aa``) instead, use ``--aa``.
+
+The members file
+~~~~~~~~~~~~~~~~
+
+If desired, the constituents (“members”) of each cluster can be
+output to a file using ``--members=outputfilename.tab``.
+Clusters are separated by empty lines and are ordered the same as
+in the clonotypes table.
+
+In the members table, additional fields are added that are intended
+to describe “mutation rates”. These fields are named
+``XXX_mindiffrate``, where ``XXX`` is one of ``CDR3_nt``, ``CDR3_aa``, ``VDJ_nt``, or ``VDJ_aa``.
+
+Within each cluster, the row with the lowest ``V_SHM`` value
+(the least mutated V) is chosen as reference. If the ``V_SHM``
+is higher than ``--v-shm-threshold``, the ``_mindiffrate`` fields are not computed.
+
+To compute a field such as ``CDR3_nt_mindiffrate`` for a row, the edit distance between
+``CDR3_nt`` of this row and of the reference row is computed and divided by
+the length of ``CDR3_nt`` of the reference row (and multiplied by 100 to give a percentage).
+
+
 .. _germline-filters:
 
 Germline and pre-germline filtering
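
The clonotype rules documented in the doc/guide.rst hunk above (same V, same J, same CDR3 length, Hamming distance of at most 1, single-linkage clustering) and the ``_mindiffrate`` formula can be illustrated with a small standalone sketch. It is not the IgDiscover implementation: all function names are hypothetical, the input is assumed to be already restricted to one V gene, one J gene and one CDR3 length, and the edit distance is a plain-Python stand-in.

```python
from itertools import combinations


def hamming_distance(s, t):
    """Number of mismatching positions between two equal-length strings."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(s, t))


def single_linkage(cdr3s, mismatches=1):
    """Cluster CDR3s: two sequences share a cluster if a chain of
    neighbours, each within `mismatches` of the next, connects them."""
    parent = list(range(len(cdr3s)))

    def find(i):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cdr3s)), 2):
        if hamming_distance(cdr3s[i], cdr3s[j]) <= mismatches:
            parent[find(i)] = find(j)

    clusters = {}
    for i, seq in enumerate(cdr3s):
        clusters.setdefault(find(i), []).append(seq)
    return list(clusters.values())


def edit_distance(s, t):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]


def mindiffrate(seq, ref):
    """Edit distance to the reference as a percentage of the reference length."""
    return 100 * edit_distance(seq, ref) / len(ref)


clusters = single_linkage(["ACGT", "ACGA", "ACCA", "TTTT"])
print(clusters)
print(mindiffrate("ACGA", "ACGT"))
```

In this example ``ACGT`` and ``ACCA`` differ at two positions, yet they end up in the same cluster because ``ACGA`` is within one mismatch of both. That chaining is what distinguishes single-linkage clustering from requiring every pair in a cluster to be similar.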


=====================================
doc/installation.rst
=====================================
@@ -22,7 +22,7 @@ if you cannot use Conda.
 Installing IgDiscover with Conda
 --------------------------------
 
-1. Install `Conda <https://conda.io/>`_ by following the `conda installation
+1. Install `Conda`_ by following the `conda installation
    instructions <https://conda.io/docs/user-guide/install/>`_
    as appropriate for your system. You will need to choose between a “Miniconda”
    and “Anaconda” installation. We recommend Miniconda as the download is


=====================================
environment.lock.yml
=====================================
@@ -10,26 +10,27 @@ dependencies:
   - blas=2.14=openblas
   - bzip2=1.0.8=h516909a_2
   - ca-certificates=2019.11.28=hecc5488_0
-  - certifi=2019.11.28=py36_0
-  - cffi=1.13.2=py36h8022711_0
-  - chardet=3.0.4=py36_1003
+  - certifi=2019.11.28=py37_0
+  - cffi=1.13.2=py37h8022711_0
+  - chardet=3.0.4=py37_1003
   - configargparse=0.13.0=py_1
-  - cryptography=2.8=py36h72c5cf5_1
-  - cutadapt=2.8=py36h516909a_0
+  - cryptography=2.8=py37h72c5cf5_1
+  - cutadapt=2.8=py37h516909a_0
   - cycler=0.10.0=py_2
-  - datrie=0.8=py36h516909a_0
-  - dnaio=0.4.1=py36h516909a_0
-  - docutils=0.16=py36_0
+  - datrie=0.8=py37h516909a_0
+  - dnaio=0.4.1=py37h516909a_0
+  - docutils=0.16=py37_0
   - flash=1.2.11=hed695b0_5
   - freetype=2.10.0=he983fc9_1
   - gitdb2=2.0.6=py_0
   - gitpython=3.0.5=py_0
   - icu=64.2=he1b5a44_1
-  - idna=2.8=py36_1000
+  - idna=2.8=py37_1000
   - igblast=1.10.0=h6ac72b6_1
-  - importlib_metadata=1.4.0=py36_0
-  - jsonschema=3.2.0=py36_0
-  - kiwisolver=1.1.0=py36hc9558a2_0
+  - importlib_metadata=1.4.0=py37_0
+  - jsonschema=3.2.0=py37_0
+  - kiwisolver=1.1.0=py37hc9558a2_0
+  - ld_impl_linux-64=2.33.1=h53a641e_7
   - libblas=3.8.0=14_openblas
   - libcblas=3.8.0=14_openblas
   - libffi=3.2.1=he1b5a44_1006
@@ -44,49 +45,49 @@ dependencies:
   - libpng=1.6.37=hed695b0_0
   - libstdcxx-ng=9.2.0=hdf63c60_2
   - libxml2=2.9.10=hee79883_0
-  - matplotlib-base=3.1.2=py36h250f245_1
+  - matplotlib-base=3.1.2=py37h250f245_1
   - more-itertools=8.1.0=py_0
   - muscle=3.8.1551=hc9558a2_5
   - ncurses=6.1=hf484d3e_1002
   - nomkl=3.0=0
-  - numpy=1.17.5=py36h95a1406_0
+  - numpy=1.17.5=py37h95a1406_0
   - openssl=1.1.1d=h516909a_0
-  - pandas=0.25.3=py36hb3f55d8_0
+  - pandas=0.25.3=py37hb3f55d8_0
   - patsy=0.5.1=py_0
   - pear=0.9.6=h98de208_5
   - perl=5.26.2=h516909a_1006
   - pigz=2.4=h84994c4_0
-  - pip=19.3.1=py36_0
-  - psutil=5.6.7=py36h516909a_0
-  - pycparser=2.19=py36_1
-  - pyopenssl=19.1.0=py36_0
+  - pip=19.3.1=py37_0
+  - psutil=5.6.7=py37h516909a_0
+  - pycparser=2.19=py37_1
+  - pyopenssl=19.1.0=py37_0
   - pyparsing=2.4.6=py_0
-  - pyrsistent=0.15.7=py36h516909a_0
-  - pysocks=1.7.1=py36_0
-  - python=3.6.7=h357f687_1006
+  - pyrsistent=0.15.7=py37h516909a_0
+  - pysocks=1.7.1=py37_0
+  - python=3.7.6=h357f687_2
   - python-dateutil=2.8.1=py_0
   - pytz=2019.3=py_0
-  - pyyaml=5.3=py36h516909a_0
-  - ratelimiter=1.2.0=py36_1000
+  - pyyaml=5.3=py37h516909a_0
+  - ratelimiter=1.2.0=py37_1000
   - readline=8.0=hf8c457e_0
-  - requests=2.22.0=py36_1
-  - ruamel.yaml=0.16.5=py36h516909a_1
-  - ruamel.yaml.clib=0.2.0=py36h516909a_0
-  - scipy=1.4.1=py36h921218d_0
+  - requests=2.22.0=py37_1
+  - ruamel.yaml=0.16.5=py37h516909a_1
+  - ruamel.yaml.clib=0.2.0=py37h516909a_0
+  - scipy=1.4.1=py37h921218d_0
   - seaborn=0.9.0=py_2
-  - setuptools=45.1.0=py36_0
-  - six=1.14.0=py36_0
+  - setuptools=45.1.0=py37_0
+  - six=1.14.0=py37_0
   - smmap2=2.0.5=py_0
   - snakemake-minimal=5.9.1=py_0
   - sqlite=3.30.1=hcee41ef_0
-  - statsmodels=0.10.2=py36hc1659b7_0
-  - tinyalign=0.2=py36h516909a_0
+  - statsmodels=0.10.2=py37hc1659b7_0
+  - tinyalign=0.2=py37h516909a_0
   - tk=8.6.10=hed695b0_0
-  - tornado=6.0.3=py36h516909a_0
-  - urllib3=1.25.7=py36_0
-  - wheel=0.33.6=py36_0
-  - wrapt=1.11.2=py36h516909a_0
-  - xopen=0.8.4=py36_0
+  - tornado=6.0.3=py37h516909a_0
+  - urllib3=1.25.7=py37_0
+  - wheel=0.33.6=py37_0
+  - wrapt=1.11.2=py37h516909a_0
+  - xopen=0.8.4=py37_0
   - xz=5.2.4=h14c3975_1001
   - yaml=0.2.2=h516909a_1
   - zipp=1.0.0=py_0


=====================================
environment.yml
=====================================
@@ -4,7 +4,7 @@ channels:
   - defaults
 dependencies:
   - nomkl
-  - python=3.6
+  - python=3.7
   - seaborn=0.9.*
   - snakemake-minimal
   - cutadapt>=2.5


=====================================
src/igdiscover/Snakefile
=====================================
@@ -7,7 +7,7 @@ import igdiscover
 from igdiscover.dna import reverse_complement
 from igdiscover.utils import relative_symlink
 from igdiscover.readlenhistogram import read_length_histogram
-from igdiscover.config import Config, GlobalConfig
+from igdiscover.config import Config
 
 
 try:


=====================================
src/igdiscover/__main__.py
=====================================
@@ -18,6 +18,7 @@ import warnings
 import resource
 
 import igdiscover.cli as cli_package
+from igdiscover.cli import CommandLineError
 
 from . import __version__
 
@@ -78,15 +79,24 @@ def main(arguments=None):
             show_cpustats[module.main] = False
 
     args = parser.parse_args(arguments)
-    if not hasattr(args, 'func'):
+    do_profiling = args.profile
+    del args.profile
+    subcommand = getattr(args, 'func', None)
+    del args.func
+    if not subcommand:
         parser.error('Please provide the name of a subcommand to run')
-    elif args.profile:
+    elif do_profiling:
         import cProfile as profile
-        profile.runctx('args.func(args)', globals(), locals(), filename='igdiscover.prof')
+        to_run = lambda: profile.runctx('subcommand(args)', globals(), locals(), filename='igdiscover.prof')
         logger.info('Wrote profiling data to igdiscover.prof')
     else:
-        args.func(args)
-    if sys.platform == 'linux' and show_cpustats.get(args.func, True):
+        to_run = lambda: subcommand(args)
+    try:
+        to_run()
+    except CommandLineError as e:
+        logger.error(e)
+        sys.exit(1)
+    if sys.platform == 'linux' and show_cpustats.get(subcommand, True):
         rself = resource.getrusage(resource.RUSAGE_SELF)
         rchildren = resource.getrusage(resource.RUSAGE_CHILDREN)
         memory_kb = rself.ru_maxrss + rchildren.ru_maxrss
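
The error-handling pattern introduced in the `__main__.py` hunk above, where a dedicated exception type is caught at the top level and reported without a traceback, can be sketched on its own. This is an illustration under assumptions: `example_subcommand` and `run` are hypothetical names, and unlike the diff, the sketch returns an exit status instead of calling `sys.exit(1)` directly so that it is easy to test.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("example")


class CommandLineError(Exception):
    """Assumed purpose: signal a user-facing error raised by a subcommand."""


def example_subcommand(path):
    # Hypothetical subcommand: reject an empty input path.
    if not path:
        raise CommandLineError("No input file given")
    logger.info("Processing %s", path)


def run(subcommand, *args):
    """Run a subcommand and turn CommandLineError into an exit status."""
    try:
        subcommand(*args)
    except CommandLineError as e:
        # Log a one-line message instead of showing a traceback.
        logger.error(e)
        return 1
    return 0


print(run(example_subcommand, "filtered.tab.gz"))
print(run(example_subcommand, ""))
```

Any other exception type still propagates with a full traceback, which keeps genuine programming errors visible.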


=====================================
src/igdiscover/cli/__init__.py
=====================================
@@ -1 +1,2 @@
-# This module contains all subcommands
+class CommandLineError(Exception):
+    pass


=====================================
src/igdiscover/cli/clonoquery.py
=====================================
@@ -61,6 +61,7 @@ def collect(querytable, reftable, mismatches, cdr3_core_slice, cdr3_column):
     with all the rows that have the same result. similar_rows is a DataFrame
     whose rows are the ones matching the query.
     """
+
     # The vjlentype is a "clonotype without CDR3 sequence" (only V, J, CDR3 length)
     # Determine set of vjlentypes to query
     query_vjlentypes = defaultdict(list)
@@ -89,7 +90,7 @@ def collect(querytable, reftable, mismatches, cdr3_core_slice, cdr3_column):
         for indices, query_rows in results.items():
             if not indices:
                 for query_row in query_rows:
-                    yield ([query_row], [])
+                    yield ([query_row], reftable.head(0))
                 continue
 
             similar_group = vjlen_group.iloc[list(indices), :]
@@ -98,7 +99,7 @@ def collect(querytable, reftable, mismatches, cdr3_core_slice, cdr3_column):
     # Yield result tuples for all the queries that have not been found
     for queries in query_vjlentypes.values():
         for query_row in queries:
-            yield ([query_row], [])
+            yield ([query_row], reftable.head(0))
 
 
 def main(args):


=====================================
src/igdiscover/cli/clonotypes.py
=====================================
@@ -47,7 +47,8 @@ def add_arguments(parser):
         help='Sort by group size (largest first). Default: Sort by V/D/J gene names')
     arg('--limit', metavar='N', type=int, default=None,
         help='Print out only the first N groups')
-    arg('--v-shm-threshold', default=5, type=float, help='V SHM threshold for _mindiffrate computations')
+    arg('--v-shm-threshold', default=5, type=float,
+        help='V SHM threshold for _mindiffrate computations')
     arg('--cdr3-core', default=None,
         type=slice_arg, metavar='START:END',
         help='START:END defines the non-junction region of CDR3 '
@@ -68,24 +69,88 @@ def add_arguments(parser):
     arg('table', help='Table with parsed and filtered IgBLAST results')
 
 
-def is_similar_with_junction(s, t, mismatches, cdr3_core):
+def main(args):
+    run_clonotypes(**vars(args))
+
+
+def run_clonotypes(
+    table,
+    sort=False,
+    limit=None,
+    v_shm_threshold=5,
+    aa=False,
+    mismatches=1,
+    members=None,
+    cdr3_core=None,
+):
+    logger.info('Reading input table ...')
+    usecols = CLONOTYPE_COLUMNS
+    # TODO backwards compatibility
+    if 'FR1_aa_mut' not in pd.read_csv(table, nrows=0, sep='\t').columns:
+        usecols = [col for col in usecols if not col.endswith('_aa_mut')]
+
+    table = read_table(table, usecols=usecols)
+    table = table[usecols]
+    logger.info('Read table with %s rows', len(table))
+    table.insert(5, 'CDR3_length', table['CDR3_nt'].apply(len))
+    table = table[table['CDR3_length'] > 0]
+    table = table[table['CDR3_aa'].map(lambda s: '*' not in s)]
+    logger.info('After discarding rows with unusable CDR3, %s remain', len(table))
+    with ExitStack() as stack:
+        if members:
+            members_file = stack.enter_context(xopen(members, 'w'))
+        else:
+            members_file = None
+
+        columns = usecols[:]
+        columns.remove('barcode')
+        columns.remove('count')
+        columns.insert(0, 'count')
+        columns.insert(columns.index('CDR3_nt'), 'CDR3_length')
+        print(*columns, sep='\t')
+        print_header = True
+        n = 0
+        cdr3_column = 'CDR3_aa' if aa else 'CDR3_nt'
+        grouped = group_by_clonotype(table, mismatches, sort, cdr3_core, cdr3_column)
+        for group in islice(grouped, 0, limit):
+            group = augment_group(group, v_shm_threshold=v_shm_threshold)
+            if members_file:
+                # We get an intentional empty line between groups since
+                # to_csv() already includes a line break
+                print(group.to_csv(sep='\t', header=print_header, index=False), file=members_file)
+                print_header = False
+            rep = representative(group)
+            print(*[rep[col] for col in columns], sep='\t')
+            n += 1
+    logger.info('%d clonotypes written', n)
+
+
+def group_by_clonotype(table, mismatches, sort, cdr3_core, cdr3_column):
     """
-    Return whether strings s and t have at most the given number of mismatches
-    *and* have at least one identical junction.
+    Yield clonotype groups. Each item is a DataFrame with all the members of the
+    clonotype.
     """
-    # TODO see issue #81
-    if len(s) != len(t):
-        return False
-    if 0 < mismatches < 1:
-        delta = cdr3_core.start if cdr3_core is not None else 0
-        distance_ok = hamming_distance(s, t) <= (len(s) - delta) * mismatches
-    else:
-        distance_ok = hamming_distance(s, t) <= mismatches
-    if cdr3_core is None:
-        return distance_ok
-    return distance_ok and (
-            (s[:cdr3_core.start] == t[:cdr3_core.start]) or
-            (s[cdr3_core.stop:] == t[cdr3_core.stop:]))
+    logger.info('Computing clonotypes ...')
+    prev_v = None
+    groups = []
+    for (v_gene, j_gene, cdr3_length), vj_group in table.groupby(
+            ['V_gene', 'J_gene', 'CDR3_length']):
+        if prev_v != v_gene:
+            logger.info('Processing %s', v_gene)
+        prev_v = v_gene
+        cdr3_groups = group_by_cdr3(vj_group.copy(), mismatches=mismatches, cdr3_core=cdr3_core,
+            cdr3_column=cdr3_column)
+        if sort:
+            # When sorting by group size is requested, we need to buffer
+            # results
+            groups.extend(cdr3_groups)
+        else:
+            yield from cdr3_groups
+
+    if sort:
+        logger.info('Sorting by group size ...')
+        groups.sort(key=len, reverse=True)
+        yield from groups
 
 
 def group_by_cdr3(table, mismatches, cdr3_core, cdr3_column):
@@ -115,6 +180,26 @@ def group_by_cdr3(table, mismatches, cdr3_core, cdr3_column):
         yield group.drop('cluster_id', axis=1)
 
 
+def is_similar_with_junction(s, t, mismatches, cdr3_core):
+    """
+    Return whether strings s and t have at most the given number of mismatches
+    *and* have at least one identical junction.
+    """
+    # TODO see issue #81
+    if len(s) != len(t):
+        return False
+    if 0 < mismatches < 1:
+        delta = cdr3_core.start if cdr3_core is not None else 0
+        distance_ok = hamming_distance(s, t) <= (len(s) - delta) * mismatches
+    else:
+        distance_ok = hamming_distance(s, t) <= mismatches
+    if cdr3_core is None:
+        return distance_ok
+    return distance_ok and (
+            (s[:cdr3_core.start] == t[:cdr3_core.start]) or
+            (s[cdr3_core.stop:] == t[cdr3_core.stop:]))
+
+
 def representative(table):
     """
     Given a table with members of the same clonotype, return a representative
@@ -129,34 +214,6 @@ def representative(table):
     return result
 
 
-def group_by_clonotype(table, mismatches, sort, cdr3_core, cdr3_column):
-    """
-    Yield clonotype groups. Each item is a DataFrame with all the members of the
-    clonotype.
-    """
-    logger.info('Computing clonotypes ...')
-    prev_v = None
-    groups = []
-    for (v_gene, j_gene, cdr3_length), vj_group in table.groupby(
-            ['V_gene', 'J_gene', 'CDR3_length']):
-        if prev_v != v_gene:
-            logger.info('Processing %s', v_gene)
-        prev_v = v_gene
-        cdr3_groups = group_by_cdr3(vj_group.copy(), mismatches=mismatches, cdr3_core=cdr3_core,
-            cdr3_column=cdr3_column)
-        if sort:
-            # When sorting by group size is requested, we need to buffer
-            # results
-            groups.extend(cdr3_groups)
-        else:
-            yield from cdr3_groups
-
-    if sort:
-        logger.info('Sorting by group size ...')
-        groups.sort(key=len, reverse=True)
-        yield from groups
-
-
 def augment_group(table, v_shm_threshold=5, suffix='_mindiffrate'):
     """
     Add columns to the given table that contain percentage difference of VDJ_nt, VDJ_aa, CDR3_nt,
@@ -167,9 +224,11 @@ def augment_group(table, v_shm_threshold=5, suffix='_mindiffrate'):
     for column in columns[::-1]:
         table.insert(i, column + suffix, None)
 
+    if table.empty:
+        return table
+
     # Find row whose V is least mutated
     root = table.loc[table['V_SHM'].idxmin()]
-    import ipdb; ipdb.set_trace()
     if root['V_SHM'] > v_shm_threshold:
         return table
 
@@ -181,46 +240,3 @@ def augment_group(table, v_shm_threshold=5, suffix='_mindiffrate'):
         ]
 
     return table
-
-
-def main(args):
-    logger.info('Reading input table ...')
-    usecols = CLONOTYPE_COLUMNS
-    # TODO backwards compatibility
-    if 'FR1_aa_mut' not in pd.read_csv(args.table, nrows=0, sep='\t').columns:
-        usecols = [col for col in usecols if not col.endswith('_aa_mut')]
-
-    table = read_table(args.table, usecols=usecols)
-    table = table[usecols]
-    logger.info('Read table with %s rows', len(table))
-    table.insert(5, 'CDR3_length', table['CDR3_nt'].apply(len))
-    table = table[table['CDR3_length'] > 0]
-    table = table[table['CDR3_aa'].map(lambda s: '*' not in s)]
-    logger.info('After discarding rows with unusable CDR3, %s remain', len(table))
-    with ExitStack() as stack:
-        if args.members:
-            members_file = stack.enter_context(xopen(args.members, 'w'))
-        else:
-            members_file = None
-
-        columns = usecols[:]
-        columns.remove('barcode')
-        columns.remove('count')
-        columns.insert(0, 'count')
-        columns.insert(columns.index('CDR3_nt'), 'CDR3_length')
-        print(*columns, sep='\t')
-        print_header = True
-        n = 0
-        cdr3_column = 'CDR3_aa' if args.aa else 'CDR3_nt'
-        grouped = group_by_clonotype(table, args.mismatches, args.sort, args.cdr3_core, cdr3_column)
-        for group in islice(grouped, 0, args.limit):
-            group = augment_group(group, v_shm_threshold=args.v_shm_threshold)
-            if members_file:
-                # We get an intentional empty line between groups since
-                # to_csv() already includes a line break
-                print(group.to_csv(sep='\t', header=print_header, index=False), file=members_file)
-                print_header = False
-            rep = representative(group)
-            print(*[rep[col] for col in columns], sep='\t')
-            n += 1
-    logger.info('%d clonotypes written', n)
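
Note on is_similar_with_junction() above: the function is only moved in this file, not changed. Its rule (equal length, Hamming distance within the threshold, and at least one flank outside the CDR3 core identical) can be tried in isolation with the minimal sketch below. The hamming_distance() helper here is a stand-in written for this note, not the one igdiscover imports, and the example strings are invented.

```python
def hamming_distance(s, t):
    """Stand-in helper: count positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("strings must have the same length")
    return sum(a != b for a, b in zip(s, t))


def is_similar_with_junction(s, t, mismatches, cdr3_core):
    # Same logic as in the diff; cdr3_core is a slice object or None.
    if len(s) != len(t):
        return False
    if 0 < mismatches < 1:
        # Fractional threshold: allowed differences scale with the length
        # that remains after subtracting the core start offset.
        delta = cdr3_core.start if cdr3_core is not None else 0
        distance_ok = hamming_distance(s, t) <= (len(s) - delta) * mismatches
    else:
        distance_ok = hamming_distance(s, t) <= mismatches
    if cdr3_core is None:
        return distance_ok
    # At least one of the two flanks outside the core must match exactly.
    return distance_ok and (
        s[:cdr3_core.start] == t[:cdr3_core.start]
        or s[cdr3_core.stop:] == t[cdr3_core.stop:]
    )


core = slice(2, 6)   # invented core region for illustration
a = "ACGTACGTAC"
b = "ACGAACGTAC"     # differs from a only at index 3 (inside the core)
c = "TCGTACGTAG"     # differs from a at index 0 and index 9 (both flanks)
```

With these inputs, a and b are accepted with one allowed mismatch, while a and c are rejected as soon as a core is given, even with two allowed mismatches, because neither flank is identical.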


=====================================
src/igdiscover/cli/config.py
=====================================
@@ -26,22 +26,35 @@ def add_arguments(parser):
 
 def main(args):
     if args.set:
-        with open(args.file) as f:
-            config = ruamel.yaml.load(f, ruamel.yaml.RoundTripLoader)
-        for k, v in args.set:
-            v = ruamel.yaml.safe_load(v)
-            # config[k] = v
-            item = config
-            # allow nested keys
-            keys = k.split('.')
-            for i in keys[:-1]:
-                item = item[i]
-            item[keys[-1]] = v
-        tmpfile = args.file + '.tmp'
-        with open(tmpfile, 'w') as f:
-            print(ruamel.yaml.dump(config, Dumper=ruamel.yaml.RoundTripDumper), end='', file=f)
-        os.rename(tmpfile, args.file)
+        modify_configuration(args.set, args.file)
     else:
-        with open(args.file) as f:
-            config = ruamel.yaml.safe_load(f)
-        print(ruamel.yaml.dump(config), end='')
+        print_configuration(args.file)
+
+
+def modify_configuration(
+    settings,
+    path=Config.DEFAULT_PATH,
+):
+    with open(path) as f:
+        config = ruamel.yaml.load(f, ruamel.yaml.RoundTripLoader)
+    for k, v in settings:
+        if not isinstance(k, str) or not isinstance(v, str):
+            raise ValueError("key and value must both be strings")
+        v = ruamel.yaml.safe_load(v)
+        # config[k] = v
+        item = config
+        # allow nested keys
+        keys = k.split('.')
+        for i in keys[:-1]:
+            item = item[i]
+        item[keys[-1]] = v
+    tmpfile = path + '.tmp'
+    with open(tmpfile, 'w') as f:
+        print(ruamel.yaml.dump(config, Dumper=ruamel.yaml.RoundTripDumper), end='', file=f)
+    os.rename(tmpfile, path)
+
+
+def print_configuration(path=Config.DEFAULT_PATH):
+    with open(path) as f:
+        config = ruamel.yaml.safe_load(f)
+    print(ruamel.yaml.dump(config), end='')
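
Note on modify_configuration() above: the nested-key handling splits each key on '.' and walks down the mapping before assigning. A minimal sketch of just that step follows, using plain dicts instead of the ruamel.yaml round-trip objects. The set_nested() name and the example keys are invented for this note, and the YAML parsing of the value via safe_load is left out.

```python
def set_nested(config, dotted_key, value):
    """Assign value at the position named by a dot-separated key path."""
    item = config
    keys = dotted_key.split('.')
    for key in keys[:-1]:
        # Raises KeyError if an intermediate key does not exist,
        # which matches the behaviour of the loop in the diff.
        item = item[key]
    item[keys[-1]] = value


# Invented example configuration
config = {'limit': 3, 'outer': {'inner': 5}}
set_nested(config, 'limit', 1)
set_nested(config, 'outer.inner', 10)
```

Only the final key may be new; every key before it must already exist in the configuration.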


=====================================
src/igdiscover/cli/init.py
=====================================
@@ -10,6 +10,9 @@ import sys
 import subprocess
 import pkg_resources
 import dnaio
+from xopen import xopen
+
+from . import CommandLineError
 from ..config import Config
 
 try:
@@ -19,13 +22,16 @@ try:
 except ImportError:
     tk = None
 
-from xopen import xopen
 
 logger = logging.getLogger(__name__)
 
 do_not_show_cpustats = 1
 
 
+class GuiCancelledError(CommandLineError):
+    pass
+
+
 def add_arguments(parser):
     parser.add_argument('--database', '--db', metavar='PATH', default=None,
         help='Directory with V.fasta, D.fasta and J.fasta files. If not given, a dialog is shown.')
@@ -169,8 +175,7 @@ def try_open(path):
         with open(path) as f:
             pass
     except OSError as e:
-        logger.error('Could not open %r: %s', path, e)
-        sys.exit(1)
+        raise CommandLineError(f'Could not open {path!r}: {e}')
 
 
 def read_and_repair_fasta(path):
@@ -210,28 +215,34 @@ def read_and_repair_fasta(path):
 
 
 def main(args):
-    if ' ' in args.directory:
-        logger.error('The name of the analysis directory must not contain spaces')
-        sys.exit(1)
+    run_init(**vars(args))
+
+
+def run_init(
+    directory,
+    database: str,
+    reads1=None,
+    single_reads=None,
+):
+    if ' ' in directory:
+        raise CommandLineError('The name of the analysis directory must not contain spaces')
 
-    if os.path.exists(args.directory):
-        logger.error('The target directory {!r} already exists.'.format(args.directory))
-        sys.exit(1)
+    if os.path.exists(directory):
+        raise CommandLineError(f'The target directory {directory!r} already exists.')
 
     # If reads files or database were not given, initialize the GUI
-    if (args.reads1 is None and args.single_reads is None) or args.database is None:
+    if (reads1 is None and single_reads is None) or database is None:
         try:
             gui = TkinterGui()
         except ImportError:  # TODO tk.TclError cannot be caught when import of tk fails
-            logger.error('GUI cannot be started. Please provide reads1 file '
+            raise CommandLineError('GUI cannot be started. Please provide reads1 file '
                 'and database directory on command line.')
-            sys.exit(1)
     else:
         gui = None
 
     # Find out whether data is paired or single
-    assert not (args.reads1 and args.single_reads)
-    if args.reads1 is None and args.single_reads is None:
+    assert not (reads1 and single_reads)
+    if reads1 is None and single_reads is None:
         paired = gui.yesno('Paired end or single-end reads',
             'Are your reads paired and need to be merged?\n\n'
             'If you answer "Yes", next select the FASTQ files '
@@ -239,64 +250,56 @@ def main(args):
             'If you answer "No", next select the FASTA or FASTQ '
             'file with single-end reads.')
         if paired is None:
-            logger.error('Cancelled')
-            sys.exit(2)
+            raise GuiCancelledError()
     else:
-        paired = bool(args.reads1)
+        paired = bool(reads1)
 
     # Assign reads1 and (if paired) also reads2
     if paired:
-        if args.reads1 is not None:
-            reads1 = args.reads1
+        if reads1 is not None:
             try_open(reads1)
         else:
             reads1 = gui.reads1_path()
             if not reads1:
-                logger.error('Cancelled')
-                sys.exit(2)
+                raise GuiCancelledError()
         reads2 = guess_paired_path(reads1)
         if reads2 is None:
-            logger.error('Could not determine second file of paired-end reads')
-            sys.exit(1)
+            raise CommandLineError('Could not determine second file of paired-end reads')
     else:
-        if args.single_reads is not None:
-            reads1 = args.single_reads
+        if single_reads is not None:
+            reads1 = single_reads
             try_open(reads1)
         else:
             reads1 = gui.single_reads_path()
             if not reads1:
-                logger.error('Cancelled')
-                sys.exit(2)
+                raise GuiCancelledError()
 
-    if args.database is not None:
-        dbpath = args.database
+    if database is not None:
+        dbpath = database
     else:
         # TODO as soon as we distribute our own database files, we can use this:
         # database_path = pkg_resources.resource_filename('igdiscover', 'databases')
         databases_path = None
         dbpath = gui.database_path(databases_path)
         if not dbpath:
-            logger.error('Cancelled')
-            sys.exit(2)
+            raise GuiCancelledError()
 
     database = dict()
     for g in ['V', 'D', 'J']:
         path = os.path.join(dbpath, g + '.fasta')
         if not os.path.exists(path):
-            logger.error(
-                'The database directory %r must contain the three files '
-                'V.fasta, D.fasta and J.fasta', dbpath)
-            logger.error(
-                'A dummy D.fasta is necessary even if analyzing light chains (see manual)')
-            sys.exit(2)
+            raise CommandLineError(
+                f'The database directory {dbpath!r} must contain the three files '
+                'V.fasta, D.fasta and J.fasta. A dummy D.fasta is necessary even '
+                'if analyzing light chains (see manual)'
+            )
         database[g] = list(read_and_repair_fasta(path))
 
     # Create the directory
     try:
-        os.mkdir(args.directory)
+        os.mkdir(directory)
     except OSError as e:
-        logger.error(e)
-        sys.exit(1)
+        raise CommandLineError(e)
 
     def create_symlink(readspath, dirname, target):
         gz = '.gz' if readspath.endswith('.gz') else ''
@@ -307,23 +310,22 @@ def main(args):
         os.symlink(src, os.path.join(dirname, target + gz))
 
     if paired:
-        create_symlink(reads1, args.directory, 'reads.1.fastq')
-        create_symlink(reads2, args.directory, 'reads.2.fastq')
+        create_symlink(reads1, directory, 'reads.1.fastq')
+        create_symlink(reads2, directory, 'reads.2.fastq')
     else:
         try:
             target = 'reads.' + file_type(reads1)
         except UnknownFileFormatError:
-            logger.error('Cannot determine whether reads file is FASTA or FASTQ')
-            sys.exit(1)
-        create_symlink(reads1, args.directory, target)
+            raise CommandLineError('Cannot determine whether reads file is FASTA or FASTQ')
+        create_symlink(reads1, directory, target)
 
     # Write the configuration file
     configuration = pkg_resources.resource_string('igdiscover', Config.DEFAULT_PATH).decode()
-    with open(os.path.join(args.directory, Config.DEFAULT_PATH), 'w') as f:
+    with open(os.path.join(directory, Config.DEFAULT_PATH), 'w') as f:
         f.write(configuration)
 
     # Create database files
-    database_dir = os.path.join(args.directory, 'database')
+    database_dir = os.path.join(directory, 'database')
     os.mkdir(database_dir)
     for gene in ['V', 'D', 'J']:
         with open(os.path.join(database_dir, gene + '.fasta'), 'w') as db_file:
@@ -334,6 +336,7 @@ def main(args):
         # Only suggest to edit the config file if at least one GUI dialog has been shown
         if gui.yesno('Directory initialized',
                 'Do you want to edit the configuration file now?'):
-            launch(os.path.join(args.directory, Config.DEFAULT_PATH))
-    logger.info('Directory %s initialized.', args.directory)
-    logger.info('Edit %s/%s, then run "cd %s && igdiscover run" to start the analysis', args.directory, Config.DEFAULT_PATH, args.directory)
+            launch(os.path.join(directory, Config.DEFAULT_PATH))
+    logger.info('Directory %s initialized.', directory)
+    logger.info('Edit %s/%s, then run "cd %s && igdiscover run" to start the analysis',
+        directory, Config.DEFAULT_PATH, directory)
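
Note on the init.py changes above: every logger.error() plus sys.exit() pair becomes a raised CommandLineError (or GuiCancelledError), so run_init() can be called from other Python code without terminating the interpreter. The code that catches these exceptions is not part of this file's diff. The sketch below shows one way a caller might map them back to exit codes. The run_cli() wrapper, the stand-in exception classes and the choice of exit code 2 for a cancelled dialog (taken from the removed sys.exit(2) calls) are assumptions made for this note.

```python
import logging

logger = logging.getLogger(__name__)


class CommandLineError(Exception):
    """Stand-in for the exception the diff imports from the cli package."""


class GuiCancelledError(CommandLineError):
    pass


def check_directory_name(directory):
    # Same check as at the top of run_init() in the diff
    if ' ' in directory:
        raise CommandLineError('The name of the analysis directory must not contain spaces')
    return directory


def run_cli(func, *args):
    """Hypothetical wrapper: translate the exceptions into an exit code."""
    try:
        func(*args)
    except GuiCancelledError:
        # Must be caught before its base class
        return 2
    except CommandLineError as e:
        logger.error('%s', e)
        return 1
    return 0
```

A command-line entry point could then call sys.exit(run_cli(...)), while library users catch CommandLineError themselves.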


=====================================
src/igdiscover/cli/run.py
=====================================
@@ -10,10 +10,11 @@ import platform
 import pkg_resources
 from snakemake import snakemake
 
+from .config import Config
+from . import CommandLineError
 from ..utils import available_cpu_count
 from .. import __version__
 
-from .config import Config
 
 logger = logging.getLogger(__name__)
 
@@ -32,6 +33,15 @@ def add_arguments(parser):
 
 
 def main(args):
+    run_snakemake(**vars(args))
+
+
+def run_snakemake(
+    dryrun=False,
+    cores=available_cpu_count(),
+    keepgoing=False,
+    targets=None,
+):
     try:
         config = Config.from_default_path()
     except FileNotFoundError as e:
@@ -46,26 +56,30 @@ def main(args):
         print('   ', k, ': ', repr(v), sep='')
     sys.stdout.flush()
 
-    # snakemake sets up its own logging and this cannot be easily changed
-    # (setting keep_logger=True crashes), so remove our own log handler
-    # for now
-    logger.root.handlers = []
+    old_root_handlers = logger.root.handlers
+    root = logging.getLogger()
+    root.handlers = []
+    file_handler = logging.FileHandler("log.txt")
+    root.addHandler(file_handler)
+
     snakefile_path = pkg_resources.resource_filename('igdiscover', 'Snakefile')
     success = snakemake(
         snakefile_path,
         snakemakepath='snakemake',  # Needed in snakemake 3.9.0
-        dryrun=args.dryrun,
-        cores=args.cores,
-        keepgoing=args.keepgoing,
+        dryrun=dryrun,
+        cores=cores,
+        keepgoing=keepgoing,
         printshellcmds=True,
-        targets=args.targets if args.targets else None,
+        targets=targets,
     )
+    logger.root.handlers = old_root_handlers
 
-    if sys.platform == 'linux' and not args.dryrun:
+    if sys.platform == 'linux' and not dryrun:
         cputime = resource.getrusage(resource.RUSAGE_SELF).ru_utime
         cputime += resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
         h = int(cputime // 3600)
         m = (cputime - h * 3600) / 60
         print('Total CPU time: {}h {:.2f}m'.format(h, m))
 
-    sys.exit(0 if success else 1)
+    if not success:
+        raise CommandLineError()
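
Note on run_snakemake() above: the root logger's handlers are saved, replaced for the duration of the snakemake() call, and restored afterwards. In the diff the restore is a plain statement, so it is skipped if snakemake() raises. A possible variant, shown only as a sketch, wraps the swap in a context manager so that the handlers are restored on every exit path. The temporary_root_handlers() name is invented for this note, and a NullHandler stands in for the FileHandler("log.txt") used in the diff so that the example writes no file.

```python
import logging
from contextlib import contextmanager


@contextmanager
def temporary_root_handlers(handlers):
    """Replace the root logger's handlers inside the with-block only."""
    root = logging.getLogger()
    old_handlers = root.handlers
    root.handlers = list(handlers)
    try:
        yield root
    finally:
        # Runs on normal exit and when the block raises
        root.handlers = old_handlers


with temporary_root_handlers([logging.NullHandler()]):
    logging.getLogger(__name__).info('handled only by the temporary handler')
```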


=====================================
src/igdiscover/readlenhistogram.py
=====================================
@@ -1,6 +1,7 @@
 from collections import Counter
 import dnaio
-
+import matplotlib
+matplotlib.use('Agg')
 import matplotlib.pyplot as plt
 import numpy as np
 


=====================================
tests/data/H1/README.md deleted
=====================================
@@ -1,4 +0,0 @@
-These are partial results from ERR1760498.
-
-V.fasta -- V gene sequences that actually appear in the `candidates.tab` (not
-the full V starting database).


=====================================
tests/data/H1/V.fasta deleted
=====================================
@@ -1,246 +0,0 @@
->IGHV1-18*01
-CAGGTTCAGCTGGTGCAGTCTGGAGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGTTACACCTTTACCAGCTATGGTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAGCGCTTACAATGGTAACACAAACTATGCACAGAAGCTCCAGGGCAGAGTCACCATGACCACAGACACATCCACGAGCACAGCCTACATGGAGCTGAGGAGCCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV1-18*03
-CAGGTTCAGCTGGTGCAGTCTGGAGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGTTACACCTTTACCAGCTATGGTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAGCGCTTACAATGGTAACACAAACTATGCACAGAAGCTCCAGGGCAGAGTCACCATGACCACAGACACATCCACGAGCACAGCCTACATGGAGCTGAGGAGCCTGAGATCTGACGACATGGCCGTGTATTACTGTGCGAGAGA
->IGHV1-18*04
-CAGGTTCAGCTGGTGCAGTCTGGAGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGTTACACCTTTACCAGCTACGGTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAGCGCTTACAATGGTAACACAAACTATGCACAGAAGCTCCAGGGCAGAGTCACCATGACCACAGACACATCCACGAGCACAGCCTACATGGAGCTGAGGAGCCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV1-2*02
-CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTCACCGGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAACCCTAACAGTGGTGGCACAAACTATGCACAGAAGTTTCAGGGCAGGGTCACCATGACCAGGGACACGTCCATCAGCACAGCCTACATGGAGCTGAGCAGGCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV1-2*04
-CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTCACCGGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAACCCTAACAGTGGTGGCACAAACTATGCACAGAAGTTTCAGGGCTGGGTCACCATGACCAGGGACACGTCCATCAGCACAGCCTACATGGAGCTGAGCAGGCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV1-24*01
-CAGGTCCAGCTGGTACAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGTTTCCGGATACACCCTCACTGAATTATCCATGCACTGGGTGCGACAGGCTCCTGGAAAAGGGCTTGAGTGGATGGGAGGTTTTGATCCTGAAGATGGTGAAACAATCTACGCACAGAAGTTCCAGGGCAGAGTCACCATGACCGAGGACACATCTACAGACACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCAACAGA
->IGHV1-3*01
-CAGGTCCAGCTTGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTTTCCTGCAAGGCTTCTGGATACACCTTCACTAGCTATGCTATGCATTGGGTGCGCCAGGCCCCCGGACAAAGGCTTGAGTGGATGGGATGGATCAACGCTGGCAATGGTAACACAAAATATTCACAGAAGTTCCAGGGCAGAGTCACCATTACCAGGGACACATCCGCGAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAAGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV1-3*02
-CAGGTTCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTTTCCTGCAAGGCTTCTGGATACACCTTCACTAGCTATGCTATGCATTGGGTGCGCCAGGCCCCCGGACAAAGGCTTGAGTGGATGGGATGGAGCAACGCTGGCAATGGTAACACAAAATATTCACAGGAGTTCCAGGGCAGAGTCACCATTACCAGGGACACATCCGCGAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACATGGCTGTGTATTACTGTGCGAGAGA
->IGHV1-45*02
-CAGATGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGACTGGGTCCTCAGTGAAGGTTTCCTGCAAGGCTTCCGGATACACCTTCACCTACCGCTACCTGCACTGGGTGCGACAGGCCCCCGGACAAGCGCTTGAGTGGATGGGATGGATCACACCTTTCAATGGTAACACCAACTACGCACAGAAATTCCAGGACAGAGTCACCATTACCAGGGACAGGTCTATGAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACAGCCATGTATTACTGTGCAAGATA
->IGHV1-46*01
-CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTTTCCTGCAAGGCATCTGGATACACCTTCACCAGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGAATAATCAACCCTAGTGGTGGTAGCACAAGCTACGCACAGAAGTTCCAGGGCAGAGTCACCATGACCAGGGACACGTCCACGAGCACAGTCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV1-46*02
-CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTTTCCTGCAAGGCATCTGGATACACCTTCAACAGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGAATAATCAACCCTAGTGGTGGTAGCACAAGCTACGCACAGAAGTTCCAGGGCAGAGTCACCATGACCAGGGACACGTCCACGAGCACAGTCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV1-46*03
-CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTTTCCTGCAAGGCATCTGGATACACCTTCACCAGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGAATAATCAACCCTAGTGGTGGTAGCACAAGCTACGCACAGAAGTTCCAGGGCAGAGTCACCATGACCAGGGACACGTCCACGAGCACAGTCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCTAGAGA
->IGHV1-58*01
-CAAATGCAGCTGGTGCAGTCTGGGCCTGAGGTGAAGAAGCCTGGGACCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATTCACCTTTACTAGCTCTGCTGTGCAGTGGGTGCGACAGGCTCGTGGACAACGCCTTGAGTGGATAGGATGGATCGTCGTTGGCAGTGGTAACACAAACTACGCACAGAAGTTCCAGGAAAGAGTCACCATTACCAGGGACATGTCCACAAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCCGAGGACACGGCCGTGTATTACTGTGCGGCAGA
->IGHV1-69*01
-CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGTCCTCGGTGAAGGTCTCCTGCAAGGCTTCTGGAGGCACCTTCAGCAGCTATGCTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGAGGGATCATCCCTATCTTTGGTACAGCAAACTACGCACAGAAGTTCCAGGGCAGAGTCACGATTACCGCGGACGAATCCACGAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV1-69*06
-CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGTCCTCGGTGAAGGTCTCCTGCAAGGCTTCTGGAGGCACCTTCAGCAGCTATGCTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGAGGGATCATCCCTATCTTTGGTACAGCAAACTACGCACAGAAGTTCCAGGGCAGAGTCACGATTACCGCGGACAAATCCACGAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV1-69*12
-CAGGTCCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGTCCTCGGTGAAGGTCTCCTGCAAGGCTTCTGGAGGCACCTTCAGCAGCTATGCTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGAGGGATCATCCCTATCTTTGGTACAGCAAACTACGCACAGAAGTTCCAGGGCAGAGTCACGATTACCGCGGACGAATCCACGAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV1-69*13
-CAGGTCCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGTCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGAGGCACCTTCAGCAGCTATGCTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGAGGGATCATCCCTATCTTTGGTACAGCAAACTACGCACAGAAGTTCCAGGGCAGAGTCACGATTACCGCGGACGAATCCACGAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV1-69*14
-CAGGTCCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGTCCTCGGTGAAGGTCTCCTGCAAGGCTTCTGGAGGCACCTTCAGCAGCTATGCTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGAGGGATCATCCCTATCTTTGGTACAGCAAACTACGCACAGAAGTTCCAGGGCAGAGTCACGATTACCGCGGACAAATCCACGAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV1-69-2*01
-GAGGTCCAGCTGGTACAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCTACAGTGAAAATCTCCTGCAAGGTTTCTGGATACACCTTCACCGACTACTACATGCACTGGGTGCAACAGGCCCCTGGAAAAGGGCTTGAGTGGATGGGACTTGTTGATCCTGAAGATGGTGAAACAATATACGCAGAGAAGTTCCAGGGCAGAGTCACCATAACCGCGGACACGTCTACAGACACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCAACAGA
->IGHV1-8*01
-CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTCACCAGTTATGATATCAACTGGGTGCGACAGGCCACTGGACAAGGGCTTGAGTGGATGGGATGGATGAACCCTAACAGTGGTAACACAGGCTATGCACAGAAGTTCCAGGGCAGAGTCACCATGACCAGGAACACCTCCATAAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGG
->IGHV1-8*02
-CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTCACCAGCTATGATATCAACTGGGTGCGACAGGCCACTGGACAAGGGCTTGAGTGGATGGGATGGATGAACCCTAACAGTGGTAACACAGGCTATGCACAGAAGTTCCAGGGCAGAGTCACCATGACCAGGAACACCTCCATAAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGG
->IGHV2-26*01
-CAGGTCACCTTGAAGGAGTCTGGTCCTGTGCTGGTGAAACCCACAGAGACCCTCACGCTGACCTGCACCGTCTCTGGGTTCTCACTCAGCAATGCTAGAATGGGTGTGAGCTGGATCCGTCAGCCCCCAGGGAAGGCCCTGGAGTGGCTTGCACACATTTTTTCGAATGACGAAAAATCCTACAGCACATCTCTGAAGAGCAGGCTCACCATCTCCAAGGACACCTCCAAAAGCCAGGTGGTCCTTACCATGACCAACATGGACCCTGTGGACACAGCCACATATTACTGTGCACGGATAC
->IGHV2-5*01
-CAGATCACCTTGAAGGAGTCTGGTCCTACGCTGGTGAAACCCACACAGACCCTCACGCTGACCTGCACCTTCTCTGGGTTCTCACTCAGCACTAGTGGAGTGGGTGTGGGCTGGATCCGTCAGCCCCCAGGAAAGGCCCTGGAGTGGCTTGCACTCATTTATTGGAATGATGATAAGCGCTACAGCCCATCTCTGAAGAGCAGGCTCACCATCACCAAGGACACCTCCAAAAACCAGGTGGTCCTTACAATGACCAACATGGACCCTGTGGACACAGCCACATATTACTGTGCACACAGAC
->IGHV2-5*02
-CAGATCACCTTGAAGGAGTCTGGTCCTACGCTGGTGAAACCCACACAGACCCTCACGCTGACCTGCACCTTCTCTGGGTTCTCACTCAGCACTAGTGGAGTGGGTGTGGGCTGGATCCGTCAGCCCCCAGGAAAGGCCCTGGAGTGGCTTGCACTCATTTATTGGGATGATGATAAGCGCTACAGCCCATCTCTGAAGAGCAGGCTCACCATCACCAAGGACACCTCCAAAAACCAGGTGGTCCTTACAATGACCAACATGGACCCTGTGGACACAGCCACATATTACTGTGCACACAGAC
->IGHV2-5*04
-CAGATCACCTTGAAGGAGTCTGGTCCTACGCTGGTGAAACCCACACAGACCCTCACGCTGACCTGCACCTTCTCTGGGTTCTCACTCAGCACTAGTGGAGTGGGTGTGGGCTGGATCCGTCAGCCCCCAGGAAAGGCCCTGGAGTGGCTTGCACTCATTTATTGGAATGATGATAAGCGCTACAGCCCATCTCTGAAGAGCAGGCTCACCATCACCAAGGACACCTCCAAAAACCAGGTGGTCCTTACAATGACCAACATGGACCCTGTGGACACAGGCACATATTACTGTGTAC
->IGHV2-70*01
-CAGGTCACCTTGAGGGAGTCTGGTCCTGCGCTGGTGAAACCCACACAGACCCTCACACTGACCTGCACCTTCTCTGGGTTCTCACTCAGCACTAGTGGAATGTGTGTGAGCTGGATCCGTCAGCCCCCAGGGAAGGCCCTGGAGTGGCTTGCACTCATTGATTGGGATGATGATAAATACTACAGCACATCTCTGAAGACCAGGCTCACCATCTCCAAGGACACCTCCAAAAACCAGGTGGTCCTTACAATGACCAACATGGACCCTGTGGACACAGCCACGTATTACTGTGCACGGATAC
->IGHV2-70*10
-CAGGTCACCTTGAAGGAGTCTGGTCCTGCGCTGGTGAAACCCACACAGACCCTCACACTGACCTGCACCTTCTCTGGGTTCTCACTCAGCACTAGTGGAATGCGTGTGAGCTGGATCCGTCAGCCCCCAGGGAAGGCCCTGGAGTGGATTGCACGCATTGATTGGGATGATGATAAATACTACAGCACATCTCTGAAGACCAGGCTCACCATCTCCAAGGACACCTCCAAAAACCAGGTGGTCCTTACAATGACCAACATGGACCCTGTGGACACAGCCACGTATTACTGTGCACGGATAC
->IGHV2-70*11
-CGGGTCACCTTGAGGGAGTCTGGTCCTGCGCTGGTGAAACCCACACAGACCCTCACACTGACCTGCACCTTCTCTGGGTTCTCACTCAGCACTAGTGGAATGTGTGTGAGCTGGATCCGTCAGCCCCCAGGGAAGGCCCTGGAGTGGCTTGCACGCATTGATTGGGATGATGATAAATACTACAGCACATCTCTGAAGACCAGGCTCACCATCTCCAAGGACACCTCCAAAAACCAGGTGGTCCTTACAATGACCAACATGGACCCTGTGGACACAGCCACGTATTACTGTGCACGGATAC
->IGHV2-70*13
-CAGGTCACCTTGAGGGAGTCTGGTCCTGCGCTGGTGAAACCCACACAGACCCTCACACTGACCTGCACCTTCTCTGGGTTCTCACTCAGCACTAGTGGAATGTGTGTGAGCTGGATCCGTCAGCCCCCAGGGAAGGCCCTGGAGTGGCTTGCACTCATTGATTGGGATGATGATAAATACTACAGCACATCTCTGAAGACCAGGCTCACCATCTCCAAGGACACCTCCAAAAACCAGGTGGTCCTTACAATGACCAACATGGACCCTGTGGACACAGCCACGTATTATTGTGCACGGATAC
->IGHV2-70D*04
-CAGGTCACCTTGAAGGAGTCTGGTCCTGCGCTGGTGAAACCCACACAGACCCTCACACTGACCTGCACCTTCTCTGGGTTCTCACTCAGCACTAGTGGAATGCGTGTGAGCTGGATCCGTCAGCCCCCAGGGAAGGCCCTGGAGTGGCTTGCACGCATTGATTGGGATGATGATAAATTCTACAGCACATCTCTGAAGACCAGGCTCACCATCTCCAAGGACACCTCCAAAAACCAGGTGGTCCTTACAATGACCAACATGGACCCTGTGGACACAGCCACGTATTACTGTGCACGGATAC
->IGHV2-70D*14
-CAGGTCACCTTGAAGGAGTCTGGTCCTGCGCTGGTGAAACCCACACAGACCCTCACACTGACCTGCACCTTCTCTGGGTTCTCACTCAGCACTAGTGGAATGCGTGTGAGCTGGATCCGTCAGCCCCCAGGTAAGGCCCTGGAGTGGCTTGCACGCATTGATTGGGATGATGATAAATTCTACAGCACATCTCTGAAGACCAGGCTCACCATCTCCAAGGACACCTCCAAAAACCAGGTGGTCCTTACAATGACCAACATGGACCCTGTGGACACAGCCACGTATTACTGTGCACGGATAC
->IGHV3-11*01
-CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCAAGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTGACTACTACATGAGCTGGATCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTAGTAGTAGTGGTAGTACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGGGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV3-11*04
-CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCAAGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTGACTACTACATGAGCTGGATCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTAGTAGTAGTGGTAGTACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGGGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-11*06
-CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCAAGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTGACTACTACATGAGCTGGATCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTAGTAGTAGTAGTAGTTACACAAACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-13*01
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTACGACATGCACTGGGTCCGCCAAGCTACAGGAAAAGGTCTGGAGTGGGTCTCAGCTATTGGTACTGCTGGTGACACATACTATCCAGGCTCCGTGAAGGGCCGATTCACCATCTCCAGAGAAAATGCCAAGAACTCCTTGTATCTTCAAATGAACAGCCTGAGAGCCGGGGACACGGCTGTGTATTACTGTGCAAGAGA
->IGHV3-13*05
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTACGACATGCACTGGGTCCGCCAAGCTACAGGAAAAGGTCTGGAGTGGGTCTCAGCTATTGGTACTGCTGGTGACCCATACTATCCAGGCTCCGTGAAGGGCCGATTCACCATCTCCAGAGAAAATGCCAAGAACTCCTTGTATCTTCAAATGAACAGCCTGAGAGCCGGGGACACGGCTGTGTATTACTGTGCAAGAGA
->IGHV3-15*01
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTAAAGCCTGGGGGGTCCCTTAGACTCTCCTGTGCAGCCTCTGGATTCACTTTCAGTAACGCCTGGATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTGGCCGTATTAAAAGCAAAACTGATGGTGGGACAACAGACTACGCTGCACCCGTGAAAGGCAGATTCACCATCTCAAGAGATGATTCAAAAAACACGCTGTATCTGCAAATGAACAGCCTGAAAACCGAGGACACAGCCGTGTATTACTGTACCACAGA
->IGHV3-15*02
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGCCTTGGTAAAGCCTGGGGGGTCCCTTAGACTCTCCTGTGCAGCCTCTGGATTCACTTTCAGTAACGCCTGGATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTGGCCGTATTAAAAGCAAAACTGATGGTGGGACAACAGACTACGCTGCACCCGTGAAAGGCAGATTCACCATCTCAAGAGATGATTCAAAAAACACGCTGTATCTGCAAATGAACAGCCTGAAAACCGAGGACACAGCCGTGTATTACTGTACCACAGA
->IGHV3-15*04
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTAAAGCCTGGGGGGTCCCTTAGACTCTCCTGTGCAGCCTCTGGATTCACTTTCAGTAACGCCTGGATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTGGCCGTATTGAAAGCAAAACTGATGGTGGGACAACAGACTACGCTGCACCCGTGAAAGGCAGATTCACCATCTCAAGAGATGATTCAAAAAACACGCTGTATCTGCAAATGAACAGCCTGAAAACCGAGGACACAGCCGTGTATTACTGTACCACAGA
->IGHV3-15*05
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTAAAGCCTGGGGGGTCCCTTAGACTCTCCTGTGCAGCCTCTGGATTCACTTTCAGTAACGCCTGGATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTGGCCGTATTAAAAGCAAAACTGATGGTGGGACAACAGACTACGCTGCACCCGTGAAAGGCAGATTCACCATCTCAAGAGATGATTCAAAAAACACGCTGTATCTGCAAATGAACAGTCTGAAAACCGAGGACACAGCCGTGTATTACTGTACCACAGA
->IGHV3-15*06
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTAAAGCCTGGGGGGTCCCTTAGACTCTCCTGTGCAGCCTCTGGATTCACTTTCAGTAACGCCTGGATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCGGCCGTATTAAAAGCAAAACTGATGGTGGGACAACAAACTACGCTGCACCCGTGAAAGGCAGATTCACCATCTCAAGAGATGATTCAAAAAACACGCTGTATCTGCAAATGAACAGCCTGAAAACCGAGGACACAGCCGTGTATTACTGTACCACAGA
->IGHV3-15*07
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTAAAGCCTGGGGGGTCCCTTAGACTCTCCTGTGCAGCCTCTGGTTTCACTTTCAGTAACGCCTGGATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCGGCCGTATTAAAAGCAAAACTGATGGTGGGACAACAGACTACGCTGCACCCGTGAAAGGCAGATTCACCATCTCAAGAGATGATTCAAAAAACACGCTGTATCTGCAAATGAACAGCCTGAAAACCGAGGACACAGCCGTGTATTACTGTACCACAGA
->IGHV3-20*01
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGTGTGGTACGGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTGATGATTATGGCATGAGCTGGGTCCGCCAAGCTCCAGGGAAGGGGCTGGAGTGGGTCTCTGGTATTAATTGGAATGGTGGTAGCACAGGTTATGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCCCTGTATCTGCAAATGAACAGTCTGAGAGCCGAGGACACGGCCTTGTATCACTGTGCGAGAGA
->IGHV3-21*01
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCCTGGTCAAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCATCCATTAGTAGTAGTAGTAGTTACATATACTACGCAGACTCAGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-21*02
-GAGGTGCAACTGGTGGAGTCTGGGGGAGGCCTGGTCAAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCATCCATTAGTAGTAGTAGTAGTTACATATACTACGCAGACTCAGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-21*03
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCCTGGTCAAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCATCCATTAGTAGTAGTAGTAGTTACATATACTACGCAGACTCAGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACAGCTGTGTATTACTGTGCGAGAGA
->IGHV3-21*04
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCCTGGTCAAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCATCCATTAGTAGTAGTAGTAGTTACATATACTACGCAGACTCAGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV3-23*01
-GAGGTGCAGCTGTTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTAGCAGCTATGCCATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCAGCTATTAGTGGTAGTGGTGGTAGCACATACTACGCAGACTCCGTGAAGGGCCGGTTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTATATTACTGTGCGAAAGA
->IGHV3-23*04
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTAGCAGCTATGCCATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCAGCTATTAGTGGTAGTGGTGGTAGCACATACTACGCAGACTCCGTGAAGGGCCGGTTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTATATTACTGTGCGAAAGA
->IGHV3-23*05
-GAGGTGCAGCTGTTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTAGCAGCTATGCCATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCAGCTATTTATAGCAGTGGTAGTAGCACATACTATGCAGACTCCGTGAAGGGCCGGTTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTATATTACTGTGCGAAA
->IGHV3-30*02
-CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCGTCTGGATTCACCTTCAGTAGCTATGGCATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTGGCATTTATACGGTATGATGGAAGTAATAAATACTATGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCTGAGGACACGGCTGTGTATTACTGTGCGAAAGA
->IGHV3-30*03
-CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATGGCATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTGGCAGTTATATCATATGATGGAAGTAATAAATACTATGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCTGAGGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-30*04
-CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATGCTATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTGGCAGTTATATCATATGATGGAAGTAATAAATACTACGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCTGAGGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-30*18
-CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATGGCATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTGGCAGTTATATCATATGATGGAAGTAATAAATACTATGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCTGAGGACACGGCTGTGTATTACTGTGCGAAAGA
->IGHV3-30-3*01
-CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATGCTATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTGGCAGTTATATCATATGATGGAAGCAATAAATACTACGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCTGAGGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-33*01
-CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGCAGCGTCTGGATTCACCTTCAGTAGCTATGGCATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTGGCAGTTATATGGTATGATGGAAGTAATAAATACTATGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-33*03
-CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGCAGCGTCTGGATTCACCTTCAGTAGCTATGGCATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTGGCAGTTATATGGTATGATGGAAGTAATAAATACTATGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAACTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAAAGA
->IGHV3-33*06
-CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGCAGCGTCTGGATTCACCTTCAGTAGCTATGGCATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTGGCAGTTATATGGTATGATGGAAGTAATAAATACTATGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAAAGA
->IGHV3-43*01
-GAAGTGCAGCTGGTGGAGTCTGGGGGAGTCGTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTGATGATTATACCATGCACTGGGTCCGTCAAGCTCCGGGGAAGGGTCTGGAGTGGGTCTCTCTTATTAGTTGGGATGGTGGTAGCACATACTATGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACAGCAAAAACTCCCTGTATCTGCAAATGAACAGTCTGAGAACTGAGGACACCGCCTTGTATTACTGTGCAAAAGATA
->IGHV3-43D*01
-GAAGTGCAGCTGGTGGAGTCTGGGGGAGTCGTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTGATGATTATGCCATGCACTGGGTCCGTCAAGCTCCGGGGAAGGGTCTGGAGTGGGTCTCTCTTATTAGTTGGGATGGTGGTAGCACCTACTATGCAGACTCTGTGAAGGGTCGATTCACCATCTCCAGAGACAACAGCAAAAACTCCCTGTATCTGCAAATGAACAGTCTGAGAGCTGAGGACACCGCCTTGTATTACTGTGCAAAAGATA
->IGHV3-48*01
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTAGTAGTAGTAGTAGTACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAATGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-48*02
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTAGTAGTAGTAGTAGTACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAATGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGACGAGGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-48*03
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGTTATGAAATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTAGTAGTAGTGGTAGTACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTTTATTACTGTGCGAGAGA
->IGHV3-48*04
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTAGTAGTAGTAGTAGTACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-49*01
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCAGGGCGGTCCCTGAGACTCTCCTGTACAGCTTCTGGATTCACCTTTGGTGATTATGCTATGAGCTGGTTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTAGGTTTCATTAGAAGCAAAGCTTATGGTGGGACAACAGAATACACCGCGTCTGTGAAAGGCAGATTCACCATCTCAAGAGATGGTTCCAAAAGCATCGCCTATCTGCAAATGAACAGCCTGAAAACCGAGGACACAGCCGTGTATTACTGTACTAGAGA
->IGHV3-49*03
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCAGGGCGGTCCCTGAGACTCTCCTGTACAGCTTCTGGATTCACCTTTGGTGATTATGCTATGAGCTGGTTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTAGGTTTCATTAGAAGCAAAGCTTATGGTGGGACAACAGAATACGCCGCGTCTGTGAAAGGCAGATTCACCATCTCAAGAGATGATTCCAAAAGCATCGCCTATCTGCAAATGAACAGCCTGAAAACCGAGGACACAGCCGTGTATTACTGTACTAGAGA
->IGHV3-49*04
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCAGGGCGGTCCCTGAGACTCTCCTGTACAGCTTCTGGATTCACCTTTGGTGATTATGCTATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTAGGTTTCATTAGAAGCAAAGCTTATGGTGGGACAACAGAATACGCCGCGTCTGTGAAAGGCAGATTCACCATCTCAAGAGATGATTCCAAAAGCATCGCCTATCTGCAAATGAACAGCCTGAAAACCGAGGACACAGCCGTGTATTACTGTACTAGAGA
->IGHV3-49*05
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTAAAGCCAGGGCGGTCCCTGAGACTCTCCTGTACAGCTTCTGGATTCACCTTTGGTGATTATGCTATGAGCTGGTTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTAGGTTTCATTAGAAGCAAAGCTTATGGTGGGACAACAGAATACGCCGCGTCTGTGAAAGGCAGATTCACCATCTCAAGAGATGATTCCAAAAGCATCGCCTATCTGCAAATGAACAGCCTGAAAACCGAGGACACAGCCGTGTATTACTGTACTAGAGA
->IGHV3-53*01
-GAGGTGCAGCTGGTGGAGTCTGGAGGAGGCTTGATCCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGGTTCACCGTCAGTAGCAACTACATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCAGTTATTTATAGCGGTGGTAGCACATACTACGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTTCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV3-53*02
-GAGGTGCAGCTGGTGGAGACTGGAGGAGGCTTGATCCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGGTTCACCGTCAGTAGCAACTACATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCAGTTATTTATAGCGGTGGTAGCACATACTACGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTTCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV3-53*04
-GAGGTGCAGCTGGTGGAGTCTGGAGGAGGCTTGGTCCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGGTTCACCGTCAGTAGCAACTACATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCAGTTATTTATAGCGGTGGTAGCACATACTACGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGACACAATTCCAAGAACACGCTGTATCTTCAAATGAACAGCCTGAGAGCTGAGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV3-64*02
-GAGGTGCAGCTGGTGGAGTCTGGGGAAGGCTTGGTCCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATGCTATGCACTGGGTCCGCCAGGCTCCAGGGAAGGGACTGGAATATGTTTCAGCTATTAGTAGTAATGGGGGTAGCACATATTATGCAGACTCTGTGAAGGGCAGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTTCAAATGGGCAGCCTGAGAGCTGAGGACATGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-66*01
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCGTCAGTAGCAACTACATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCAGTTATTTATAGCGGTGGTAGCACATACTACGCAGACTCCGTGAAGGGCAGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTTCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-66*03
-GAGGTGCAGCTGGTGGAGTCTGGAGGAGGCTTGATCCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGGTTCACCGTCAGTAGCAACTACATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCAGTTATTTATAGCTGTGGTAGCACATACTACGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTTCAAATGAACAGCCTGAGAGCTGAGGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-7*01
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTAGTAGCTATTGGATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTGGCCAACATAAAGCAAGATGGAAGTGAGAAATACTATGTGGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-7*02
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTAGTAGCTATTGGATGAGCTGGGTCCGCCAGGCTCCAGGGAAAGGGCTGGAGTGGGTGGCCAACATAAAGCAAGATGGAAGTGAGAAATACTATGTGGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGA
->IGHV3-7*03
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTAGTAGCTATTGGATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTGGCCAACATAAAGCAAGATGGAAGTGAGAAATACTATGTGGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV3-72*01
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCCAGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTGACCACTACATGGACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTGGCCGTACTAGAAACAAAGCTAACAGTTACACCACAGAATACGCCGCGTCTGTGAAAGGCAGATTCACCATCTCAAGAGATGATTCAAAGAACTCACTGTATCTGCAAATGAACAGCCTGAAAACCGAGGACACGGCCGTGTATTACTGTGCTAGAGA
->IGHV3-73*01
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCCAGCCTGGGGGGTCCCTGAAACTCTCCTGTGCAGCCTCTGGGTTCACCTTCAGTGGCTCTGCTATGCACTGGGTCCGCCAGGCTTCCGGGAAAGGGCTGGAGTGGGTTGGCCGTATTAGAAGCAAAGCTAACAGTTACGCGACAGCATATGCTGCGTCGGTGAAAGGCAGGTTCACCATCTCCAGAGATGATTCAAAGAACACGGCGTATCTGCAAATGAACAGCCTGAAAACCGAGGACACGGCCGTGTATTACTGTACTAGACA
->IGHV3-73*02
-GAGGTGCAGCTGGTGGAGTCCGGGGGAGGCTTGGTCCAGCCTGGGGGGTCCCTGAAACTCTCCTGTGCAGCCTCTGGGTTCACCTTCAGTGGCTCTGCTATGCACTGGGTCCGCCAGGCTTCCGGGAAAGGGCTGGAGTGGGTTGGCCGTATTAGAAGCAAAGCTAACAGTTACGCGACAGCATATGCTGCGTCGGTGAAAGGCAGGTTCACCATCTCCAGAGATGATTCAAAGAACACGGCGTATCTGCAAATGAACAGCCTGAAAACCGAGGACACGGCCGTGTATTACTGTACTAGACA
->IGHV3-74*01
-GAGGTGCAGCTGGTGGAGTCCGGGGGAGGCTTAGTTCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTACTGGATGCACTGGGTCCGCCAAGCTCCAGGGAAGGGGCTGGTGTGGGTCTCACGTATTAATAGTGATGGGAGTAGCACAAGCTACGCGGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACACGCTGTATCTGCAAATGAACAGTCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCAAGAGA
->IGHV3-74*02
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTAGTTCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTACTGGATGCACTGGGTCCGCCAAGCTCCAGGGAAGGGGCTGGTGTGGGTCTCACGTATTAATAGTGATGGGAGTAGCACAAGCTACGCGGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACACGCTGTATCTGCAAATGAACAGTCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCAAGA
->IGHV3-74*03
-GAGGTGCAGCTGGTGGAGTCCGGGGGAGGCTTAGTTCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTACTGGATGCACTGGGTCCGCCAAGCTCCAGGGAAGGGGCTGGTGTGGGTCTCACGTATTAATAGTGATGGGAGTAGCACAACGTACGCGGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACACGCTGTATCTGCAAATGAACAGTCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCAAGAGA
->IGHV3-9*01
-GAAGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGCAGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTGATGATTATGCCATGCACTGGGTCCGGCAAGCTCCAGGGAAGGGCCTGGAGTGGGTCTCAGGTATTAGTTGGAATAGTGGTAGCATAGGCTATGCGGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCCCTGTATCTGCAAATGAACAGTCTGAGAGCTGAGGACACGGCCTTGTATTACTGTGCAAAAGATA
->IGHV3-9*03
-GAAGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGCAGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTGATGATTATGCCATGCACTGGGTCCGGCAAGCTCCAGGGAAGGGCCTGGAGTGGGTCTCAGGTATTAGTTGGAATAGTGGTAGCATAGGCTATGCGGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCCCTGTATCTGCAAATGAACAGTCTGAGAGCTGAGGACATGGCCTTGTATTACTGTGCAAAAGATA
->IGHV4-28*01
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGACACCCTGTCCCTCACCTGCGCTGTCTCTGGTTACTCCATCAGCAGTAGTAACTGGTGGGGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTACATCTATTATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATGTCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGTGGACACGGCCGTGTATTACTGTGCGAGAAA
->IGHV4-28*03
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGACACCCTGTCCCTCACCTGCGCTGTCTCTGGTTACTCCATCAGCAGTAGTAACTGGTGGGGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTACATCTATTATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATGTCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGTGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV4-31*01
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCACAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTGGTGGTTACTACTGGAGCTGGATCCGCCAGCACCCAGGGAAGGGCCTGGAGTGGATTGGGTACATCTATTACAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCTAGTTACCATATCAGTAGACACGTCTAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACTGCCGCGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV4-31*03
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCACAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTGGTGGTTACTACTGGAGCTGGATCCGCCAGCACCCAGGGAAGGGCCTGGAGTGGATTGGGTACATCTATTACAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTTACCATATCAGTAGACACGTCTAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACTGCCGCGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV4-34*01
-CAGGTGCAGCTACAGCAGTGGGGCGCAGGACTGTTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCGCTGTCTATGGTGGGTCCTTCAGTGGTTACTACTGGAGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGGAAATCAATCATAGTGGAAGCACCAACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCTGTGTATTACTGTGCGAGAGG
->IGHV4-34*02
-CAGGTGCAGCTACAACAGTGGGGCGCAGGACTGTTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCGCTGTCTATGGTGGGTCCTTCAGTGGTTACTACTGGAGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGGAAATCAATCATAGTGGAAGCACCAACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCTGTGTATTACTGTGCGAGAGG
->IGHV4-34*04
-CAGGTGCAGCTACAGCAGTGGGGCGCAGGACTGTTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCGCTGTCTATGGTGGGTCCTTCAGTGGTTACTACTGGAGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGGAAATCAATCATAGTGGAAGCACCAACAACAACCCGTCCCTCAAGAGTCGAGCCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCTGTGTATTACTGTGCGAGAGG
->IGHV4-34*08
-CAGGTGCAGCTACAGCAGTGGGGCGCAGGACTGTTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCGCTGTCTATGGTGGGACCTTCAGTGGTTACTACTGGAGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGGAAATCAATCATAGTGGAAGCACCAACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCTGTGTATTACTGTGCG
->IGHV4-34*10
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCGCTGTCTATGGTGGGTCCTTCAGTGGTTACTACTGGAGCTGGATCCGCCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGGAAATCAATCATAGTGGAAGCACCAACTACAACCCGTCCCTCAAGAGTCGAATCACCATGTCAGTAGACACGTCCAAGAACCAGTTCTACCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCCGTGTATTACTGTGCGAGATA
->IGHV4-34*11
-CAGGTGCAGCTACAGCAGTGGGGCGCAGGACTGTTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCGCTGTCTATGGTGGGTCCGTCAGTGGTTACTACTGGAGCTGGATCCGGCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGTATATCTATTATAGTGGGAGCACCAACAACAACCCCTCCCTCAAGAGTCGAGCCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAACCTGAGCTCTGTGACCGCCGCGGACACGGCCGTGTATTGCTGTGCGAGAGA
->IGHV4-34*12
-CAGGTGCAGCTACAGCAGTGGGGCGCAGGACTGTTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCGCTGTCTATGGTGGGTCCTTCAGTGGTTACTACTGGAGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGGAAATCATTCATAGTGGAAGCACCAACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCTGTGTATTACTGTGCGAGA
->IGHV4-38-2*01
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCGCTGTCTCTGGTTACTCCATCAGCAGTGGTTACTACTGGGGCTGGATCCGGCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGAGTATCTATCATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACACGGCCGTGTATTACTGTGCGAGA
->IGHV4-38-2*02
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTTACTCCATCAGCAGTGGTTACTACTGGGGCTGGATCCGGCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGAGTATCTATCATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV4-39*01
-CAGCTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTAGTAGTTACTACTGGGGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGAGTATCTATTATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCCGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACACGGCTGTGTATTACTGTGCGAGACA
->IGHV4-39*02
-CAGCTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTAGTAGTTACTACTGGGGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGAGTATCTATTATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCCGTAGACACGTCCAAGAACCACTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV4-39*06
-CGGCTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTAGTAGTTACTACTGGGGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGAGTATCTATTATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCCCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV4-39*07
-CAGCTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTAGTAGTTACTACTGGGGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGAGTATCTATTATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV4-4*01
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTCCGGGGACCCTGTCCCTCACCTGCGCTGTCTCTGGTGGCTCCATCAGCAGTAGTAACTGGTGGAGTTGGGTCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGGAAATCTATCATAGTGGGAGCACCAACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGACAAGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCCGTGTATTGCTGTGCGAGAGA
->IGHV4-4*02
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGGGACCCTGTCCCTCACCTGCGCTGTCTCTGGTGGCTCCATCAGCAGTAGTAACTGGTGGAGTTGGGTCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGGAAATCTATCATAGTGGGAGCACCAACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGACAAGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV4-4*07
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGTAGTTACTACTGGAGCTGGATCCGGCAGCCCGCCGGGAAGGGACTGGAGTGGATTGGGCGTATCTATACCAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATGTCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV4-59*01
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGTAGTTACTACTGGAGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTATATCTATTACAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCTGCGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV4-59*02
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCGTCAGTAGTTACTACTGGAGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTATATCTATTACAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCTGCGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV4-59*03
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGTAGTTACTACTGGAGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTATATCTATTACAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAATTCTCCCTGAAGCTGAGCTCTGTGACCGCTGCGGACACGGCCGTGTATTACTGTGCG
->IGHV4-59*05
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGTAGTTACTACTGGAGCTGGATCCGGCAGCCGCCGGGGAAGGGACTGGAGTGGATTGGGCGTATCTATTATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCCGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACACGGCTGTGTATTACTGTGCG
->IGHV4-59*07
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGACACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGTAGTTACTACTGGAGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTATATCTATTACAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCTGCGGACACGGCCGTGTATTACTGTGCGAGA
->IGHV4-59*08
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGTAGTTACTACTGGAGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTATATCTATTACAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACACGGCCGTGTATTACTGTGCGAGACA
->IGHV4-59*10
-CAGGTGCAGCTACAGCAGTGGGGCGCAGGACTGTTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCGCTGTCTATGGTGGCTCCATCAGTAGTTACTACTGGAGCTGGATCCGGCAGCCCGCCGGGAAGGGGCTGGAGTGGATTGGGCGTATCTATACCAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATGTCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCCGTGTATTACTGTGCGAGATA
->IGHV4-61*01
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCGTCAGCAGTGGTAGTTACTACTGGAGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTATATCTATTACAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCTGCGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV4-61*02
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCACAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTGGTAGTTACTACTGGAGCTGGATCCGGCAGCCCGCCGGGAAGGGACTGGAGTGGATTGGGCGTATCTATACCAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV4-61*03
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCGTCAGCAGTGGTAGTTACTACTGGAGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTATATCTATTACAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCACTTCTCCCTGAAGCTGAGCTCTGTGACCGCTGCGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV4-61*05
-CAGCTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTAGTAGTTACTACTGGGGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTATATCTATTACAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATATCAGTAGACAAGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCCGTGTATTACTGTGCGAGA
->IGHV4-61*08
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCGTCAGCAGTGGTGGTTACTACTGGAGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTATATCTATTACAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCTGCGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV5-10-1*03
-GAAGTGCAGCTGGTGCAGTCCGGAGCAGAGGTGAAAAAGCCCGGGGAGTCTCTGAGGATCTCCTGTAAGGGTTCTGGATACAGCTTTACCAGCTACTGGATCAGCTGGGTGCGCCAGATGCCCGGGAAAGGCCTGGAGTGGATGGGGAGGATTGATCCTAGTGACTCTTATACCAACTACAGCCCGTCCTTCCAAGGCCACGTCACCATCTCAGCTGACAAGTCCATCAGCACTGCCTACCTGCAGTGGAGCAGCCTGAAGGCCTCGGACACCGCCATGTATTACTGTGCGAGA
->IGHV5-51*01
-GAGGTGCAGCTGGTGCAGTCTGGAGCAGAGGTGAAAAAGCCCGGGGAGTCTCTGAAGATCTCCTGTAAGGGTTCTGGATACAGCTTTACCAGCTACTGGATCGGCTGGGTGCGCCAGATGCCCGGGAAAGGCCTGGAGTGGATGGGGATCATCTATCCTGGTGACTCTGATACCAGATACAGCCCGTCCTTCCAAGGCCAGGTCACCATCTCAGCCGACAAGTCCATCAGCACCGCCTACCTGCAGTGGAGCAGCCTGAAGGCCTCGGACACCGCCATGTATTACTGTGCGAGACA
->IGHV5-51*03
-GAGGTGCAGCTGGTGCAGTCTGGAGCAGAGGTGAAAAAGCCGGGGGAGTCTCTGAAGATCTCCTGTAAGGGTTCTGGATACAGCTTTACCAGCTACTGGATCGGCTGGGTGCGCCAGATGCCCGGGAAAGGCCTGGAGTGGATGGGGATCATCTATCCTGGTGACTCTGATACCAGATACAGCCCGTCCTTCCAAGGCCAGGTCACCATCTCAGCCGACAAGTCCATCAGCACCGCCTACCTGCAGTGGAGCAGCCTGAAGGCCTCGGACACCGCCATGTATTACTGTGCGAGA
->IGHV6-1*01
-CAGGTACAGCTGCAGCAGTCAGGTCCAGGACTGGTGAAGCCCTCGCAGACCCTCTCACTCACCTGTGCCATCTCCGGGGACAGTGTCTCTAGCAACAGTGCTGCTTGGAACTGGATCAGGCAGTCCCCATCGAGAGGCCTTGAGTGGCTGGGAAGGACATACTACAGGTCCAAGTGGTATAATGATTATGCAGTATCTGTGAAAAGTCGAATAACCATCAACCCAGACACATCCAAGAACCAGTTCTCCCTGCAGCTGAACTCTGTGACTCCCGAGGACACGGCTGTGTATTACTGTGCAAGAGA
->IGHV6-1*02
-CAGGTACAGCTGCAGCAGTCAGGTCCGGGACTGGTGAAGCCCTCGCAGACCCTCTCACTCACCTGTGCCATCTCCGGGGACAGTGTCTCTAGCAACAGTGCTGCTTGGAACTGGATCAGGCAGTCCCCATCGAGAGGCCTTGAGTGGCTGGGAAGGACATACTACAGGTCCAAGTGGTATAATGATTATGCAGTATCTGTGAAAAGTCGAATAACCATCAACCCAGACACATCCAAGAACCAGTTCTCCCTGCAGCTGAACTCTGTGACTCCCGAGGACACGGCTGTGTATTACTGTGCAAGAGA
->IGHV7-4-1*01
-CAGGTGCAGCTGGTGCAATCTGGGTCTGAGTTGAAGAAGCCTGGGGCCTCAGTGAAGGTTTCCTGCAAGGCTTCTGGATACACCTTCACTAGCTATGCTATGAATTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAACACCAACACTGGGAACCCAACGTATGCCCAGGGCTTCACAGGACGGTTTGTCTTCTCCTTGGACACCTCTGTCAGCACGGCATATCTGCAGATCTGCAGCCTAAAGGCTGAGGACACTGCCGTGTATTACTGTGCGAGA


=====================================
tests/data/H1/candidates.tab.gz deleted
=====================================
Binary files a/tests/data/H1/candidates.tab.gz and /dev/null differ


=====================================
tests/data/H1/expected.tab deleted
=====================================
@@ -1,57 +0,0 @@
-name	source	chain	cluster	cluster_size	Js	CDR3s	exact	Js_exact	CDR3s_exact	CDR3_exact_ratio	database_diff	has_stop	looks_like_V	CDR3_start	whitelist_diff	closest_whitelist	consensus
-IGHV1-18*01	IGHV1-18*01	VH	all	38082	12	9605	18063	10	4649	3.9	0	0	1	288	0	IGHV1-18*01	CAGGTTCAGCTGGTGCAGTCTGGAGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGTTACACCTTTACCAGCTATGGTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAGCGCTTACAATGGTAACACAAACTATGCACAGAAGCTCCAGGGCAGAGTCACCATGACCACAGACACATCCACGAGCACAGCCTACATGGAGCTGAGGAGCCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV1-2*02	IGHV1-2*02	VH	all	18707	13	4522	8398	10	2133	3.9	0	0	1	288	0	IGHV1-2*02	CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTCACCGGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAACCCTAACAGTGGTGGCACAAACTATGCACAGAAGTTTCAGGGCAGGGTCACCATGACCAGGGACACGTCCATCAGCACAGCCTACATGGAGCTGAGCAGGCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV1-2*04	IGHV1-2*04	VH	all	3081	9	731	1015	5	277	3.7	0	0	1	288	0	IGHV1-2*04	CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTCACCGGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAACCCTAACAGTGGTGGCACAAACTATGCACAGAAGTTTCAGGGCTGGGTCACCATGACCAGGGACACGTCCATCAGCACAGCCTACATGGAGCTGAGCAGGCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV1-24*01	IGHV1-24*01	VH	all	1303	7	341	735	7	222	3.3	0	0	1	288	0	IGHV1-24*01	CAGGTCCAGCTGGTACAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGTTTCCGGATACACCCTCACTGAATTATCCATGCACTGGGTGCGACAGGCTCCTGGAAAAGGGCTTGAGTGGATGGGAGGTTTTGATCCTGAAGATGGTGAAACAATCTACGCACAGAAGTTCCAGGGCAGAGTCACCATGACCGAGGACACATCTACAGACACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCAACAGA
-IGHV1-3*01	IGHV1-3*01	VH	all	14794	9	3281	6096	7	1431	4.3	0	0	1	288	0	IGHV1-3*01	CAGGTCCAGCTTGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTTTCCTGCAAGGCTTCTGGATACACCTTCACTAGCTATGCTATGCATTGGGTGCGCCAGGCCCCCGGACAAAGGCTTGAGTGGATGGGATGGATCAACGCTGGCAATGGTAACACAAAATATTCACAGAAGTTCCAGGGCAGAGTCACCATTACCAGGGACACATCCGCGAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAAGACACGGCTGTGTATTACTGTGCGAGAGA
-IGHV1-46*01	IGHV1-46*01	VH	all	30999	11	7474	14728	9	3665	4.0	0	0	1	288	0	IGHV1-46*01	CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTTTCCTGCAAGGCATCTGGATACACCTTCACCAGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGAATAATCAACCCTAGTGGTGGTAGCACAAGCTACGCACAGAAGTTCCAGGGCAGAGTCACCATGACCAGGGACACGTCCACGAGCACAGTCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV1-58*01	IGHV1-58*01	VH	all	2038	6	512	1023	6	281	3.6	0	0	1	288	0	IGHV1-58*01	CAAATGCAGCTGGTGCAGTCTGGGCCTGAGGTGAAGAAGCCTGGGACCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATTCACCTTTACTAGCTCTGCTGTGCAGTGGGTGCGACAGGCTCGTGGACAACGCCTTGAGTGGATAGGATGGATCGTCGTTGGCAGTGGTAACACAAACTACGCACAGAAGTTCCAGGAAAGAGTCACCATTACCAGGGACATGTCCACAAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCCGAGGACACGGCCGTGTATTACTGTGCGGCAGA
-IGHV1-69*01	IGHV1-69*01	VH	all	41033	12	10647	21253	10	6013	3.5	0	0	1	288	0	IGHV1-69*01	CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGTCCTCGGTGAAGGTCTCCTGCAAGGCTTCTGGAGGCACCTTCAGCAGCTATGCTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGAGGGATCATCCCTATCTTTGGTACAGCAAACTACGCACAGAAGTTCCAGGGCAGAGTCACGATTACCGCGGACGAATCCACGAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV1-69*06	IGHV1-69*06	VH	all	13719	11	3718	7062	9	2100	3.4	0	0	1	288	0	IGHV1-69*06	CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGTCCTCGGTGAAGGTCTCCTGCAAGGCTTCTGGAGGCACCTTCAGCAGCTATGCTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGAGGGATCATCCCTATCTTTGGTACAGCAAACTACGCACAGAAGTTCCAGGGCAGAGTCACGATTACCGCGGACAAATCCACGAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV1-69-2*01	IGHV1-69-2*01	VH	all	416	5	115	235	5	70	3.4	0	0	1	288	0	IGHV1-69-2*01	GAGGTCCAGCTGGTACAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCTACAGTGAAAATCTCCTGCAAGGTTTCTGGATACACCTTCACCGACTACTACATGCACTGGGTGCAACAGGCCCCTGGAAAAGGGCTTGAGTGGATGGGACTTGTTGATCCTGAAGATGGTGAAACAATATACGCAGAGAAGTTCCAGGGCAGAGTCACCATAACCGCGGACACGTCTACAGACACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCAACAGA
-IGHV1-8*01	IGHV1-8*01	VH	all	12011	11	3186	6200	9	1760	3.5	0	0	1	288	0	IGHV1-8*01	CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTCACCAGTTATGATATCAACTGGGTGCGACAGGCCACTGGACAAGGGCTTGAGTGGATGGGATGGATGAACCCTAACAGTGGTAACACAGGCTATGCACAGAAGTTCCAGGGCAGAGTCACCATGACCAGGAACACCTCCATAAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGG
-IGHV2-26*01	IGHV2-26*01	VH	all	4191	10	968	2112	6	511	4.1	0	0	1	291	0	IGHV2-26*01	CAGGTCACCTTGAAGGAGTCTGGTCCTGTGCTGGTGAAACCCACAGAGACCCTCACGCTGACCTGCACCGTCTCTGGGTTCTCACTCAGCAATGCTAGAATGGGTGTGAGCTGGATCCGTCAGCCCCCAGGGAAGGCCCTGGAGTGGCTTGCACACATTTTTTCGAATGACGAAAAATCCTACAGCACATCTCTGAAGAGCAGGCTCACCATCTCCAAGGACACCTCCAAAAGCCAGGTGGTCCTTACCATGACCAACATGGACCCTGTGGACACAGCCACATATTACTGTGCACGGATAC
-IGHV2-5*01	IGHV2-5*01	VH	all	4418	9	1039	1888	6	478	4.0	0	0	1	291	0	IGHV2-5*01	CAGATCACCTTGAAGGAGTCTGGTCCTACGCTGGTGAAACCCACACAGACCCTCACGCTGACCTGCACCTTCTCTGGGTTCTCACTCAGCACTAGTGGAGTGGGTGTGGGCTGGATCCGTCAGCCCCCAGGAAAGGCCCTGGAGTGGCTTGCACTCATTTATTGGAATGATGATAAGCGCTACAGCCCATCTCTGAAGAGCAGGCTCACCATCACCAAGGACACCTCCAAAAACCAGGTGGTCCTTACAATGACCAACATGGACCCTGTGGACACAGCCACATATTACTGTGCACACAGAC
-IGHV2-5*02	IGHV2-5*02	VH	all	2028	10	493	744	6	205	3.6	0	0	1	291	0	IGHV2-5*02	CAGATCACCTTGAAGGAGTCTGGTCCTACGCTGGTGAAACCCACACAGACCCTCACGCTGACCTGCACCTTCTCTGGGTTCTCACTCAGCACTAGTGGAGTGGGTGTGGGCTGGATCCGTCAGCCCCCAGGAAAGGCCCTGGAGTGGCTTGCACTCATTTATTGGGATGATGATAAGCGCTACAGCCCATCTCTGAAGAGCAGGCTCACCATCACCAAGGACACCTCCAAAAACCAGGTGGTCCTTACAATGACCAACATGGACCCTGTGGACACAGCCACATATTACTGTGCACACAGAC
-IGHV2-70*01	IGHV2-70*01	VH	all	4746	9	1152	2400	7	566	4.2	0	0	1	291	0	IGHV2-70*01	CAGGTCACCTTGAGGGAGTCTGGTCCTGCGCTGGTGAAACCCACACAGACCCTCACACTGACCTGCACCTTCTCTGGGTTCTCACTCAGCACTAGTGGAATGTGTGTGAGCTGGATCCGTCAGCCCCCAGGGAAGGCCCTGGAGTGGCTTGCACTCATTGATTGGGATGATGATAAATACTACAGCACATCTCTGAAGACCAGGCTCACCATCTCCAAGGACACCTCCAAAAACCAGGTGGTCCTTACAATGACCAACATGGACCCTGTGGACACAGCCACGTATTACTGTGCACGGATAC
-IGHV2-70D*04	IGHV2-70D*04	VH	all	2957	8	652	1144	7	246	4.7	0	0	1	291	0	IGHV2-70D*04	CAGGTCACCTTGAAGGAGTCTGGTCCTGCGCTGGTGAAACCCACACAGACCCTCACACTGACCTGCACCTTCTCTGGGTTCTCACTCAGCACTAGTGGAATGCGTGTGAGCTGGATCCGTCAGCCCCCAGGGAAGGCCCTGGAGTGGCTTGCACGCATTGATTGGGATGATGATAAATTCTACAGCACATCTCTGAAGACCAGGCTCACCATCTCCAAGGACACCTCCAAAAACCAGGTGGTCCTTACAATGACCAACATGGACCCTGTGGACACAGCCACGTATTACTGTGCACGGATAC
-IGHV3-11*01	IGHV3-11*01	VH	all	1787	8	506	822	7	264	3.1	0	0	1	288	0	IGHV3-11*01	CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCAAGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTGACTACTACATGAGCTGGATCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTAGTAGTAGTGGTAGTACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGGGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV3-11*06	IGHV3-11*06	VH	all	221	7	80	92	5	38	2.4	0	0	1	288	0	IGHV3-11*06	CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCAAGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTGACTACTACATGAGCTGGATCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTAGTAGTAGTAGTAGTTACACAAACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGA
-IGHV3-13*01	IGHV3-13*01	VH	all	821	6	199	380	5	97	3.9	0	0	1	285	0	IGHV3-13*01	GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTACGACATGCACTGGGTCCGCCAAGCTACAGGAAAAGGTCTGGAGTGGGTCTCAGCTATTGGTACTGCTGGTGACACATACTATCCAGGCTCCGTGAAGGGCCGATTCACCATCTCCAGAGAAAATGCCAAGAACTCCTTGTATCTTCAAATGAACAGCCTGAGAGCCGGGGACACGGCTGTGTATTACTGTGCAAGAGA
-IGHV3-13*01_S2321	IGHV3-13*01_S2321	VH	all	805	6	192	324	5	86	3.8	0	0	1	285	3	IGHV3-13*01	GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTACGACATGCACTGGGTCCGCCAAGCTACAGGAAAAGGTCTGGAGTGGGTCTCAGCTATTGGTACTGCTGGTGACACATACTATCCAGGCTCCGTGAAGGGCCGATTCACCATCTCCAGAGAAAATGCCAAGAACTCCTTGTATCTTCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTGTATTACTGTGCAAGAG
-IGHV3-15*01	IGHV3-15*01	VH	all	13813	11	2773	5650	9	1217	4.6	0	0	1	294	0	IGHV3-15*01	GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTAAAGCCTGGGGGGTCCCTTAGACTCTCCTGTGCAGCCTCTGGATTCACTTTCAGTAACGCCTGGATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTGGCCGTATTAAAAGCAAAACTGATGGTGGGACAACAGACTACGCTGCACCCGTGAAAGGCAGATTCACCATCTCAAGAGATGATTCAAAAAACACGCTGTATCTGCAAATGAACAGCCTGAAAACCGAGGACACAGCCGTGTATTACTGTACCACAGA
-IGHV3-15*07	IGHV3-15*07	VH	all	16932	11	3269	6926	8	1394	5.0	0	0	1	294	0	IGHV3-15*07	GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTAAAGCCTGGGGGGTCCCTTAGACTCTCCTGTGCAGCCTCTGGTTTCACTTTCAGTAACGCCTGGATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCGGCCGTATTAAAAGCAAAACTGATGGTGGGACAACAGACTACGCTGCACCCGTGAAAGGCAGATTCACCATCTCAAGAGATGATTCAAAAAACACGCTGTATCTGCAAATGAACAGCCTGAAAACCGAGGACACAGCCGTGTATTACTGTACCACAGA
-IGHV3-20*01	IGHV3-20*01	VH	all	529	6	170	200	6	61	3.3	0	0	1	288	0	IGHV3-20*01	GAGGTGCAGCTGGTGGAGTCTGGGGGAGGTGTGGTACGGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTGATGATTATGGCATGAGCTGGGTCCGCCAAGCTCCAGGGAAGGGGCTGGAGTGGGTCTCTGGTATTAATTGGAATGGTGGTAGCACAGGTTATGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCCCTGTATCTGCAAATGAACAGTCTGAGAGCCGAGGACACGGCCTTGTATCACTGTGCGAGAGA
-IGHV3-20*01_S7413	IGHV3-20*01_S7413	VH	all	369	6	114	152	5	50	3.0	0	0	1	288	2	IGHV3-20*01	GAGGTGCAGCTGGTGGAGTCTGGGGGAGGTGTGGTACGGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTGATGATTATGGCATGAGCTGGGTCCGCCAAGCTCCAGGGAAGGGGCTGGAGTGGGTCTCTGGTATTAATTGGAATGGTGGTAGCACAGGTTATGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCCCTGTATCTGCAAATGAACAGTCTGAGAGCCGAGGACACGGCCTTGTATTACTGTGCGAGAG
-IGHV3-21*01	IGHV3-21*01	VH	all	14645	9	3602	6690	8	1831	3.6	0	0	1	288	0	IGHV3-21*01	GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCCTGGTCAAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCATCCATTAGTAGTAGTAGTAGTTACATATACTACGCAGACTCAGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGA
-IGHV3-23*01	IGHV3-23*01	VH	all	4960	10	1433	1736	6	612	2.8	0	0	1	288	0	IGHV3-23*01	GAGGTGCAGCTGTTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTAGCAGCTATGCCATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCAGCTATTAGTGGTAGTGGTGGTAGCACATACTACGCAGACTCCGTGAAGGGCCGGTTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTATATTACTGTGCGAAAGA
-IGHV3-30*03_S9223	IGHV3-30*03_S9223	VH	all	379	3	53	175	3	16	10.9	0	0	1	288	16	IGHV3-30*03	CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGTAGTCTCTGGATTCACCTTCAGTAGTTATGGCATACACTGGGTCCGTCAGGCTCCAGTCAAGGGGCTGGAGTGGGTGGCAGTTATATCACATGATGGAAGTACTAAGTACTATGCAGACTCCGTGAAGGGCCGATTCACCATCTCCCGAGACAATTCCAAGAACACATTGTATCTGCAAATGAACAGCCTGACATTTGAGGACACGGCTGTGTATTACTGTGCGAGGGA
-IGHV3-30-5*01	IGHV3-30-5*01	VH	all	25649	11	5549	12485	9	2841	4.4	0	0	1	288	0	IGHV3-30-5*01	CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATGGCATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTGGCAGTTATATCATATGATGGAAGTAATAAATACTATGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCTGAGGACACGGCTGTGTATTACTGTGCGAAAGA
-IGHV3-33*01	IGHV3-33*01	VH	all	19630	9	4104	8200	9	1900	4.3	0	0	1	288	0	IGHV3-33*01	CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGCAGCGTCTGGATTCACCTTCAGTAGCTATGGCATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTGGCAGTTATATGGTATGATGGAAGTAATAAATACTATGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGA
-IGHV3-43*01	IGHV3-43*01	VH	all	1738	8	467	862	7	249	3.5	0	0	1	288	0	IGHV3-43*01	GAAGTGCAGCTGGTGGAGTCTGGGGGAGTCGTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTGATGATTATACCATGCACTGGGTCCGTCAAGCTCCGGGGAAGGGTCTGGAGTGGGTCTCTCTTATTAGTTGGGATGGTGGTAGCACATACTATGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACAGCAAAAACTCCCTGTATCTGCAAATGAACAGTCTGAGAACTGAGGACACCGCCTTGTATTACTGTGCAAAAGATA
-IGHV3-43D*01	IGHV3-43D*01	VH	all	1228	6	318	553	6	143	3.9	0	0	1	288	0	IGHV3-43D*01	GAAGTGCAGCTGGTGGAGTCTGGGGGAGTCGTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTGATGATTATGCCATGCACTGGGTCCGTCAAGCTCCGGGGAAGGGTCTGGAGTGGGTCTCTCTTATTAGTTGGGATGGTGGTAGCACCTACTATGCAGACTCTGTGAAGGGTCGATTCACCATCTCCAGAGACAACAGCAAAAACTCCCTGTATCTGCAAATGAACAGTCTGAGAGCTGAGGACACCGCCTTGTATTACTGTGCAAAAGATA
-IGHV3-48*02	IGHV3-48*02	VH	all	6851	11	1741	2591	8	763	3.4	0	0	1	288	0	IGHV3-48*02	GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTAGTAGTAGTAGTAGTACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAATGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGACGAGGACACGGCTGTGTATTACTGTGCGAGAGA
-IGHV3-48*04	IGHV3-48*04	VH	all	8560	11	2086	3710	8	1004	3.7	0	0	1	288	0	IGHV3-48*04	GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTAGTAGTAGTAGTAGTACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGA
-IGHV3-49*03	IGHV3-49*03	VH	all	3686	8	983	1798	7	482	3.7	0	0	1	294	0	IGHV3-49*03	GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCAGGGCGGTCCCTGAGACTCTCCTGTACAGCTTCTGGATTCACCTTTGGTGATTATGCTATGAGCTGGTTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTAGGTTTCATTAGAAGCAAAGCTTATGGTGGGACAACAGAATACGCCGCGTCTGTGAAAGGCAGATTCACCATCTCAAGAGATGATTCCAAAAGCATCGCCTATCTGCAAATGAACAGCCTGAAAACCGAGGACACAGCCGTGTATTACTGTACTAGAGA
-IGHV3-49*05	IGHV3-49*05	VH	all	2237	9	584	1149	7	327	3.5	0	0	1	294	0	IGHV3-49*05	GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTAAAGCCAGGGCGGTCCCTGAGACTCTCCTGTACAGCTTCTGGATTCACCTTTGGTGATTATGCTATGAGCTGGTTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTAGGTTTCATTAGAAGCAAAGCTTATGGTGGGACAACAGAATACGCCGCGTCTGTGAAAGGCAGATTCACCATCTCAAGAGATGATTCCAAAAGCATCGCCTATCTGCAAATGAACAGCCTGAAAACCGAGGACACAGCCGTGTATTACTGTACTAGAGA
-IGHV3-53*01	IGHV3-53*01	VH	all	16832	10	3372	5862	9	1384	4.2	0	0	1	285	0	IGHV3-53*01	GAGGTGCAGCTGGTGGAGTCTGGAGGAGGCTTGATCCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGGTTCACCGTCAGTAGCAACTACATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCAGTTATTTATAGCGGTGGTAGCACATACTACGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTTCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV3-64*02	IGHV3-64*02	VH	all	133	5	33	41	4	12	3.4	0	0	1	288	0	IGHV3-64*02	GAGGTGCAGCTGGTGGAGTCTGGGGAAGGCTTGGTCCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATGCTATGCACTGGGTCCGCCAGGCTCCAGGGAAGGGACTGGAATATGTTTCAGCTATTAGTAGTAATGGGGGTAGCACATATTATGCAGACTCTGTGAAGGGCAGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTTCAAATGGGCAGCCTGAGAGCTGAGGACATGGCTGTGTATTACTGTGCGAGAGA
-IGHV3-7*01	IGHV3-7*01	VH	all	9716	11	2189	2560	7	729	3.5	0	0	1	288	0	IGHV3-7*01	GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTAGTAGCTATTGGATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTGGCCAACATAAAGCAAGATGGAAGTGAGAAATACTATGTGGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGA
-IGHV3-7*03	IGHV3-7*03	VH	all	6298	9	1498	2114	6	623	3.4	0	0	1	288	0	IGHV3-7*03	GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTAGTAGCTATTGGATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTGGCCAACATAAAGCAAGATGGAAGTGAGAAATACTATGTGGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV3-73*02	IGHV3-73*02	VH	all	6251	10	1185	1736	6	402	4.3	0	0	1	294	0	IGHV3-73*02	GAGGTGCAGCTGGTGGAGTCCGGGGGAGGCTTGGTCCAGCCTGGGGGGTCCCTGAAACTCTCCTGTGCAGCCTCTGGGTTCACCTTCAGTGGCTCTGCTATGCACTGGGTCCGCCAGGCTTCCGGGAAAGGGCTGGAGTGGGTTGGCCGTATTAGAAGCAAAGCTAACAGTTACGCGACAGCATATGCTGCGTCGGTGAAAGGCAGGTTCACCATCTCCAGAGATGATTCAAAGAACACGGCGTATCTGCAAATGAACAGCCTGAAAACCGAGGACACGGCCGTGTATTACTGTACTAGACA
-IGHV3-74*01	IGHV3-74*01	VH	all	7208	9	1584	2390	8	586	4.1	0	0	1	288	0	IGHV3-74*01	GAGGTGCAGCTGGTGGAGTCCGGGGGAGGCTTAGTTCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTACTGGATGCACTGGGTCCGCCAAGCTCCAGGGAAGGGGCTGGTGTGGGTCTCACGTATTAATAGTGATGGGAGTAGCACAAGCTACGCGGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACACGCTGTATCTGCAAATGAACAGTCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCAAGAGA
-IGHV3-9*01	IGHV3-9*01	VH	all	805	6	261	373	6	126	3.0	0	0	1	288	0	IGHV3-9*01	GAAGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGCAGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTGATGATTATGCCATGCACTGGGTCCGGCAAGCTCCAGGGAAGGGCCTGGAGTGGGTCTCAGGTATTAGTTGGAATAGTGGTAGCATAGGCTATGCGGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCCCTGTATCTGCAAATGAACAGTCTGAGAGCTGAGGACACGGCCTTGTATTACTGTGCAAAAGATA
-IGHV4-28*01	IGHV4-28*01	VH	all	55	5	23	24	4	11	2.2	0	0	1	288	0	IGHV4-28*01	CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGACACCCTGTCCCTCACCTGCGCTGTCTCTGGTTACTCCATCAGCAGTAGTAACTGGTGGGGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTACATCTATTATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATGTCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGTGGACACGGCCGTGTATTACTGTGCGAGAAA
-IGHV4-31*03	IGHV4-31*03	VH	all	5511	11	1727	2519	8	905	2.8	0	0	1	291	0	IGHV4-31*03	CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCACAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTGGTGGTTACTACTGGAGCTGGATCCGCCAGCACCCAGGGAAGGGCCTGGAGTGGATTGGGTACATCTATTACAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTTACCATATCAGTAGACACGTCTAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACTGCCGCGGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV4-34*01	IGHV4-34*01	VH	all	38338	12	11990	18928	10	6988	2.7	0	0	1	285	0	IGHV4-34*01	CAGGTGCAGCTACAGCAGTGGGGCGCAGGACTGTTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCGCTGTCTATGGTGGGTCCTTCAGTGGTTACTACTGGAGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGGAAATCAATCATAGTGGAAGCACCAACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCTGTGTATTACTGTGCGAGAGG
-IGHV4-38-2*02	IGHV4-38-2*02	VH	all	5678	9	1579	2099	7	720	2.9	0	0	1	288	0	IGHV4-38-2*02	CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTTACTCCATCAGCAGTGGTTACTACTGGGGCTGGATCCGGCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGAGTATCTATCATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV4-39*01	IGHV4-39*01	VH	all	12250	10	3346	5180	7	1708	3.0	0	0	1	291	0	IGHV4-39*01	CAGCTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTAGTAGTTACTACTGGGGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGAGTATCTATTATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCCGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACACGGCTGTGTATTACTGTGCGAGACA
-IGHV4-39*07	IGHV4-39*07	VH	all	19396	11	5780	7910	9	2829	2.8	0	0	1	291	0	IGHV4-39*07	CAGCTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTAGTAGTTACTACTGGGGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGAGTATCTATTATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV4-4*02	IGHV4-4*02	VH	all	4465	8	1382	2030	7	734	2.8	0	0	1	288	0	IGHV4-4*02	CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGGGACCCTGTCCCTCACCTGCGCTGTCTCTGGTGGCTCCATCAGCAGTAGTAACTGGTGGAGTTGGGTCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGGAAATCTATCATAGTGGGAGCACCAACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGACAAGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV4-4*07	IGHV4-4*07	VH	all	5043	10	1391	1886	7	678	2.8	0	0	1	285	0	IGHV4-4*07	CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGTAGTTACTACTGGAGCTGGATCCGGCAGCCCGCCGGGAAGGGACTGGAGTGGATTGGGCGTATCTATACCAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATGTCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV4-59*01	IGHV4-59*01	VH	all	16798	10	5022	7023	9	2549	2.8	0	0	1	285	0	IGHV4-59*01	CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGTAGTTACTACTGGAGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTATATCTATTACAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCTGCGGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV4-61*01	IGHV4-61*01	VH	all	4946	8	1515	1862	8	709	2.6	0	0	1	291	0	IGHV4-61*01	CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCGTCAGCAGTGGTAGTTACTACTGGAGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTATATCTATTACAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCTGCGGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV5-10-1*03	IGHV5-10-1*03	VH	all	4317	7	1046	1720	7	476	3.6	0	0	1	288	0	IGHV5-10-1*03	GAAGTGCAGCTGGTGCAGTCCGGAGCAGAGGTGAAAAAGCCCGGGGAGTCTCTGAGGATCTCCTGTAAGGGTTCTGGATACAGCTTTACCAGCTACTGGATCAGCTGGGTGCGCCAGATGCCCGGGAAAGGCCTGGAGTGGATGGGGAGGATTGATCCTAGTGACTCTTATACCAACTACAGCCCGTCCTTCCAAGGCCACGTCACCATCTCAGCTGACAAGTCCATCAGCACTGCCTACCTGCAGTGGAGCAGCCTGAAGGCCTCGGACACCGCCATGTATTACTGTGCGAGA
-IGHV5-51*01	IGHV5-51*01	VH	all	30679	10	7649	11953	9	3386	3.5	0	0	1	288	0	IGHV5-51*01	GAGGTGCAGCTGGTGCAGTCTGGAGCAGAGGTGAAAAAGCCCGGGGAGTCTCTGAAGATCTCCTGTAAGGGTTCTGGATACAGCTTTACCAGCTACTGGATCGGCTGGGTGCGCCAGATGCCCGGGAAAGGCCTGGAGTGGATGGGGATCATCTATCCTGGTGACTCTGATACCAGATACAGCCCGTCCTTCCAAGGCCAGGTCACCATCTCAGCCGACAAGTCCATCAGCACCGCCTACCTGCAGTGGAGCAGCCTGAAGGCCTCGGACACCGCCATGTATTACTGTGCGAGACA
-IGHV6-1*01	IGHV6-1*01	VH	all	18773	11	3603	6639	8	1316	5.0	0	0	1	297	0	IGHV6-1*01	CAGGTACAGCTGCAGCAGTCAGGTCCAGGACTGGTGAAGCCCTCGCAGACCCTCTCACTCACCTGTGCCATCTCCGGGGACAGTGTCTCTAGCAACAGTGCTGCTTGGAACTGGATCAGGCAGTCCCCATCGAGAGGCCTTGAGTGGCTGGGAAGGACATACTACAGGTCCAAGTGGTATAATGATTATGCAGTATCTGTGAAAAGTCGAATAACCATCAACCCAGACACATCCAAGAACCAGTTCTCCCTGCAGCTGAACTCTGTGACTCCCGAGGACACGGCTGTGTATTACTGTGCAAGAGA
-IGHV7-4-1*01	IGHV7-4-1*01	VH	all	140	5	42	44	3	14	3.1	0	0	1	288	0	IGHV7-4-1*01	CAGGTGCAGCTGGTGCAATCTGGGTCTGAGTTGAAGAAGCCTGGGGCCTCAGTGAAGGTTTCCTGCAAGGCTTCTGGATACACCTTCACTAGCTATGCTATGAATTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAACACCAACACTGGGAACCCAACGTATGCCCAGGGCTTCACAGGACGGTTTGTCTTCTCCTTGGACACCTCTGTCAGCACGGCATATCTGCAGATCTGCAGCCTAAAGGCTGAGGACACTGCCGTGTATTACTGTGCGAGA


=====================================
tests/data/H1/test.sh deleted
=====================================
@@ -1,5 +0,0 @@
-#!/bin/bash
-set -euo pipefail
-
-igdiscover germlinefilter --whitelist=V.fasta --max-differences=0 --unique-CDR3=5 --cluster-size=100 --unique-J=3 --cross-mapping-ratio=0.02 --allele-ratio=0.1 candidates.tab.gz > new_V_germline.tab
-diff -U 0 <(cut -f1 expected.tab) <(cut -f1 new_V_germline.tab) | grep '^[+-]' | sed 1,2d


=====================================
tests/run.sh deleted
=====================================
@@ -1,56 +0,0 @@
-#!/bin/bash
-# Run this within an activated igdiscover environment
-set -euo pipefail
-set -x
-unset DISPLAY
-
-pytest
-
-rm -rf testrun
-mkdir testrun
-[[ -L testdata ]] || ln -s igdiscover-testdata testdata
-
-
-# Test whether specifying primer sequences leads to a SyntaxError
-igdiscover init --db=testdata/database --reads=testdata/reads.1.fastq.gz testrun/primers
-pushd testrun/primers
-igdiscover config \
-	--set forward_primers "['CGTGA']" \
-	--set reverse_primers "['TTCAC']"
-igdiscover run -n stats/reads.json
-popd
-
-# Test using FLASH and parsing its log output
-igdiscover init --db=testdata/database --reads=testdata/reads.1.fastq.gz testrun/flash
-pushd testrun/flash
-igdiscover config --set merge_program flash
-igdiscover run stats/reads.json
-popd
-
-
-igdiscover init --db=testdata/database --reads=testdata/reads.1.fastq.gz testrun/paired
-pushd testrun/paired
-igdiscover config --set barcode_length_3prime 21
-
-igdiscover run nofinal
-if [[ -d final/ ]]; then
-	echo "ERROR: nofinal failed"
-	exit 1
-fi
-
-# run final iteration
-igdiscover run
-
-igdiscover run iteration-01/exact.tab
-popd
-
-# Use the merged file from above as input again
-igdiscover init --db=testdata/database --single-reads=testrun/paired/reads/2-merged.fastq.gz testrun/singlefastq
-cp -p testrun/paired/igdiscover.yaml testrun/singlefastq/
-( cd testrun/singlefastq && igdiscover run stats/reads.json )
-
-# Test FASTA input
-cutadapt --quiet -o testrun/reads.fasta testrun/paired/reads/2-merged.fastq.gz
-igdiscover init --db=testdata/database --single-reads=testrun/reads.fasta testrun/singlefasta
-cp -p testrun/paired/igdiscover.yaml testrun/singlefastq/
-( cd testrun/singlefasta && igdiscover run stats/reads.json )


=====================================
tests/test_commands.py
=====================================
@@ -1,9 +1,16 @@
 import os
 import sys
 import pytest
+import contextlib
+import shutil
+
 
 from igdiscover.__main__ import main
 from .utils import datapath, resultpath, files_equal
+from igdiscover.cli.init import run_init
+from igdiscover.cli.config import print_configuration, modify_configuration
+from igdiscover.cli.run import run_snakemake
+from igdiscover.cli.clonotypes import run_clonotypes
 
 
 @pytest.fixture
@@ -25,6 +32,59 @@ def run(tmpdir):
     return _run
 
 
+@pytest.fixture
+def pipeline_dir(tmp_path):
+    """An initialized pipeline directory"""
+    pipeline_path = tmp_path / "initializedpipeline"
+    init_testdata(pipeline_path)
+    return pipeline_path
+
+
+def init_testdata(directory):
+    run_init(
+        database="testdata/database",
+        reads1="testdata/reads.1.fastq.gz",
+        directory=str(directory),
+    )
+    with chdir(directory):
+        modify_configuration([("barcode_length_3prime", "21")])
+
+
+@contextlib.contextmanager
+def chdir(path):
+    previous_path = os.getcwd()
+    os.chdir(path)
+    yield
+    os.chdir(previous_path)
+
+
+@pytest.fixture(scope="session")
+def filtered_tab_session(tmp_path_factory):
+    """Generate iteration-01/filtered.tab.gz"""
+
+    pipeline_dir = tmp_path_factory.mktemp("pipedir") / "pipedir"
+    init_testdata(pipeline_dir)
+    with chdir(pipeline_dir):
+        run_snakemake(targets=["iteration-01/filtered.tab.gz"])
+    return pipeline_dir
+
+
+@pytest.fixture
+def has_filtered_tab(filtered_tab_session, tmp_path):
+    """
+    Give a fresh copy of a pipeline dir in which iteration-01/filtered.tab.gz
+    is guaranteed to exist
+    """
+    pipeline_dir = tmp_path / "has_filtered_tab"
+    shutil.copytree(
+        filtered_tab_session,
+        pipeline_dir,
+        symlinks=True,
+        ignore=shutil.ignore_patterns((".snakemake")),
+    )
+    return pipeline_dir
+
+
 def test_main():
     with pytest.raises(SystemExit) as exc:
         main(['--version'])
@@ -54,3 +114,115 @@ def test_clusterplot(tmpdir):
 def test_igblast(run):
     args = ['igblast', '--threads=1', datapath('database/'), datapath('igblast.fasta')]
     run(args, resultpath('assigned.tab'))
+
+
+def test_run_init(pipeline_dir):
+    assert pipeline_dir.is_dir()
+    assert (pipeline_dir / "igdiscover.yaml").exists()
+
+
+def test_print_configuration(pipeline_dir):
+    print_configuration(path=pipeline_dir / "igdiscover.yaml")
+
+
+def test_modify_configuration(pipeline_dir):
+    modify_configuration(
+        settings=[("d_coverage", "12"), ("j_discovery.allele_ratio", "0.37")],
+        path=str(pipeline_dir / "igdiscover.yaml"),
+    )
+    import ruamel.yaml
+    with open(pipeline_dir / "igdiscover.yaml") as f:
+        config = ruamel.yaml.safe_load(f)
+    assert config["d_coverage"] == 12
+    assert config["j_discovery"]["allele_ratio"] == 0.37
+
+
+def test_dryrun(pipeline_dir):
+    with chdir(pipeline_dir):
+        run_snakemake(dryrun=True)
+
+
+def test_primers(pipeline_dir):
+    # Test whether specifying primer sequences leads to a SyntaxError
+    with chdir(pipeline_dir):
+        modify_configuration(
+            settings=[
+                ("forward_primers", "['CGTGA']"),
+                ("reverse_primers", "['TTCAC']"),
+            ],
+        )
+        run_snakemake(dryrun=True)
+
+
+def test_flash(pipeline_dir):
+    # Test using FLASH and parsing its log output
+    with chdir(pipeline_dir):
+        modify_configuration(settings=[("merge_program", "flash")])
+        run_snakemake(targets=["stats/reads.json"])
+        # Ensure FLASH was actually run
+        assert (pipeline_dir / "reads/2-flash.log").exists()
+
+
+def test_snakemake_assigned_tab(has_filtered_tab):
+    assert (has_filtered_tab / "iteration-01/filtered.tab.gz").exists()
+    assert not (has_filtered_tab / "iteration-01/new_V_germline.tab").exists()
+
+
+def test_snakemake_exact_tab(has_filtered_tab):
+    with chdir(has_filtered_tab):
+        run_snakemake(targets=["iteration-01/exact.tab"])
+    assert (has_filtered_tab / "iteration-01/exact.tab").exists()
+
+
+def test_snakemake_final(has_filtered_tab):
+    with chdir(has_filtered_tab):
+        run_snakemake(targets=["nofinal"])
+    assert (has_filtered_tab / "iteration-01/new_V_germline.tab").exists()
+    assert not (has_filtered_tab / "final/assigned.tab.gz").exists()
+
+    with chdir(has_filtered_tab):
+        run_snakemake()
+    assert (has_filtered_tab / "final/assigned.tab.gz").exists()
+
+
+def test_clonotypes(has_filtered_tab):
+    run_clonotypes(has_filtered_tab / "iteration-01/assigned.tab.gz", limit=5)
+
+
+def test_fastq_input(has_filtered_tab, tmp_path):
+    # Use merged reads from already-run pipeline as input for a new run
+    single_reads = has_filtered_tab / "reads" / "2-merged.fastq.gz"
+    directory = tmp_path / "singleend-fastq"
+    run_init(
+        database="testdata/database",
+        single_reads=str(single_reads),
+        directory=str(directory),
+    )
+    with chdir(directory):
+        modify_configuration([("barcode_length_3prime", "21")])
+        run_snakemake(targets=["stats/reads.json"])
+
+
+def test_fasta_input(has_filtered_tab, tmp_path):
+    fasta_path = tmp_path / "justfasta.fasta"
+    convert_fastq_to_fasta(
+        has_filtered_tab / "reads" / "2-merged.fastq.gz",
+        fasta_path,
+    )
+    directory = tmp_path / "singleend-fasta"
+    run_init(
+        database="testdata/database",
+        single_reads=str(fasta_path),
+        directory=str(directory),
+    )
+    with chdir(directory):
+        modify_configuration([("barcode_length_3prime", "21")])
+        run_snakemake(targets=["stats/reads.json"])
+
+
+def convert_fastq_to_fasta(fastq, fasta):
+    import dnaio
+    with dnaio.open(fastq) as inf:
+        with dnaio.open(fasta, mode="w") as outf:
+            for record in inf:
+                outf.write(record)
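
Note on the chdir helper added to tests/test_commands.py in the hunk above: as written, it restores the previous working directory only when the body of the with-block finishes without raising. If a test fails inside the block, later tests in the same process would start in the wrong directory. The following is a minimal sketch of an exception-safe variant. It is not part of this commit, and it only assumes that the helper is meant to always restore the directory.

```python
import contextlib
import os


@contextlib.contextmanager
def chdir(path):
    """Temporarily change the working directory, restoring it even on error."""
    previous_path = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Runs whether the with-body returned normally or raised.
        os.chdir(previous_path)
```

The only difference from the committed helper is the try/finally around the yield, so existing call sites such as "with chdir(pipeline_dir):" would not need to change.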



View it on GitLab: https://salsa.debian.org/med-team/igdiscover/-/compare/1182895c2727a9d13216b8ee068504472bdbc1e4...f69cd6756a7c988593763c95e3dda9a8114e854d


