[med-svn] [Git][med-team/igdiscover][master] 4 commits: New upstream version 0.12.2
Steffen Möller
gitlab at salsa.debian.org
Sat Feb 29 19:19:02 GMT 2020
Steffen Möller pushed to branch master at Debian Med / igdiscover
Commits:
e643e65d by Steffen Moeller at 2020-02-29T19:41:33+01:00
New upstream version 0.12.2
- - - - -
093d29a9 by Steffen Moeller at 2020-02-29T19:41:34+01:00
Update upstream source from tag 'upstream/0.12.2'
Update to upstream version '0.12.2'
with Debian dir 0ffb7863b4d1188a90edc8aa97138f4126e759a7
- - - - -
93835c00 by Steffen Moeller at 2020-02-29T20:08:47+01:00
Interim round of adjustments
- - - - -
f69cd675 by Steffen Moeller at 2020-02-29T20:18:15+01:00
Functional, except for missing igblast binary
- - - - -
30 changed files:
- .travis.yml
- CHANGES.rst
- debian/changelog
- debian/control
- debian/patches/adjusting_tests_for_python3.patch
- + debian/patches/avoidTroubleWithPkgResources.patch
- debian/patches/series
- − debian/patches/versioneerversion.patch
- debian/rules
- doc/conf.py
- doc/guide.rst
- doc/installation.rst
- environment.lock.yml
- environment.yml
- src/igdiscover/Snakefile
- src/igdiscover/__main__.py
- src/igdiscover/cli/__init__.py
- src/igdiscover/cli/clonoquery.py
- src/igdiscover/cli/clonotypes.py
- src/igdiscover/cli/config.py
- src/igdiscover/cli/init.py
- src/igdiscover/cli/run.py
- src/igdiscover/readlenhistogram.py
- − tests/data/H1/README.md
- − tests/data/H1/V.fasta
- − tests/data/H1/candidates.tab.gz
- − tests/data/H1/expected.tab
- − tests/data/H1/test.sh
- − tests/run.sh
- tests/test_commands.py
Changes:
=====================================
.travis.yml
=====================================
@@ -1,11 +1,11 @@
language: python
+
+os: linux
+
cache:
directories:
- $HOME/.cache/igdiscover
-python:
- - "3.6"
-
before_install:
- wget http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
- bash miniconda.sh -b -p $HOME/miniconda
@@ -16,6 +16,7 @@ before_install:
- conda info -a
- wget https://bitbucket.org/igdiscover/testdata/downloads/igdiscover-testdata-0.6.tar.gz
- tar xvf igdiscover-testdata-0.6.tar.gz
+ - ln -s igdiscover-testdata testdata
- "echo 'use_cache: true' > $HOME/.config/igdiscover.conf"
install:
@@ -24,4 +25,23 @@ install:
- source activate testenv
- pip install .
-script: tests/run.sh
+script: pytest
+
+env:
+ global:
+ - TWINE_USERNAME=__token__
+
+jobs:
+ include:
+ - python: "3.7"
+
+ - stage: deploy
+ python: "3.7"
+ install: python3 -m pip install twine
+ if: tag IS present
+ script:
+ - |
+ python3 setup.py sdist
+ python3 -m pip wheel -w dist/ .
+ ls -l dist/
+ python3 -m twine upload dist/igdiscover-*
=====================================
CHANGES.rst
=====================================
@@ -11,7 +11,7 @@ v0.12 (2020-01-20)
reason for filters that compare candidates to each other. Now each filter
criterion can be distinguished.
* The somewhat vague “too similar sequence” germline filter criterion
- incorrectly removed some candidates that have a mutation close to the 5' end.
+ incorrectly removed some candidates that have a mutation close to the 3’ end.
This was replaced with a simpler filter that only ensures that there are no
two candidates with the same sequence.
* Use IgBLAST 1.10
=====================================
debian/changelog
=====================================
@@ -1,3 +1,10 @@
+igdiscover (0.12.2-1) UNRELEASED; urgency=medium
+
+ * New upstream version
+ BLOCKER: Still needs igblast to test
+
+ -- Steffen Moeller <moeller at debian.org> Sun, 26 Jan 2020 01:47:18 +0100
+
igdiscover (0.12-1) UNRELEASED; urgency=medium
* New upstream version
=====================================
debian/control
=====================================
@@ -15,6 +15,7 @@ Build-Depends: debhelper-compat (= 12),
python3-cutadapt,
python3-scipy (>=0.16.1),
python3-xopen (>=0.3.2),
+ python3-pkg-resources,
python3-ruamel.yaml,
python3-xopen,
python3-tinyalign,
=====================================
debian/patches/adjusting_tests_for_python3.patch
=====================================
@@ -1,21 +1,8 @@
-Index: igdiscover/tests/run.sh
-===================================================================
---- igdiscover.orig/tests/run.sh
-+++ igdiscover/tests/run.sh
-@@ -4,7 +4,7 @@ set -euo pipefail
- set -x
- unset DISPLAY
-
--pytest
-+pytest-3
-
- rm -rf testrun
- mkdir testrun
Index: igdiscover/src/igdiscover/__main__.py
===================================================================
--- igdiscover.orig/src/igdiscover/__main__.py
+++ igdiscover/src/igdiscover/__main__.py
-@@ -59,7 +59,7 @@ def main(arguments=None):
+@@ -60,7 +60,7 @@ def main(arguments=None):
parser = HelpfulArgumentParser(description=__doc__, prog='igdiscover')
parser.add_argument('--profile', default=False, action='store_true',
help='Save profiling information to igdiscover.prof')
=====================================
debian/patches/avoidTroubleWithPkgResources.patch
=====================================
@@ -0,0 +1,23 @@
+Index: igdiscover/doc/conf.py
+===================================================================
+--- igdiscover.orig/doc/conf.py
++++ igdiscover/doc/conf.py
+@@ -42,16 +42,8 @@ copyright = u'2015-2017, ' + authors
+ #
+
+ from pkg_resources import get_distribution
+-release = get_distribution('igdiscover').version
+-# Read The Docs modifies the conf.py script and we therefore get
+-# version numbers like 0.12+0.g27d0d31
+-if os.environ.get('READTHEDOCS') == 'True':
+- version = '.'.join(release.split('.')[:2])
+-else:
+- version = release
+-
+-# The full version, including alpha/beta/rc tags.
+-release = version
++release = "0.12"
++version = "0.12"
+
+ suppress_warnings = ['image.nonlocal_uri']
+
=====================================
debian/patches/series
=====================================
@@ -1,3 +1,3 @@
dontusetagsforversion.patch
-versioneerversion.patch
adjusting_tests_for_python3.patch
+avoidTroubleWithPkgResources.patch
=====================================
debian/patches/versioneerversion.patch deleted
=====================================
@@ -1,13 +0,0 @@
-Index: igdiscover/doc/conf.py
-===================================================================
---- igdiscover.orig/doc/conf.py
-+++ igdiscover/doc/conf.py
-@@ -47,7 +47,7 @@ copyright = u'2015-2017, ' + authors
- # will not work), so the following is what we need to do.
- import subprocess
- version = subprocess.check_output(
-- [sys.executable, '-c', 'import versioneer; print(versioneer.get_version())'],
-+ [sys.executable, '-c', 'import versioneer; print(versioneer.sys.version)'],
- cwd='..').decode().strip()
-
- # Read The Docs modifies the conf.py script and we therefore get 'dirty'
=====================================
debian/rules
=====================================
@@ -22,6 +22,7 @@ override_dh_auto_clean:
dh_auto_clean
rm -rf .pytest_cache debian/python3-igdiscover
rm -f src/igdiscover/__version__.py
+ #rm -rf src/igdiscover.egg-info/
override_dh_auto_test:
echo "**** NOT TESTING ******"
=====================================
doc/conf.py
=====================================
@@ -1,4 +1,5 @@
# IgDiscover documentation build configuration file
+import time
import sys
import os
@@ -40,22 +41,14 @@ copyright = u'2015-2017, ' + authors
# built documents.
#
-# When generating the documentation, we currently do not require the
-# dependencies to be installed. We therefore cannot 'import' anything from
-# igdiscover since that may fail. The versioneer module can only be imported
-# from the project root (it has extra checks such that even changing sys.path
-# will not work), so the following is what we need to do.
-import subprocess
-version = subprocess.check_output(
- [sys.executable, '-c', 'import versioneer; print(versioneer.get_version())'],
- cwd='..').decode().strip()
-
-# Read The Docs modifies the conf.py script and we therefore get 'dirty'
-# version numbers like 0.12+0.g27d0d31.dirty from versioneer.
-if version.endswith('.dirty') and os.environ.get('READTHEDOCS') == 'True':
- version, _, rest = version.partition('+')
- if not rest.startswith('0.'):
- version = version + '+' + rest[:-6]
+from pkg_resources import get_distribution
+release = get_distribution('igdiscover').version
+# Read The Docs modifies the conf.py script and we therefore get
+# version numbers like 0.12+0.g27d0d31
+if os.environ.get('READTHEDOCS') == 'True':
+ version = '.'.join(release.split('.')[:2])
+else:
+ version = release
# The full version, including alpha/beta/rc tags.
release = version
=====================================
doc/guide.rst
=====================================
@@ -41,7 +41,7 @@ To run an analysis, proceed as follows.
Next, choose the directory with your database.
The directory must contain the three files ``V.fasta``, ``D.fasta``, ``J.fasta``.
These files contain the V, D, J gene sequences, respectively.
- Even if have have only light chains in your data, a ``D.fasta`` file needs to be provided,
+ Even if you have only light chains in your data, a ``D.fasta`` file needs to be provided;
just use one with the heavy chain D gene sequences.
If you do not want a graphical user interface, use the two command-line
@@ -57,9 +57,7 @@ To run an analysis, proceed as follows.
2. Adjust the configuration file
The previous step created a configuration file named ``myexperiment/igdiscover.yaml``, which
- you may :ref:`need to adjust <configuration>`. In particular, the number of discovery rounds
- is set to 3 by default, which takes a long time. Reducing this to 2 or even 1 often works just
- as well.
+ you may :ref:`need to adjust <configuration>`.
3. Run the analysis
@@ -67,9 +65,9 @@ To run an analysis, proceed as follows.
igdiscover run
- Depending on the size of your library, your computer, and the number of iterations, this will
- now take from a few hours to a day. See the :ref:`running IgDiscover <running>` section for
- more fine-grained control over what to run and how to resume the process if something failed.
+ Depending on the size of your library, this will usually take a couple of hours. See the
+ :ref:`running IgDiscover <running>` section for more fine-grained control over what to run and
+ how to resume the process if something failed.
.. _obtaining-database:
@@ -84,13 +82,13 @@ For discovering new VH genes, for example, you need to get the IGHV, IGHD and IG
As IgDiscover uses this only as a starting point, using a similar species will also work.
When using an IMGT database, it is very important to change the long IMGT sequence headers to
-short headers as IgBLAST does not accept the long headers. We recommend using the program
+short headers as IgBLAST does not accept the long headers. You can use the program
``edit_imgt_file.pl``. If you installed IgDiscover from Conda, the script is already installed and
you can run it by typing the name. It is also
`available on the IgBlast FTP site <ftp://ftp.ncbi.nih.gov/blast/executables/igblast/release/>`_.
-Run it for all three downloaded files, and then rename files appropritely to make sure that they
-named ``V.fasta``, ``D.fasta`` and ``J.fasta``.
+Run it for all three downloaded files, and then rename files appropriately to make sure that they
+are named ``V.fasta``, ``D.fasta`` and ``J.fasta``.
You always need a file with D genes even if you analyze light chains.
@@ -131,16 +129,15 @@ that could not be merged are discarded. Single-end reads and merged paired-end r
to follow this structure (from 5' to 3'):
* The forward primer sequence. This is optional.
-* A random barcode (molecular identifier). This is optional. Set the
- configuration option ``barcode_length_5p`` to 0 if you don’t have random barcodes
- or if you don’t want the program to use them.
+* A UMI (random barcode). This is optional. Set the configuration option ``barcode_length_5p`` to 0
+ if you don’t have random barcodes or if you don’t want the program to use them.
* Optionally, a run of G nucleotides. This is an artifact of the RACE protocol (Rapid
amplification of cDNA ends). If you have this, set ``race_g`` to ``true`` in the configuration file.
* 5' UTR
* Leader
* Re-arranged V, D and J gene sequences for heavy chains; only V and J for light chains
-* An optional random barcode. Set the configuration option ``barcode_length_3p`` to the length of
- this barcode. You can currently not have both a 5' and a 3' barcode.
+* An optional UMI (random barcode). Set the configuration option ``barcode_length_3p`` to the
+ length of this UMI. You can currently not have both a 5' and a 3' UMI.
* The reverse primer. This is optional.
We use IgBLAST to detect the location of the V, D, J genes through the
@@ -180,19 +177,27 @@ A few rules that may be good to know are the following ones:
To find out what the configuration options achieve, see the explanations in the configuration file itself.
-The main parameters parameters that may require adjusting are the following.
+The main parameters that may require adjusting are the following.
The ``iterations`` option sets the number of rounds of V gene discovery
-that will be performed. By default, three iterations are run. Even with a very restricted
-starting V database (for example with only a single V gene sequence),
-this is usually sufficient to identify most novel germline sequences.
+that will be performed. By default, one iteration is run. In each
+iteration, all the sequences will be mapped with IgBLAST, which is the
+most time-consuming part of running the pipeline. Thus, when you go from 1 to 2
+iterations, you almost double the runtime requirements.
-When the starting database is more complete, for example, when analyzing
-a human IgM library with the current IMGT heavy chain database, a single
-iteration may be sufficient to produce an individualized database.
+In previous IgDiscover versions, more iterations than one were necessary,
+but we have improved sensitivity since then, so you should not need to increase
+this.
-If you do not want to discover any new genes and only want to produce an
-expression profile, for example, then use ``iterations: 0``.
+Especially for nearly complete starting databases, for example when
+analyzing a human IgM library with the current IMGT heavy chain database,
+a single iteration is totally sufficient to produce an individualized database.
+
+If you start with a very small V database (for example with only a single
+V gene sequence), you may get better results when you increase this to 2.
+
+If you do not want to discover any new genes, then use ``iterations: 0``.
+This may be useful to only produce an expression profile, for example.
The ``ignore_j`` option should be set to ``true`` when producing a V gene
database for a species where J sequences are unknown::
@@ -208,12 +213,27 @@ if you do not specify any primer sequences.
Pregermline and germline filter criteria
----------------------------------------
-This provides IgDiscover with stringency requirements for V gene discovery
-that enable the program to filter out false positives. Usually the ”pregermline
-filter” can be used in the default mode since all these sequences will be
-subsequently passed to the higher stringency ”germline filter” where the
-criteria are set to maximize stringency. Here is how it looks in the configuration
-file::
+IgDiscover V gene discovery works in two stages: The program first generates
+a list of *candidate* V gene sequences. This list includes many false
+positives. In the subsequent *germline filtering* step, the list is
+therefore trimmed rigorously in order to produce the final list of germline
+sequences.
+
+The stringency requirements for the germline filter can be set in the
+configuration file in the ``germline_filter`` and ``pregermline_filter``
+sections.
+
+The ``pregermline_filter`` section is used in all but the last iteration.
+That is, it is ignored if you use the default of running only a single
+iteration.
+
+The idea behind the pregermline filter is to initially use less stringent
+criteria, which allows the starting database to grow more quickly, but at
+the risk of adding some false positives. The last iteration, in which the
+more stringent ``germline_filter`` settings are used, will then remove those
+remaining false positives.
+
+Here is how it looks in the configuration file::
pre_germline_filter:
unique_cdr3s: 2 # Minimum number of unique CDR3s (within exact matches)
@@ -255,6 +275,18 @@ about the :ref:`germline filters <germline-filters>`..
The ``differences`` configuration setting was removed.
+.. _jdiscovery:
+
+IgDiscover will also try to discover which J genes are used in the input sample. J discovery
+is configured in the ``j_discovery`` section in the configuration file. It looks like this::
+
+ j_discovery:
+ allele_ratio: 0.2 # Required minimum ratio between alleles of a single gene
+ cross_mapping_ratio: 0.1 # Threshold for removal of cross-mapping artifacts.
+ propagate: true # Use J genes discovered in iteration 1 in subsequent ones
+
+
+
.. _running:
Running IgDiscover
@@ -376,8 +408,7 @@ Final results are found in the ``final/`` subdirectory of the analysis directory
final/database/(V,D,J).fasta
These three files represent the final, individualized V/D/J database found by IgDiscover.
- The D and J files are copies of the original starting database;
- they are not updated by IgDiscover.
+ The D file is a copy of the original starting database; it is not updated by IgDiscover.
final/dendrogram_(V,D,J).pdf
These three PDF files contain dendrograms of the V, D and J sequences in the individualized
@@ -441,6 +472,9 @@ iteration-xx/new_V_germline.fasta, iteration-xx/new_V_pregermline.fasta
iteration-xx/annotated_V_germline.tab, iteration-xx/annotated_V_pregermline.tab
A version of the ``candidates.tab`` file that is annotated with extra columns that describe why a candidate was filtered out. See :ref:`the description of this file <annotated_v_tab>`.
+iteration-xx/new_J.tab, iteration-xx/new_J.fasta
+ The discovered list of J genes for this iteration.
+
Other files
-----------
@@ -800,6 +834,9 @@ rename
union
Compute union of sequences in multiple FASTA files
+:ref:`clonotypes <clonotypes>`
+ List the clonotypes (unique V, J, CDR3 combinations) present in a sample
+
The following subcommands are used internally, and listed here for completeness.
@@ -826,6 +863,60 @@ errorplot
Plot histograms of differences to reference V gene
+.. _clonotypes:
+
+``igdiscover clonotypes``
+-------------------------
+
+The ``igdiscover clonotypes`` command lists the clonotypes present in a sample.
+The only required parameter is the name of a file with assigned sequences.
+Normally, this will be a ``filtered.tab.gz`` file.
+
+Two sequences are considered to be of the same clonotype if
+
+- their V and J assignments are the same
+- the length of their CDR3 is identical
+- their CDR3 sequences are similar (see below for what this means)
+
+That is, clonotypes are found by clustering the input sequences by V gene,
+J gene and CDR3 similarity (using single-linkage clustering).
+
+For each cluster, a representative row (assignment) is chosen and
+considered to be the clonotype. The output is a table with one row
+per clonotype. It is written to standard output.
+
+By default, the output table is sorted by V/D/J gene names.
+Use ``--sort`` to sort by group size (largest first).
+
+Similarity
+~~~~~~~~~~
+
+To determine whether two CDR3s are similar, the Hamming distance
+between the CDR3 nucleotide sequences (``CDR3_nt`` column) must be
+at most 1. To allow more differences, use ``--mismatches``. To
+compare amino acid sequences (``CDR3_aa``) instead, use ``--aa``.
+
+The members file
+~~~~~~~~~~~~~~~~
+
+If desired, the constituents (“members”) of each cluster can be
+output to a file using ``--members=outputfilename.tab``.
+Clusters are separated by empty lines and are ordered the same as
+in the clonotypes table.
+
+In the members table, additional fields are added that are intended
+to describe “mutation rates”. These fields are named
+``XXX_mindiffrate``, where ``XXX`` is ``CDR3_nt``, ``CDR3_aa``, ``VDJ_nt``, and ``VDJ_aa``.
+
+Within each cluster, the row with the lowest ``V_SHM`` value
+(the least mutated V) is chosen as reference. If the ``V_SHM``
+is higher than ``--v-shm-threshold``, the ``_mindiffrate`` fields are not computed.
+
+To compute a field such as ``CDR3_nt_mindiffrate`` for a row, the edit distance between
+``CDR3_nt`` of this row and of the reference row is computed and divided by
+the length of ``CDR3_nt`` of the reference row (and multiplied by 100 to give a percentage).
+
+
.. _germline-filters:
Germline and pre-germline filtering
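The clonotype rules documented in the doc/guide.rst hunk above (single-linkage clustering of equal-length CDR3s by Hamming distance, and the ``_mindiffrate`` percentage) can be sketched in plain Python. This is an illustration under stated assumptions, not IgDiscover's implementation: the function names are hypothetical, the input is assumed to be CDR3 strings that already share V gene, J gene and CDR3 length, and the edit distance is a textbook Levenshtein distance.

```python
def hamming_distance(s, t):
    """Number of mismatching positions; s and t must have equal length."""
    if len(s) != len(t):
        raise ValueError('sequences differ in length')
    return sum(a != b for a, b in zip(s, t))


def single_linkage_clusters(cdr3s, mismatches=1):
    """
    Group equal-length CDR3 strings. Two strings end up in the same cluster
    if a chain of pairwise-similar strings connects them.
    """
    clusters = []
    for seq in cdr3s:
        linked, rest = [], []
        for cluster in clusters:
            # A cluster is linked if any member is within the allowed distance
            close = any(hamming_distance(seq, m) <= mismatches for m in cluster)
            (linked if close else rest).append(cluster)
        # Merge the new sequence with every cluster it links to
        merged = [seq] + [m for cluster in linked for m in cluster]
        clusters = rest + [merged]
    return clusters


def edit_distance(s, t):
    """Levenshtein distance, computed row by row."""
    previous = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        current = [i]
        for j, b in enumerate(t, start=1):
            current.append(min(
                previous[j] + 1,             # deletion
                current[j - 1] + 1,          # insertion
                previous[j - 1] + (a != b),  # match or substitution
            ))
        previous = current
    return previous[-1]


def mindiffrate(sequence, reference):
    """Edit distance to the reference as a percentage of the reference length."""
    return 100 * edit_distance(sequence, reference) / len(reference)


clusters = single_linkage_clusters(['AAAA', 'AAAT', 'AATT', 'GGGG'])
print(clusters)
print(mindiffrate('AAAT', 'AAAA'))
```

In the example, ``AAAA`` and ``AATT`` differ at two positions but still share a cluster because ``AAAT`` links them, which is the defining property of single linkage.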
=====================================
doc/installation.rst
=====================================
@@ -22,7 +22,7 @@ if you cannot use Conda.
Installing IgDiscover with Conda
--------------------------------
-1. Install `Conda <https://conda.io/>`_ by following the `conda installation
+1. Install `Conda`_ by following the `conda installation
instructions <https://conda.io/docs/user-guide/install/>`_
as appropriate for your system. You will need to choose between a “Miniconda”
and “Anaconda” installation. We recommend Miniconda as the download is
=====================================
environment.lock.yml
=====================================
@@ -10,26 +10,27 @@ dependencies:
- blas=2.14=openblas
- bzip2=1.0.8=h516909a_2
- ca-certificates=2019.11.28=hecc5488_0
- - certifi=2019.11.28=py36_0
- - cffi=1.13.2=py36h8022711_0
- - chardet=3.0.4=py36_1003
+ - certifi=2019.11.28=py37_0
+ - cffi=1.13.2=py37h8022711_0
+ - chardet=3.0.4=py37_1003
- configargparse=0.13.0=py_1
- - cryptography=2.8=py36h72c5cf5_1
- - cutadapt=2.8=py36h516909a_0
+ - cryptography=2.8=py37h72c5cf5_1
+ - cutadapt=2.8=py37h516909a_0
- cycler=0.10.0=py_2
- - datrie=0.8=py36h516909a_0
- - dnaio=0.4.1=py36h516909a_0
- - docutils=0.16=py36_0
+ - datrie=0.8=py37h516909a_0
+ - dnaio=0.4.1=py37h516909a_0
+ - docutils=0.16=py37_0
- flash=1.2.11=hed695b0_5
- freetype=2.10.0=he983fc9_1
- gitdb2=2.0.6=py_0
- gitpython=3.0.5=py_0
- icu=64.2=he1b5a44_1
- - idna=2.8=py36_1000
+ - idna=2.8=py37_1000
- igblast=1.10.0=h6ac72b6_1
- - importlib_metadata=1.4.0=py36_0
- - jsonschema=3.2.0=py36_0
- - kiwisolver=1.1.0=py36hc9558a2_0
+ - importlib_metadata=1.4.0=py37_0
+ - jsonschema=3.2.0=py37_0
+ - kiwisolver=1.1.0=py37hc9558a2_0
+ - ld_impl_linux-64=2.33.1=h53a641e_7
- libblas=3.8.0=14_openblas
- libcblas=3.8.0=14_openblas
- libffi=3.2.1=he1b5a44_1006
@@ -44,49 +45,49 @@ dependencies:
- libpng=1.6.37=hed695b0_0
- libstdcxx-ng=9.2.0=hdf63c60_2
- libxml2=2.9.10=hee79883_0
- - matplotlib-base=3.1.2=py36h250f245_1
+ - matplotlib-base=3.1.2=py37h250f245_1
- more-itertools=8.1.0=py_0
- muscle=3.8.1551=hc9558a2_5
- ncurses=6.1=hf484d3e_1002
- nomkl=3.0=0
- - numpy=1.17.5=py36h95a1406_0
+ - numpy=1.17.5=py37h95a1406_0
- openssl=1.1.1d=h516909a_0
- - pandas=0.25.3=py36hb3f55d8_0
+ - pandas=0.25.3=py37hb3f55d8_0
- patsy=0.5.1=py_0
- pear=0.9.6=h98de208_5
- perl=5.26.2=h516909a_1006
- pigz=2.4=h84994c4_0
- - pip=19.3.1=py36_0
- - psutil=5.6.7=py36h516909a_0
- - pycparser=2.19=py36_1
- - pyopenssl=19.1.0=py36_0
+ - pip=19.3.1=py37_0
+ - psutil=5.6.7=py37h516909a_0
+ - pycparser=2.19=py37_1
+ - pyopenssl=19.1.0=py37_0
- pyparsing=2.4.6=py_0
- - pyrsistent=0.15.7=py36h516909a_0
- - pysocks=1.7.1=py36_0
- - python=3.6.7=h357f687_1006
+ - pyrsistent=0.15.7=py37h516909a_0
+ - pysocks=1.7.1=py37_0
+ - python=3.7.6=h357f687_2
- python-dateutil=2.8.1=py_0
- pytz=2019.3=py_0
- - pyyaml=5.3=py36h516909a_0
- - ratelimiter=1.2.0=py36_1000
+ - pyyaml=5.3=py37h516909a_0
+ - ratelimiter=1.2.0=py37_1000
- readline=8.0=hf8c457e_0
- - requests=2.22.0=py36_1
- - ruamel.yaml=0.16.5=py36h516909a_1
- - ruamel.yaml.clib=0.2.0=py36h516909a_0
- - scipy=1.4.1=py36h921218d_0
+ - requests=2.22.0=py37_1
+ - ruamel.yaml=0.16.5=py37h516909a_1
+ - ruamel.yaml.clib=0.2.0=py37h516909a_0
+ - scipy=1.4.1=py37h921218d_0
- seaborn=0.9.0=py_2
- - setuptools=45.1.0=py36_0
- - six=1.14.0=py36_0
+ - setuptools=45.1.0=py37_0
+ - six=1.14.0=py37_0
- smmap2=2.0.5=py_0
- snakemake-minimal=5.9.1=py_0
- sqlite=3.30.1=hcee41ef_0
- - statsmodels=0.10.2=py36hc1659b7_0
- - tinyalign=0.2=py36h516909a_0
+ - statsmodels=0.10.2=py37hc1659b7_0
+ - tinyalign=0.2=py37h516909a_0
- tk=8.6.10=hed695b0_0
- - tornado=6.0.3=py36h516909a_0
- - urllib3=1.25.7=py36_0
- - wheel=0.33.6=py36_0
- - wrapt=1.11.2=py36h516909a_0
- - xopen=0.8.4=py36_0
+ - tornado=6.0.3=py37h516909a_0
+ - urllib3=1.25.7=py37_0
+ - wheel=0.33.6=py37_0
+ - wrapt=1.11.2=py37h516909a_0
+ - xopen=0.8.4=py37_0
- xz=5.2.4=h14c3975_1001
- yaml=0.2.2=h516909a_1
- zipp=1.0.0=py_0
=====================================
environment.yml
=====================================
@@ -4,7 +4,7 @@ channels:
- defaults
dependencies:
- nomkl
- - python=3.6
+ - python=3.7
- seaborn=0.9.*
- snakemake-minimal
- cutadapt>=2.5
=====================================
src/igdiscover/Snakefile
=====================================
@@ -7,7 +7,7 @@ import igdiscover
from igdiscover.dna import reverse_complement
from igdiscover.utils import relative_symlink
from igdiscover.readlenhistogram import read_length_histogram
-from igdiscover.config import Config, GlobalConfig
+from igdiscover.config import Config
try:
=====================================
src/igdiscover/__main__.py
=====================================
@@ -18,6 +18,7 @@ import warnings
import resource
import igdiscover.cli as cli_package
+from igdiscover.cli import CommandLineError
from . import __version__
@@ -78,15 +79,24 @@ def main(arguments=None):
show_cpustats[module.main] = False
args = parser.parse_args(arguments)
- if not hasattr(args, 'func'):
+ do_profiling = args.profile
+ del args.profile
+ subcommand = getattr(args, 'func', None)
+ del args.func
+ if not subcommand:
parser.error('Please provide the name of a subcommand to run')
- elif args.profile:
+ elif do_profiling:
import cProfile as profile
- profile.runctx('args.func(args)', globals(), locals(), filename='igdiscover.prof')
+ to_run = lambda: profile.runctx('subcommand(args)', globals(), locals(), filename='igdiscover.prof')
logger.info('Wrote profiling data to igdiscover.prof')
else:
- args.func(args)
- if sys.platform == 'linux' and show_cpustats.get(args.func, True):
+ to_run = lambda: subcommand(args)
+ try:
+ to_run()
+ except CommandLineError as e:
+ logger.error(e)
+ sys.exit(1)
+ if sys.platform == 'linux' and show_cpustats.get(subcommand, True):
rself = resource.getrusage(resource.RUSAGE_SELF)
rchildren = resource.getrusage(resource.RUSAGE_CHILDREN)
memory_kb = rself.ru_maxrss + rchildren.ru_maxrss
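The error-handling pattern introduced in the src/igdiscover/__main__.py hunk above (a subcommand raises ``CommandLineError``, the entry point logs it and exits with status 1) can be sketched on its own. The names ``run``, ``failing_subcommand`` and the ``demo`` program are hypothetical and exist only for this illustration; the sketch returns the exit status instead of calling ``sys.exit`` so that it is easy to try out.

```python
import argparse
import logging

logger = logging.getLogger(__name__)


class CommandLineError(Exception):
    """Raised by a subcommand to report a user-facing error."""


def failing_subcommand(args):
    # Hypothetical subcommand that always reports a user error
    raise CommandLineError('input table not found: ' + args.table)


def run(arguments):
    parser = argparse.ArgumentParser(prog='demo')
    parser.add_argument('table')
    parser.set_defaults(func=failing_subcommand)
    args = parser.parse_args(arguments)
    # Remove the dispatch target so that only real options remain in args;
    # pop() with a default does not fail when the attribute is absent
    subcommand = vars(args).pop('func', None)
    if subcommand is None:
        parser.error('Please provide the name of a subcommand to run')
    try:
        subcommand(args)
    except CommandLineError as e:
        # Report the message without a traceback and signal failure
        logger.error(e)
        return 1
    return 0


status = run(['missing.tab'])
print(status)
```

A real entry point would typically pass the returned value to ``sys.exit``. Using ``pop`` with a default is a design choice of this sketch: it avoids an ``AttributeError`` in the case where no subcommand was selected.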
=====================================
src/igdiscover/cli/__init__.py
=====================================
@@ -1 +1,2 @@
-# This module contains all subcommands
+class CommandLineError(Exception):
+ pass
=====================================
src/igdiscover/cli/clonoquery.py
=====================================
@@ -61,6 +61,7 @@ def collect(querytable, reftable, mismatches, cdr3_core_slice, cdr3_column):
with all the rows that have the same result. similar_rows is a DataFrame
whose rows are the ones matching the query.
"""
+
# The vjlentype is a "clonotype without CDR3 sequence" (only V, J, CDR3 length)
# Determine set of vjlentypes to query
query_vjlentypes = defaultdict(list)
@@ -89,7 +90,7 @@ def collect(querytable, reftable, mismatches, cdr3_core_slice, cdr3_column):
for indices, query_rows in results.items():
if not indices:
for query_row in query_rows:
- yield ([query_row], [])
+ yield ([query_row], reftable.head(0))
continue
similar_group = vjlen_group.iloc[list(indices), :]
@@ -98,7 +99,7 @@ def collect(querytable, reftable, mismatches, cdr3_core_slice, cdr3_column):
# Yield result tuples for all the queries that have not been found
for queries in query_vjlentypes.values():
for query_row in queries:
- yield ([query_row], [])
+ yield ([query_row], reftable.head(0))
def main(args):
=====================================
src/igdiscover/cli/clonotypes.py
=====================================
@@ -47,7 +47,8 @@ def add_arguments(parser):
help='Sort by group size (largest first). Default: Sort by V/D/J gene names')
arg('--limit', metavar='N', type=int, default=None,
help='Print out only the first N groups')
- arg('--v-shm-threshold', default=5, type=float, help='V SHM threshold for _mindiffrate computations')
+ arg('--v-shm-threshold', default=5, type=float,
+ help='V SHM threshold for _mindiffrate computations')
arg('--cdr3-core', default=None,
type=slice_arg, metavar='START:END',
help='START:END defines the non-junction region of CDR3 '
@@ -68,24 +69,88 @@ def add_arguments(parser):
arg('table', help='Table with parsed and filtered IgBLAST results')
-def is_similar_with_junction(s, t, mismatches, cdr3_core):
+def main(args):
+ run_clonotypes(**vars(args))
+
+
+def run_clonotypes(
+ table,
+ sort=False,
+ limit=None,
+ v_shm_threshold=5,
+ aa=False,
+ mismatches=1,
+ members=None,
+ cdr3_core=None,
+):
+ logger.info('Reading input table ...')
+ usecols = CLONOTYPE_COLUMNS
+ # TODO backwards compatibility
+ if 'FR1_aa_mut' not in pd.read_csv(table, nrows=0, sep='\t').columns:
+ usecols = [col for col in usecols if not col.endswith('_aa_mut')]
+
+ table = read_table(table, usecols=usecols)
+ table = table[usecols]
+ logger.info('Read table with %s rows', len(table))
+ table.insert(5, 'CDR3_length', table['CDR3_nt'].apply(len))
+ table = table[table['CDR3_length'] > 0]
+ table = table[table['CDR3_aa'].map(lambda s: '*' not in s)]
+ logger.info('After discarding rows with unusable CDR3, %s remain', len(table))
+ with ExitStack() as stack:
+ if members:
+ members_file = stack.enter_context(xopen(members, 'w'))
+ else:
+ members_file = None
+
+ columns = usecols[:]
+ columns.remove('barcode')
+ columns.remove('count')
+ columns.insert(0, 'count')
+ columns.insert(columns.index('CDR3_nt'), 'CDR3_length')
+ print(*columns, sep='\t')
+ print_header = True
+ n = 0
+ cdr3_column = 'CDR3_aa' if aa else 'CDR3_nt'
+ grouped = group_by_clonotype(table, mismatches, sort, cdr3_core, cdr3_column)
+ for group in islice(grouped, 0, limit):
+ group = augment_group(group, v_shm_threshold=v_shm_threshold)
+ if members_file:
+ # We get an intentional empty line between groups since
+ # to_csv() already includes a line break
+ print(group.to_csv(sep='\t', header=print_header, index=False), file=members_file)
+ print_header = False
+ rep = representative(group)
+ print(*[rep[col] for col in columns], sep='\t')
+ n += 1
+ logger.info('%d clonotypes written', n)
+
+
+def group_by_clonotype(table, mismatches, sort, cdr3_core, cdr3_column):
"""
- Return whether strings s and t have at most the given number of mismatches
- *and* have at least one identical junction.
+ Yield clonotype groups. Each item is a DataFrame with all the members of the
+ clonotype.
"""
- # TODO see issue #81
- if len(s) != len(t):
- return False
- if 0 < mismatches < 1:
- delta = cdr3_core.start if cdr3_core is not None else 0
- distance_ok = hamming_distance(s, t) <= (len(s) - delta) * mismatches
- else:
- distance_ok = hamming_distance(s, t) <= mismatches
- if cdr3_core is None:
- return distance_ok
- return distance_ok and (
- (s[:cdr3_core.start] == t[:cdr3_core.start]) or
- (s[cdr3_core.stop:] == t[cdr3_core.stop:]))
+ logger.info('Computing clonotypes ...')
+ prev_v = None
+ groups = []
+ for (v_gene, j_gene, cdr3_length), vj_group in table.groupby(
+ ['V_gene', 'J_gene', 'CDR3_length']):
+ if prev_v != v_gene:
+ logger.info('Processing %s', v_gene)
+ prev_v = v_gene
+ cdr3_groups = group_by_cdr3(vj_group.copy(), mismatches=mismatches, cdr3_core=cdr3_core,
+ cdr3_column=cdr3_column)
+ if sort:
+ # When sorting by group size is requested, we need to buffer
+ # results
+ groups.extend(cdr3_groups)
+ else:
+ yield from cdr3_groups
+
+ if sort:
+ logger.info('Sorting by group size ...')
+ groups.sort(key=len, reverse=True)
+ yield from groups
def group_by_cdr3(table, mismatches, cdr3_core, cdr3_column):
@@ -115,6 +180,26 @@ def group_by_cdr3(table, mismatches, cdr3_core, cdr3_column):
yield group.drop('cluster_id', axis=1)
+def is_similar_with_junction(s, t, mismatches, cdr3_core):
+ """
+ Return whether strings s and t have at most the given number of mismatches
+ *and* have at least one identical junction.
+ """
+ # TODO see issue #81
+ if len(s) != len(t):
+ return False
+ if 0 < mismatches < 1:
+ delta = cdr3_core.start if cdr3_core is not None else 0
+ distance_ok = hamming_distance(s, t) <= (len(s) - delta) * mismatches
+ else:
+ distance_ok = hamming_distance(s, t) <= mismatches
+ if cdr3_core is None:
+ return distance_ok
+ return distance_ok and (
+ (s[:cdr3_core.start] == t[:cdr3_core.start]) or
+ (s[cdr3_core.stop:] == t[cdr3_core.stop:]))
+
+
def representative(table):
"""
Given a table with members of the same clonotype, return a representative
@@ -129,34 +214,6 @@ def representative(table):
return result
-def group_by_clonotype(table, mismatches, sort, cdr3_core, cdr3_column):
- """
- Yield clonotype groups. Each item is a DataFrame with all the members of the
- clonotype.
- """
- logger.info('Computing clonotypes ...')
- prev_v = None
- groups = []
- for (v_gene, j_gene, cdr3_length), vj_group in table.groupby(
- ['V_gene', 'J_gene', 'CDR3_length']):
- if prev_v != v_gene:
- logger.info('Processing %s', v_gene)
- prev_v = v_gene
- cdr3_groups = group_by_cdr3(vj_group.copy(), mismatches=mismatches, cdr3_core=cdr3_core,
- cdr3_column=cdr3_column)
- if sort:
- # When sorting by group size is requested, we need to buffer
- # results
- groups.extend(cdr3_groups)
- else:
- yield from cdr3_groups
-
- if sort:
- logger.info('Sorting by group size ...')
- groups.sort(key=len, reverse=True)
- yield from groups
-
-
def augment_group(table, v_shm_threshold=5, suffix='_mindiffrate'):
"""
Add columns to the given table that contain percentage difference of VDJ_nt, VDJ_aa, CDR3_nt,
@@ -167,9 +224,11 @@ def augment_group(table, v_shm_threshold=5, suffix='_mindiffrate'):
for column in columns[::-1]:
table.insert(i, column + suffix, None)
+ if table.empty:
+ return table
+
# Find row whose V is least mutated
root = table.loc[table['V_SHM'].idxmin()]
- import ipdb; ipdb.set_trace()
if root['V_SHM'] > v_shm_threshold:
return table
@@ -181,46 +240,3 @@ def augment_group(table, v_shm_threshold=5, suffix='_mindiffrate'):
]
return table
-
-
-def main(args):
- logger.info('Reading input table ...')
- usecols = CLONOTYPE_COLUMNS
- # TODO backwards compatibility
- if 'FR1_aa_mut' not in pd.read_csv(args.table, nrows=0, sep='\t').columns:
- usecols = [col for col in usecols if not col.endswith('_aa_mut')]
-
- table = read_table(args.table, usecols=usecols)
- table = table[usecols]
- logger.info('Read table with %s rows', len(table))
- table.insert(5, 'CDR3_length', table['CDR3_nt'].apply(len))
- table = table[table['CDR3_length'] > 0]
- table = table[table['CDR3_aa'].map(lambda s: '*' not in s)]
- logger.info('After discarding rows with unusable CDR3, %s remain', len(table))
- with ExitStack() as stack:
- if args.members:
- members_file = stack.enter_context(xopen(args.members, 'w'))
- else:
- members_file = None
-
- columns = usecols[:]
- columns.remove('barcode')
- columns.remove('count')
- columns.insert(0, 'count')
- columns.insert(columns.index('CDR3_nt'), 'CDR3_length')
- print(*columns, sep='\t')
- print_header = True
- n = 0
- cdr3_column = 'CDR3_aa' if args.aa else 'CDR3_nt'
- grouped = group_by_clonotype(table, args.mismatches, args.sort, args.cdr3_core, cdr3_column)
- for group in islice(grouped, 0, args.limit):
- group = augment_group(group, v_shm_threshold=args.v_shm_threshold)
- if members_file:
- # We get an intentional empty line between groups since
- # to_csv() already includes a line break
- print(group.to_csv(sep='\t', header=print_header, index=False), file=members_file)
- print_header = False
- rep = representative(group)
- print(*[rep[col] for col in columns], sep='\t')
- n += 1
- logger.info('%d clonotypes written', n)
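Note on the clonotypes.py hunks above: `is_similar_with_junction` is moved without changes. What follows is a minimal, self-contained sketch of that predicate with its indentation restored, so it can be tried in isolation. Two things are assumptions and not part of the diff: `hamming_distance` is a local stand-in for the helper the module imports, and `cdr3_core` is taken to be a `slice` or `None`, since the code only reads `.start` and `.stop`.

```python
def hamming_distance(s, t):
    # Local stand-in (assumption): number of positions at which s and t differ.
    # Only meaningful for strings of equal length.
    return sum(a != b for a, b in zip(s, t))


def is_similar_with_junction(s, t, mismatches, cdr3_core):
    """
    Return whether strings s and t have at most the given number of mismatches
    *and* have at least one identical junction.
    """
    if len(s) != len(t):
        return False
    if 0 < mismatches < 1:
        # Fractional threshold: scaled by the length minus the core start
        delta = cdr3_core.start if cdr3_core is not None else 0
        distance_ok = hamming_distance(s, t) <= (len(s) - delta) * mismatches
    else:
        # Absolute threshold: a plain mismatch count
        distance_ok = hamming_distance(s, t) <= mismatches
    if cdr3_core is None:
        return distance_ok
    # At least one flank outside the core must be identical
    return distance_ok and (
        (s[:cdr3_core.start] == t[:cdr3_core.start]) or
        (s[cdr3_core.stop:] == t[cdr3_core.stop:]))
```

With `cdr3_core=slice(2, 6)`, two sequences that differ only inside positions 2 to 5 pass the junction check, while sequences that differ in both flanks are rejected even if the mismatch budget allows it.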
=====================================
src/igdiscover/cli/config.py
=====================================
@@ -26,22 +26,35 @@ def add_arguments(parser):
def main(args):
if args.set:
- with open(args.file) as f:
- config = ruamel.yaml.load(f, ruamel.yaml.RoundTripLoader)
- for k, v in args.set:
- v = ruamel.yaml.safe_load(v)
- # config[k] = v
- item = config
- # allow nested keys
- keys = k.split('.')
- for i in keys[:-1]:
- item = item[i]
- item[keys[-1]] = v
- tmpfile = args.file + '.tmp'
- with open(tmpfile, 'w') as f:
- print(ruamel.yaml.dump(config, Dumper=ruamel.yaml.RoundTripDumper), end='', file=f)
- os.rename(tmpfile, args.file)
+ modify_configuration(args.set, args.file)
else:
- with open(args.file) as f:
- config = ruamel.yaml.safe_load(f)
- print(ruamel.yaml.dump(config), end='')
+ print_configuration(args.file)
+
+
+def modify_configuration(
+ settings,
+ path=Config.DEFAULT_PATH,
+):
+ with open(path) as f:
+ config = ruamel.yaml.load(f, ruamel.yaml.RoundTripLoader)
+ for k, v in settings:
+ if not isinstance(k, str) or not isinstance(v, str):
+ raise ValueError("key and value must both be strings")
+ v = ruamel.yaml.safe_load(v)
+ # config[k] = v
+ item = config
+ # allow nested keys
+ keys = k.split('.')
+ for i in keys[:-1]:
+ item = item[i]
+ item[keys[-1]] = v
+ tmpfile = path + '.tmp'
+ with open(tmpfile, 'w') as f:
+ print(ruamel.yaml.dump(config, Dumper=ruamel.yaml.RoundTripDumper), end='', file=f)
+ os.rename(tmpfile, path)
+
+
+def print_configuration(path=Config.DEFAULT_PATH):
+ with open(path) as f:
+ config = ruamel.yaml.safe_load(f)
+ print(ruamel.yaml.dump(config), end='')
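Note on the config.py hunk above: `modify_configuration` combines two ideas, dotted keys that address nested settings and a write to a `.tmp` file that is then renamed over the original. The sketch below shows both in isolation under stated assumptions: JSON from the standard library stands in for ruamel.yaml, `set_nested`, `write_via_tmpfile` and `demo` are hypothetical names, and `os.replace` is used where the diff uses `os.rename`.

```python
import json
import os
import tempfile


def set_nested(config, dotted_key, value):
    """Set config['a']['b'] = value for dotted_key 'a.b'."""
    keys = dotted_key.split('.')
    item = config
    for key in keys[:-1]:
        item = item[key]  # KeyError if an intermediate key is missing
    item[keys[-1]] = value


def write_via_tmpfile(path, text):
    """Write text to path + '.tmp', then move it over path."""
    tmpfile = path + '.tmp'
    with open(tmpfile, 'w') as f:
        f.write(text)
    # The diff uses os.rename; os.replace also overwrites an existing
    # destination on Windows (a swap made for this sketch only).
    os.replace(tmpfile, path)


def demo():
    """Round trip: write a config, change one nested value, write it back."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, 'config.json')
        write_via_tmpfile(path, json.dumps({'a': {'b': 1}}))
        with open(path) as f:
            config = json.load(f)
        set_nested(config, 'a.b', 2)
        write_via_tmpfile(path, json.dumps(config))
        with open(path) as f:
            return json.load(f), os.path.exists(path + '.tmp')
```

Like the loop in the diff, `set_nested` only creates the last key; a missing intermediate key raises `KeyError` and is not created silently.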
=====================================
src/igdiscover/cli/init.py
=====================================
@@ -10,6 +10,9 @@ import sys
import subprocess
import pkg_resources
import dnaio
+from xopen import xopen
+
+from . import CommandLineError
from ..config import Config
try:
@@ -19,13 +22,16 @@ try:
except ImportError:
tk = None
-from xopen import xopen
logger = logging.getLogger(__name__)
do_not_show_cpustats = 1
+class GuiCancelledError(CommandLineError):
+ pass
+
+
def add_arguments(parser):
parser.add_argument('--database', '--db', metavar='PATH', default=None,
help='Directory with V.fasta, D.fasta and J.fasta files. If not given, a dialog is shown.')
@@ -169,8 +175,7 @@ def try_open(path):
with open(path) as f:
pass
except OSError as e:
- logger.error('Could not open %r: %s', path, e)
- sys.exit(1)
+ raise CommandLineError(f'Could not open {path!r}: {e}')
def read_and_repair_fasta(path):
@@ -210,28 +215,34 @@ def read_and_repair_fasta(path):
def main(args):
- if ' ' in args.directory:
- logger.error('The name of the analysis directory must not contain spaces')
- sys.exit(1)
+ run_init(**vars(args))
+
+
+def run_init(
+ directory,
+ database: str,
+ reads1=None,
+ single_reads=None,
+):
+ if ' ' in directory:
+ raise CommandLineError('The name of the analysis directory must not contain spaces')
- if os.path.exists(args.directory):
- logger.error('The target directory {!r} already exists.'.format(args.directory))
- sys.exit(1)
+ if os.path.exists(directory):
+ raise CommandLineError(f'The target directory {directory!r} already exists.')
# If reads files or database were not given, initialize the GUI
- if (args.reads1 is None and args.single_reads is None) or args.database is None:
+ if (reads1 is None and single_reads is None) or database is None:
try:
gui = TkinterGui()
except ImportError: # TODO tk.TclError cannot be caught when import of tk fails
- logger.error('GUI cannot be started. Please provide reads1 file '
+ raise CommandLineError('GUI cannot be started. Please provide reads1 file '
'and database directory on command line.')
- sys.exit(1)
else:
gui = None
# Find out whether data is paired or single
- assert not (args.reads1 and args.single_reads)
- if args.reads1 is None and args.single_reads is None:
+ assert not (reads1 and single_reads)
+ if reads1 is None and single_reads is None:
paired = gui.yesno('Paired end or single-end reads',
'Are your reads paired and need to be merged?\n\n'
'If you answer "Yes", next select the FASTQ files '
@@ -239,64 +250,56 @@ def main(args):
'If you answer "No", next select the FASTA or FASTQ '
'file with single-end reads.')
if paired is None:
- logger.error('Cancelled')
- sys.exit(2)
+ raise GuiCancelledError()
else:
- paired = bool(args.reads1)
+ paired = bool(reads1)
# Assign reads1 and (if paired) also reads2
if paired:
- if args.reads1 is not None:
- reads1 = args.reads1
+ if reads1 is not None:
try_open(reads1)
else:
reads1 = gui.reads1_path()
if not reads1:
- logger.error('Cancelled')
- sys.exit(2)
+ raise GuiCancelledError()
reads2 = guess_paired_path(reads1)
if reads2 is None:
- logger.error('Could not determine second file of paired-end reads')
- sys.exit(1)
+ raise CommandLineError('Could not determine second file of paired-end reads')
else:
- if args.single_reads is not None:
- reads1 = args.single_reads
+ if single_reads is not None:
+ reads1 = single_reads
try_open(reads1)
else:
reads1 = gui.single_reads_path()
if not reads1:
- logger.error('Cancelled')
- sys.exit(2)
+ raise GuiCancelledError()
- if args.database is not None:
- dbpath = args.database
+ if database is not None:
+ dbpath = database
else:
# TODO as soon as we distribute our own database files, we can use this:
# database_path = pkg_resources.resource_filename('igdiscover', 'databases')
databases_path = None
dbpath = gui.database_path(databases_path)
if not dbpath:
- logger.error('Cancelled')
- sys.exit(2)
+ raise GuiCancelledError()
database = dict()
for g in ['V', 'D', 'J']:
path = os.path.join(dbpath, g + '.fasta')
if not os.path.exists(path):
- logger.error(
- 'The database directory %r must contain the three files '
- 'V.fasta, D.fasta and J.fasta', dbpath)
- logger.error(
- 'A dummy D.fasta is necessary even if analyzing light chains (see manual)')
- sys.exit(2)
+ raise CommandLineError(
+ f'The database directory {dbpath!r} must contain the three files '
+ 'V.fasta, D.fasta and J.fasta. A dummy D.fasta is necessary even '
+ 'if analyzing light chains (see manual)'
+ )
database[g] = list(read_and_repair_fasta(path))
# Create the directory
try:
- os.mkdir(args.directory)
+ os.mkdir(directory)
except OSError as e:
- logger.error(e)
- sys.exit(1)
+ raise CommandLineError(e)
def create_symlink(readspath, dirname, target):
gz = '.gz' if readspath.endswith('.gz') else ''
@@ -307,23 +310,22 @@ def main(args):
os.symlink(src, os.path.join(dirname, target + gz))
if paired:
- create_symlink(reads1, args.directory, 'reads.1.fastq')
- create_symlink(reads2, args.directory, 'reads.2.fastq')
+ create_symlink(reads1, directory, 'reads.1.fastq')
+ create_symlink(reads2, directory, 'reads.2.fastq')
else:
try:
target = 'reads.' + file_type(reads1)
except UnknownFileFormatError:
- logger.error('Cannot determine whether reads file is FASTA or FASTQ')
- sys.exit(1)
- create_symlink(reads1, args.directory, target)
+ raise CommandLineError('Cannot determine whether reads file is FASTA or FASTQ')
+ create_symlink(reads1, directory, target)
# Write the configuration file
configuration = pkg_resources.resource_string('igdiscover', Config.DEFAULT_PATH).decode()
- with open(os.path.join(args.directory, Config.DEFAULT_PATH), 'w') as f:
+ with open(os.path.join(directory, Config.DEFAULT_PATH), 'w') as f:
f.write(configuration)
# Create database files
- database_dir = os.path.join(args.directory, 'database')
+ database_dir = os.path.join(directory, 'database')
os.mkdir(database_dir)
for gene in ['V', 'D', 'J']:
with open(os.path.join(database_dir, gene + '.fasta'), 'w') as db_file:
@@ -334,6 +336,7 @@ def main(args):
# Only suggest to edit the config file if at least one GUI dialog has been shown
if gui.yesno('Directory initialized',
'Do you want to edit the configuration file now?'):
- launch(os.path.join(args.directory, Config.DEFAULT_PATH))
- logger.info('Directory %s initialized.', args.directory)
- logger.info('Edit %s/%s, then run "cd %s && igdiscover run" to start the analysis', args.directory, Config.DEFAULT_PATH, args.directory)
+ launch(os.path.join(directory, Config.DEFAULT_PATH))
+ logger.info('Directory %s initialized.', directory)
+ logger.info('Edit %s/%s, then run "cd %s && igdiscover run" to start the analysis',
+ directory, Config.DEFAULT_PATH, directory)
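Note on the init.py hunks above: the `logger.error(...)` plus `sys.exit(...)` pairs are replaced by exceptions, and `main` forwards the parsed arguments with `run_init(**vars(args))`. The sketch below illustrates that pattern on a reduced example. It rests on assumptions: `CommandLineError` is redefined locally because its definition is not part of this excerpt, `check_directory` and `run_cli` are hypothetical names, and the mapping of exceptions to exit codes 1 and 2 mirrors the old `sys.exit` values and not any code shown here.

```python
import argparse
import os


class CommandLineError(Exception):
    """Local stand-in for the CommandLineError imported in the diff."""


class GuiCancelledError(CommandLineError):
    pass


def check_directory(directory, exists=os.path.exists):
    # The first two checks of run_init, reduced to a testable function
    if ' ' in directory:
        raise CommandLineError('The name of the analysis directory must not contain spaces')
    if exists(directory):
        raise CommandLineError(f'The target directory {directory!r} already exists.')


def main(args):
    # Same forwarding idiom as in the diff: every attribute of the
    # namespace becomes a keyword argument of the called function
    check_directory(**vars(args))


def run_cli(args):
    """Hypothetical top-level wrapper that turns exceptions into exit codes."""
    try:
        main(args)
    except GuiCancelledError:
        return 2  # assumption: mirrors the former sys.exit(2) on 'Cancelled'
    except CommandLineError as e:
        print('error:', e)
        return 1
    return 0
```

One consequence of the `**vars(args)` idiom is that the namespace must contain only names the function accepts; an extra attribute raises `TypeError` at call time.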
=====================================
src/igdiscover/cli/run.py
=====================================
@@ -10,10 +10,11 @@ import platform
import pkg_resources
from snakemake import snakemake
+from .config import Config
+from . import CommandLineError
from ..utils import available_cpu_count
from .. import __version__
-from .config import Config
logger = logging.getLogger(__name__)
@@ -32,6 +33,15 @@ def add_arguments(parser):
def main(args):
+ run_snakemake(**vars(args))
+
+
+def run_snakemake(
+ dryrun=False,
+ cores=available_cpu_count(),
+ keepgoing=False,
+ targets=None,
+):
try:
config = Config.from_default_path()
except FileNotFoundError as e:
@@ -46,26 +56,30 @@ def main(args):
print(' ', k, ': ', repr(v), sep='')
sys.stdout.flush()
- # snakemake sets up its own logging and this cannot be easily changed
- # (setting keep_logger=True crashes), so remove our own log handler
- # for now
- logger.root.handlers = []
+ old_root_handlers = logger.root.handlers
+ root = logging.getLogger()
+ root.handlers = []
+ file_handler = logging.FileHandler("log.txt")
+ root.addHandler(file_handler)
+
snakefile_path = pkg_resources.resource_filename('igdiscover', 'Snakefile')
success = snakemake(
snakefile_path,
snakemakepath='snakemake', # Needed in snakemake 3.9.0
- dryrun=args.dryrun,
- cores=args.cores,
- keepgoing=args.keepgoing,
+ dryrun=dryrun,
+ cores=cores,
+ keepgoing=keepgoing,
printshellcmds=True,
- targets=args.targets if args.targets else None,
+ targets=targets,
)
+ logger.root.handlers = old_root_handlers
- if sys.platform == 'linux' and not args.dryrun:
+ if sys.platform == 'linux' and not dryrun:
cputime = resource.getrusage(resource.RUSAGE_SELF).ru_utime
cputime += resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
h = int(cputime // 3600)
m = (cputime - h * 3600) / 60
print('Total CPU time: {}h {:.2f}m'.format(h, m))
- sys.exit(0 if success else 1)
+ if not success:
+ raise CommandLineError()
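Note on the run.py hunk above: the root logger's handlers are replaced by a `FileHandler` for the duration of the `snakemake()` call and put back afterwards. As written, the restore only happens if the call returns normally, and the file handler is not closed. The sketch below is one possible variant, not the code of this commit: a context manager (hypothetical name `root_logging_to_file`) that restores the handlers and closes the file in a `finally` block.

```python
import contextlib
import logging
import os
import tempfile


@contextlib.contextmanager
def root_logging_to_file(path):
    """Temporarily route all root-logger output to the file at path."""
    root = logging.getLogger()
    old_handlers = root.handlers
    file_handler = logging.FileHandler(path)
    root.handlers = [file_handler]
    try:
        yield file_handler
    finally:
        # Runs on normal exit and when the body raises
        root.handlers = old_handlers
        file_handler.close()


def demo():
    """Log one warning into a temporary file and return the file's content."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, 'log.txt')
        with root_logging_to_file(path):
            logging.getLogger('demo').warning('hello')
        with open(path) as f:
            return f.read()
```

Wrapping the `snakemake(...)` call in such a block would keep the restore logic in one place; whether that fits upstream's intent is a judgment for the maintainers.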
=====================================
src/igdiscover/readlenhistogram.py
=====================================
@@ -1,6 +1,7 @@
from collections import Counter
import dnaio
-
+import matplotlib
+matplotlib.use('Agg')
import matplotlib.pyplot as plt
import numpy as np
=====================================
tests/data/H1/README.md deleted
=====================================
@@ -1,4 +0,0 @@
-These are partial results from ERR1760498.
-
-V.fasta -- V gene sequences that actually appear in the `candidates.tab` (not
-the full V starting database).
=====================================
tests/data/H1/V.fasta deleted
=====================================
@@ -1,246 +0,0 @@
->IGHV1-18*01
-CAGGTTCAGCTGGTGCAGTCTGGAGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGTTACACCTTTACCAGCTATGGTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAGCGCTTACAATGGTAACACAAACTATGCACAGAAGCTCCAGGGCAGAGTCACCATGACCACAGACACATCCACGAGCACAGCCTACATGGAGCTGAGGAGCCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV1-18*03
-CAGGTTCAGCTGGTGCAGTCTGGAGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGTTACACCTTTACCAGCTATGGTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAGCGCTTACAATGGTAACACAAACTATGCACAGAAGCTCCAGGGCAGAGTCACCATGACCACAGACACATCCACGAGCACAGCCTACATGGAGCTGAGGAGCCTGAGATCTGACGACATGGCCGTGTATTACTGTGCGAGAGA
->IGHV1-18*04
-CAGGTTCAGCTGGTGCAGTCTGGAGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGTTACACCTTTACCAGCTACGGTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAGCGCTTACAATGGTAACACAAACTATGCACAGAAGCTCCAGGGCAGAGTCACCATGACCACAGACACATCCACGAGCACAGCCTACATGGAGCTGAGGAGCCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV1-2*02
-CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTCACCGGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAACCCTAACAGTGGTGGCACAAACTATGCACAGAAGTTTCAGGGCAGGGTCACCATGACCAGGGACACGTCCATCAGCACAGCCTACATGGAGCTGAGCAGGCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV1-2*04
-CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTCACCGGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAACCCTAACAGTGGTGGCACAAACTATGCACAGAAGTTTCAGGGCTGGGTCACCATGACCAGGGACACGTCCATCAGCACAGCCTACATGGAGCTGAGCAGGCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV1-24*01
-CAGGTCCAGCTGGTACAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGTTTCCGGATACACCCTCACTGAATTATCCATGCACTGGGTGCGACAGGCTCCTGGAAAAGGGCTTGAGTGGATGGGAGGTTTTGATCCTGAAGATGGTGAAACAATCTACGCACAGAAGTTCCAGGGCAGAGTCACCATGACCGAGGACACATCTACAGACACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCAACAGA
->IGHV1-3*01
-CAGGTCCAGCTTGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTTTCCTGCAAGGCTTCTGGATACACCTTCACTAGCTATGCTATGCATTGGGTGCGCCAGGCCCCCGGACAAAGGCTTGAGTGGATGGGATGGATCAACGCTGGCAATGGTAACACAAAATATTCACAGAAGTTCCAGGGCAGAGTCACCATTACCAGGGACACATCCGCGAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAAGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV1-3*02
-CAGGTTCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTTTCCTGCAAGGCTTCTGGATACACCTTCACTAGCTATGCTATGCATTGGGTGCGCCAGGCCCCCGGACAAAGGCTTGAGTGGATGGGATGGAGCAACGCTGGCAATGGTAACACAAAATATTCACAGGAGTTCCAGGGCAGAGTCACCATTACCAGGGACACATCCGCGAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACATGGCTGTGTATTACTGTGCGAGAGA
->IGHV1-45*02
-CAGATGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGACTGGGTCCTCAGTGAAGGTTTCCTGCAAGGCTTCCGGATACACCTTCACCTACCGCTACCTGCACTGGGTGCGACAGGCCCCCGGACAAGCGCTTGAGTGGATGGGATGGATCACACCTTTCAATGGTAACACCAACTACGCACAGAAATTCCAGGACAGAGTCACCATTACCAGGGACAGGTCTATGAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACAGCCATGTATTACTGTGCAAGATA
->IGHV1-46*01
-CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTTTCCTGCAAGGCATCTGGATACACCTTCACCAGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGAATAATCAACCCTAGTGGTGGTAGCACAAGCTACGCACAGAAGTTCCAGGGCAGAGTCACCATGACCAGGGACACGTCCACGAGCACAGTCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV1-46*02
-CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTTTCCTGCAAGGCATCTGGATACACCTTCAACAGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGAATAATCAACCCTAGTGGTGGTAGCACAAGCTACGCACAGAAGTTCCAGGGCAGAGTCACCATGACCAGGGACACGTCCACGAGCACAGTCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV1-46*03
-CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTTTCCTGCAAGGCATCTGGATACACCTTCACCAGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGAATAATCAACCCTAGTGGTGGTAGCACAAGCTACGCACAGAAGTTCCAGGGCAGAGTCACCATGACCAGGGACACGTCCACGAGCACAGTCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCTAGAGA
->IGHV1-58*01
-CAAATGCAGCTGGTGCAGTCTGGGCCTGAGGTGAAGAAGCCTGGGACCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATTCACCTTTACTAGCTCTGCTGTGCAGTGGGTGCGACAGGCTCGTGGACAACGCCTTGAGTGGATAGGATGGATCGTCGTTGGCAGTGGTAACACAAACTACGCACAGAAGTTCCAGGAAAGAGTCACCATTACCAGGGACATGTCCACAAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCCGAGGACACGGCCGTGTATTACTGTGCGGCAGA
->IGHV1-69*01
-CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGTCCTCGGTGAAGGTCTCCTGCAAGGCTTCTGGAGGCACCTTCAGCAGCTATGCTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGAGGGATCATCCCTATCTTTGGTACAGCAAACTACGCACAGAAGTTCCAGGGCAGAGTCACGATTACCGCGGACGAATCCACGAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV1-69*06
-CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGTCCTCGGTGAAGGTCTCCTGCAAGGCTTCTGGAGGCACCTTCAGCAGCTATGCTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGAGGGATCATCCCTATCTTTGGTACAGCAAACTACGCACAGAAGTTCCAGGGCAGAGTCACGATTACCGCGGACAAATCCACGAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV1-69*12
-CAGGTCCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGTCCTCGGTGAAGGTCTCCTGCAAGGCTTCTGGAGGCACCTTCAGCAGCTATGCTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGAGGGATCATCCCTATCTTTGGTACAGCAAACTACGCACAGAAGTTCCAGGGCAGAGTCACGATTACCGCGGACGAATCCACGAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV1-69*13
-CAGGTCCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGTCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGAGGCACCTTCAGCAGCTATGCTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGAGGGATCATCCCTATCTTTGGTACAGCAAACTACGCACAGAAGTTCCAGGGCAGAGTCACGATTACCGCGGACGAATCCACGAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV1-69*14
-CAGGTCCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGTCCTCGGTGAAGGTCTCCTGCAAGGCTTCTGGAGGCACCTTCAGCAGCTATGCTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGAGGGATCATCCCTATCTTTGGTACAGCAAACTACGCACAGAAGTTCCAGGGCAGAGTCACGATTACCGCGGACAAATCCACGAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV1-69-2*01
-GAGGTCCAGCTGGTACAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCTACAGTGAAAATCTCCTGCAAGGTTTCTGGATACACCTTCACCGACTACTACATGCACTGGGTGCAACAGGCCCCTGGAAAAGGGCTTGAGTGGATGGGACTTGTTGATCCTGAAGATGGTGAAACAATATACGCAGAGAAGTTCCAGGGCAGAGTCACCATAACCGCGGACACGTCTACAGACACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCAACAGA
->IGHV1-8*01
-CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTCACCAGTTATGATATCAACTGGGTGCGACAGGCCACTGGACAAGGGCTTGAGTGGATGGGATGGATGAACCCTAACAGTGGTAACACAGGCTATGCACAGAAGTTCCAGGGCAGAGTCACCATGACCAGGAACACCTCCATAAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGG
->IGHV1-8*02
-CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTCACCAGCTATGATATCAACTGGGTGCGACAGGCCACTGGACAAGGGCTTGAGTGGATGGGATGGATGAACCCTAACAGTGGTAACACAGGCTATGCACAGAAGTTCCAGGGCAGAGTCACCATGACCAGGAACACCTCCATAAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGG
->IGHV2-26*01
-CAGGTCACCTTGAAGGAGTCTGGTCCTGTGCTGGTGAAACCCACAGAGACCCTCACGCTGACCTGCACCGTCTCTGGGTTCTCACTCAGCAATGCTAGAATGGGTGTGAGCTGGATCCGTCAGCCCCCAGGGAAGGCCCTGGAGTGGCTTGCACACATTTTTTCGAATGACGAAAAATCCTACAGCACATCTCTGAAGAGCAGGCTCACCATCTCCAAGGACACCTCCAAAAGCCAGGTGGTCCTTACCATGACCAACATGGACCCTGTGGACACAGCCACATATTACTGTGCACGGATAC
->IGHV2-5*01
-CAGATCACCTTGAAGGAGTCTGGTCCTACGCTGGTGAAACCCACACAGACCCTCACGCTGACCTGCACCTTCTCTGGGTTCTCACTCAGCACTAGTGGAGTGGGTGTGGGCTGGATCCGTCAGCCCCCAGGAAAGGCCCTGGAGTGGCTTGCACTCATTTATTGGAATGATGATAAGCGCTACAGCCCATCTCTGAAGAGCAGGCTCACCATCACCAAGGACACCTCCAAAAACCAGGTGGTCCTTACAATGACCAACATGGACCCTGTGGACACAGCCACATATTACTGTGCACACAGAC
->IGHV2-5*02
-CAGATCACCTTGAAGGAGTCTGGTCCTACGCTGGTGAAACCCACACAGACCCTCACGCTGACCTGCACCTTCTCTGGGTTCTCACTCAGCACTAGTGGAGTGGGTGTGGGCTGGATCCGTCAGCCCCCAGGAAAGGCCCTGGAGTGGCTTGCACTCATTTATTGGGATGATGATAAGCGCTACAGCCCATCTCTGAAGAGCAGGCTCACCATCACCAAGGACACCTCCAAAAACCAGGTGGTCCTTACAATGACCAACATGGACCCTGTGGACACAGCCACATATTACTGTGCACACAGAC
->IGHV2-5*04
-CAGATCACCTTGAAGGAGTCTGGTCCTACGCTGGTGAAACCCACACAGACCCTCACGCTGACCTGCACCTTCTCTGGGTTCTCACTCAGCACTAGTGGAGTGGGTGTGGGCTGGATCCGTCAGCCCCCAGGAAAGGCCCTGGAGTGGCTTGCACTCATTTATTGGAATGATGATAAGCGCTACAGCCCATCTCTGAAGAGCAGGCTCACCATCACCAAGGACACCTCCAAAAACCAGGTGGTCCTTACAATGACCAACATGGACCCTGTGGACACAGGCACATATTACTGTGTAC
->IGHV2-70*01
-CAGGTCACCTTGAGGGAGTCTGGTCCTGCGCTGGTGAAACCCACACAGACCCTCACACTGACCTGCACCTTCTCTGGGTTCTCACTCAGCACTAGTGGAATGTGTGTGAGCTGGATCCGTCAGCCCCCAGGGAAGGCCCTGGAGTGGCTTGCACTCATTGATTGGGATGATGATAAATACTACAGCACATCTCTGAAGACCAGGCTCACCATCTCCAAGGACACCTCCAAAAACCAGGTGGTCCTTACAATGACCAACATGGACCCTGTGGACACAGCCACGTATTACTGTGCACGGATAC
->IGHV2-70*10
-CAGGTCACCTTGAAGGAGTCTGGTCCTGCGCTGGTGAAACCCACACAGACCCTCACACTGACCTGCACCTTCTCTGGGTTCTCACTCAGCACTAGTGGAATGCGTGTGAGCTGGATCCGTCAGCCCCCAGGGAAGGCCCTGGAGTGGATTGCACGCATTGATTGGGATGATGATAAATACTACAGCACATCTCTGAAGACCAGGCTCACCATCTCCAAGGACACCTCCAAAAACCAGGTGGTCCTTACAATGACCAACATGGACCCTGTGGACACAGCCACGTATTACTGTGCACGGATAC
->IGHV2-70*11
-CGGGTCACCTTGAGGGAGTCTGGTCCTGCGCTGGTGAAACCCACACAGACCCTCACACTGACCTGCACCTTCTCTGGGTTCTCACTCAGCACTAGTGGAATGTGTGTGAGCTGGATCCGTCAGCCCCCAGGGAAGGCCCTGGAGTGGCTTGCACGCATTGATTGGGATGATGATAAATACTACAGCACATCTCTGAAGACCAGGCTCACCATCTCCAAGGACACCTCCAAAAACCAGGTGGTCCTTACAATGACCAACATGGACCCTGTGGACACAGCCACGTATTACTGTGCACGGATAC
->IGHV2-70*13
-CAGGTCACCTTGAGGGAGTCTGGTCCTGCGCTGGTGAAACCCACACAGACCCTCACACTGACCTGCACCTTCTCTGGGTTCTCACTCAGCACTAGTGGAATGTGTGTGAGCTGGATCCGTCAGCCCCCAGGGAAGGCCCTGGAGTGGCTTGCACTCATTGATTGGGATGATGATAAATACTACAGCACATCTCTGAAGACCAGGCTCACCATCTCCAAGGACACCTCCAAAAACCAGGTGGTCCTTACAATGACCAACATGGACCCTGTGGACACAGCCACGTATTATTGTGCACGGATAC
->IGHV2-70D*04
-CAGGTCACCTTGAAGGAGTCTGGTCCTGCGCTGGTGAAACCCACACAGACCCTCACACTGACCTGCACCTTCTCTGGGTTCTCACTCAGCACTAGTGGAATGCGTGTGAGCTGGATCCGTCAGCCCCCAGGGAAGGCCCTGGAGTGGCTTGCACGCATTGATTGGGATGATGATAAATTCTACAGCACATCTCTGAAGACCAGGCTCACCATCTCCAAGGACACCTCCAAAAACCAGGTGGTCCTTACAATGACCAACATGGACCCTGTGGACACAGCCACGTATTACTGTGCACGGATAC
->IGHV2-70D*14
-CAGGTCACCTTGAAGGAGTCTGGTCCTGCGCTGGTGAAACCCACACAGACCCTCACACTGACCTGCACCTTCTCTGGGTTCTCACTCAGCACTAGTGGAATGCGTGTGAGCTGGATCCGTCAGCCCCCAGGTAAGGCCCTGGAGTGGCTTGCACGCATTGATTGGGATGATGATAAATTCTACAGCACATCTCTGAAGACCAGGCTCACCATCTCCAAGGACACCTCCAAAAACCAGGTGGTCCTTACAATGACCAACATGGACCCTGTGGACACAGCCACGTATTACTGTGCACGGATAC
->IGHV3-11*01
-CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCAAGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTGACTACTACATGAGCTGGATCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTAGTAGTAGTGGTAGTACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGGGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV3-11*04
-CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCAAGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTGACTACTACATGAGCTGGATCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTAGTAGTAGTGGTAGTACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGGGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-11*06
-CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCAAGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTGACTACTACATGAGCTGGATCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTAGTAGTAGTAGTAGTTACACAAACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-13*01
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTACGACATGCACTGGGTCCGCCAAGCTACAGGAAAAGGTCTGGAGTGGGTCTCAGCTATTGGTACTGCTGGTGACACATACTATCCAGGCTCCGTGAAGGGCCGATTCACCATCTCCAGAGAAAATGCCAAGAACTCCTTGTATCTTCAAATGAACAGCCTGAGAGCCGGGGACACGGCTGTGTATTACTGTGCAAGAGA
->IGHV3-13*05
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTACGACATGCACTGGGTCCGCCAAGCTACAGGAAAAGGTCTGGAGTGGGTCTCAGCTATTGGTACTGCTGGTGACCCATACTATCCAGGCTCCGTGAAGGGCCGATTCACCATCTCCAGAGAAAATGCCAAGAACTCCTTGTATCTTCAAATGAACAGCCTGAGAGCCGGGGACACGGCTGTGTATTACTGTGCAAGAGA
->IGHV3-15*01
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTAAAGCCTGGGGGGTCCCTTAGACTCTCCTGTGCAGCCTCTGGATTCACTTTCAGTAACGCCTGGATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTGGCCGTATTAAAAGCAAAACTGATGGTGGGACAACAGACTACGCTGCACCCGTGAAAGGCAGATTCACCATCTCAAGAGATGATTCAAAAAACACGCTGTATCTGCAAATGAACAGCCTGAAAACCGAGGACACAGCCGTGTATTACTGTACCACAGA
->IGHV3-15*02
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGCCTTGGTAAAGCCTGGGGGGTCCCTTAGACTCTCCTGTGCAGCCTCTGGATTCACTTTCAGTAACGCCTGGATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTGGCCGTATTAAAAGCAAAACTGATGGTGGGACAACAGACTACGCTGCACCCGTGAAAGGCAGATTCACCATCTCAAGAGATGATTCAAAAAACACGCTGTATCTGCAAATGAACAGCCTGAAAACCGAGGACACAGCCGTGTATTACTGTACCACAGA
->IGHV3-15*04
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTAAAGCCTGGGGGGTCCCTTAGACTCTCCTGTGCAGCCTCTGGATTCACTTTCAGTAACGCCTGGATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTGGCCGTATTGAAAGCAAAACTGATGGTGGGACAACAGACTACGCTGCACCCGTGAAAGGCAGATTCACCATCTCAAGAGATGATTCAAAAAACACGCTGTATCTGCAAATGAACAGCCTGAAAACCGAGGACACAGCCGTGTATTACTGTACCACAGA
->IGHV3-15*05
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTAAAGCCTGGGGGGTCCCTTAGACTCTCCTGTGCAGCCTCTGGATTCACTTTCAGTAACGCCTGGATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTGGCCGTATTAAAAGCAAAACTGATGGTGGGACAACAGACTACGCTGCACCCGTGAAAGGCAGATTCACCATCTCAAGAGATGATTCAAAAAACACGCTGTATCTGCAAATGAACAGTCTGAAAACCGAGGACACAGCCGTGTATTACTGTACCACAGA
->IGHV3-15*06
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTAAAGCCTGGGGGGTCCCTTAGACTCTCCTGTGCAGCCTCTGGATTCACTTTCAGTAACGCCTGGATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCGGCCGTATTAAAAGCAAAACTGATGGTGGGACAACAAACTACGCTGCACCCGTGAAAGGCAGATTCACCATCTCAAGAGATGATTCAAAAAACACGCTGTATCTGCAAATGAACAGCCTGAAAACCGAGGACACAGCCGTGTATTACTGTACCACAGA
->IGHV3-15*07
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTAAAGCCTGGGGGGTCCCTTAGACTCTCCTGTGCAGCCTCTGGTTTCACTTTCAGTAACGCCTGGATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCGGCCGTATTAAAAGCAAAACTGATGGTGGGACAACAGACTACGCTGCACCCGTGAAAGGCAGATTCACCATCTCAAGAGATGATTCAAAAAACACGCTGTATCTGCAAATGAACAGCCTGAAAACCGAGGACACAGCCGTGTATTACTGTACCACAGA
->IGHV3-20*01
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGTGTGGTACGGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTGATGATTATGGCATGAGCTGGGTCCGCCAAGCTCCAGGGAAGGGGCTGGAGTGGGTCTCTGGTATTAATTGGAATGGTGGTAGCACAGGTTATGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCCCTGTATCTGCAAATGAACAGTCTGAGAGCCGAGGACACGGCCTTGTATCACTGTGCGAGAGA
->IGHV3-21*01
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCCTGGTCAAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCATCCATTAGTAGTAGTAGTAGTTACATATACTACGCAGACTCAGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-21*02
-GAGGTGCAACTGGTGGAGTCTGGGGGAGGCCTGGTCAAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCATCCATTAGTAGTAGTAGTAGTTACATATACTACGCAGACTCAGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-21*03
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCCTGGTCAAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCATCCATTAGTAGTAGTAGTAGTTACATATACTACGCAGACTCAGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACAGCTGTGTATTACTGTGCGAGAGA
->IGHV3-21*04
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCCTGGTCAAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCATCCATTAGTAGTAGTAGTAGTTACATATACTACGCAGACTCAGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV3-23*01
-GAGGTGCAGCTGTTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTAGCAGCTATGCCATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCAGCTATTAGTGGTAGTGGTGGTAGCACATACTACGCAGACTCCGTGAAGGGCCGGTTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTATATTACTGTGCGAAAGA
->IGHV3-23*04
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTAGCAGCTATGCCATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCAGCTATTAGTGGTAGTGGTGGTAGCACATACTACGCAGACTCCGTGAAGGGCCGGTTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTATATTACTGTGCGAAAGA
->IGHV3-23*05
-GAGGTGCAGCTGTTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTAGCAGCTATGCCATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCAGCTATTTATAGCAGTGGTAGTAGCACATACTATGCAGACTCCGTGAAGGGCCGGTTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTATATTACTGTGCGAAA
->IGHV3-30*02
-CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCGTCTGGATTCACCTTCAGTAGCTATGGCATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTGGCATTTATACGGTATGATGGAAGTAATAAATACTATGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCTGAGGACACGGCTGTGTATTACTGTGCGAAAGA
->IGHV3-30*03
-CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATGGCATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTGGCAGTTATATCATATGATGGAAGTAATAAATACTATGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCTGAGGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-30*04
-CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATGCTATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTGGCAGTTATATCATATGATGGAAGTAATAAATACTACGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCTGAGGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-30*18
-CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATGGCATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTGGCAGTTATATCATATGATGGAAGTAATAAATACTATGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCTGAGGACACGGCTGTGTATTACTGTGCGAAAGA
->IGHV3-30-3*01
-CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATGCTATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTGGCAGTTATATCATATGATGGAAGCAATAAATACTACGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCTGAGGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-33*01
-CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGCAGCGTCTGGATTCACCTTCAGTAGCTATGGCATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTGGCAGTTATATGGTATGATGGAAGTAATAAATACTATGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-33*03
-CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGCAGCGTCTGGATTCACCTTCAGTAGCTATGGCATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTGGCAGTTATATGGTATGATGGAAGTAATAAATACTATGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAACTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAAAGA
->IGHV3-33*06
-CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGCAGCGTCTGGATTCACCTTCAGTAGCTATGGCATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTGGCAGTTATATGGTATGATGGAAGTAATAAATACTATGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAAAGA
->IGHV3-43*01
-GAAGTGCAGCTGGTGGAGTCTGGGGGAGTCGTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTGATGATTATACCATGCACTGGGTCCGTCAAGCTCCGGGGAAGGGTCTGGAGTGGGTCTCTCTTATTAGTTGGGATGGTGGTAGCACATACTATGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACAGCAAAAACTCCCTGTATCTGCAAATGAACAGTCTGAGAACTGAGGACACCGCCTTGTATTACTGTGCAAAAGATA
->IGHV3-43D*01
-GAAGTGCAGCTGGTGGAGTCTGGGGGAGTCGTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTGATGATTATGCCATGCACTGGGTCCGTCAAGCTCCGGGGAAGGGTCTGGAGTGGGTCTCTCTTATTAGTTGGGATGGTGGTAGCACCTACTATGCAGACTCTGTGAAGGGTCGATTCACCATCTCCAGAGACAACAGCAAAAACTCCCTGTATCTGCAAATGAACAGTCTGAGAGCTGAGGACACCGCCTTGTATTACTGTGCAAAAGATA
->IGHV3-48*01
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTAGTAGTAGTAGTAGTACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAATGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-48*02
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTAGTAGTAGTAGTAGTACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAATGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGACGAGGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-48*03
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGTTATGAAATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTAGTAGTAGTGGTAGTACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTTTATTACTGTGCGAGAGA
->IGHV3-48*04
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTAGTAGTAGTAGTAGTACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-49*01
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCAGGGCGGTCCCTGAGACTCTCCTGTACAGCTTCTGGATTCACCTTTGGTGATTATGCTATGAGCTGGTTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTAGGTTTCATTAGAAGCAAAGCTTATGGTGGGACAACAGAATACACCGCGTCTGTGAAAGGCAGATTCACCATCTCAAGAGATGGTTCCAAAAGCATCGCCTATCTGCAAATGAACAGCCTGAAAACCGAGGACACAGCCGTGTATTACTGTACTAGAGA
->IGHV3-49*03
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCAGGGCGGTCCCTGAGACTCTCCTGTACAGCTTCTGGATTCACCTTTGGTGATTATGCTATGAGCTGGTTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTAGGTTTCATTAGAAGCAAAGCTTATGGTGGGACAACAGAATACGCCGCGTCTGTGAAAGGCAGATTCACCATCTCAAGAGATGATTCCAAAAGCATCGCCTATCTGCAAATGAACAGCCTGAAAACCGAGGACACAGCCGTGTATTACTGTACTAGAGA
->IGHV3-49*04
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCAGGGCGGTCCCTGAGACTCTCCTGTACAGCTTCTGGATTCACCTTTGGTGATTATGCTATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTAGGTTTCATTAGAAGCAAAGCTTATGGTGGGACAACAGAATACGCCGCGTCTGTGAAAGGCAGATTCACCATCTCAAGAGATGATTCCAAAAGCATCGCCTATCTGCAAATGAACAGCCTGAAAACCGAGGACACAGCCGTGTATTACTGTACTAGAGA
->IGHV3-49*05
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTAAAGCCAGGGCGGTCCCTGAGACTCTCCTGTACAGCTTCTGGATTCACCTTTGGTGATTATGCTATGAGCTGGTTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTAGGTTTCATTAGAAGCAAAGCTTATGGTGGGACAACAGAATACGCCGCGTCTGTGAAAGGCAGATTCACCATCTCAAGAGATGATTCCAAAAGCATCGCCTATCTGCAAATGAACAGCCTGAAAACCGAGGACACAGCCGTGTATTACTGTACTAGAGA
->IGHV3-53*01
-GAGGTGCAGCTGGTGGAGTCTGGAGGAGGCTTGATCCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGGTTCACCGTCAGTAGCAACTACATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCAGTTATTTATAGCGGTGGTAGCACATACTACGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTTCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV3-53*02
-GAGGTGCAGCTGGTGGAGACTGGAGGAGGCTTGATCCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGGTTCACCGTCAGTAGCAACTACATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCAGTTATTTATAGCGGTGGTAGCACATACTACGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTTCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV3-53*04
-GAGGTGCAGCTGGTGGAGTCTGGAGGAGGCTTGGTCCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGGTTCACCGTCAGTAGCAACTACATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCAGTTATTTATAGCGGTGGTAGCACATACTACGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGACACAATTCCAAGAACACGCTGTATCTTCAAATGAACAGCCTGAGAGCTGAGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV3-64*02
-GAGGTGCAGCTGGTGGAGTCTGGGGAAGGCTTGGTCCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATGCTATGCACTGGGTCCGCCAGGCTCCAGGGAAGGGACTGGAATATGTTTCAGCTATTAGTAGTAATGGGGGTAGCACATATTATGCAGACTCTGTGAAGGGCAGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTTCAAATGGGCAGCCTGAGAGCTGAGGACATGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-66*01
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCGTCAGTAGCAACTACATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCAGTTATTTATAGCGGTGGTAGCACATACTACGCAGACTCCGTGAAGGGCAGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTTCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-66*03
-GAGGTGCAGCTGGTGGAGTCTGGAGGAGGCTTGATCCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGGTTCACCGTCAGTAGCAACTACATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCAGTTATTTATAGCTGTGGTAGCACATACTACGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTTCAAATGAACAGCCTGAGAGCTGAGGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-7*01
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTAGTAGCTATTGGATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTGGCCAACATAAAGCAAGATGGAAGTGAGAAATACTATGTGGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV3-7*02
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTAGTAGCTATTGGATGAGCTGGGTCCGCCAGGCTCCAGGGAAAGGGCTGGAGTGGGTGGCCAACATAAAGCAAGATGGAAGTGAGAAATACTATGTGGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGA
->IGHV3-7*03
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTAGTAGCTATTGGATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTGGCCAACATAAAGCAAGATGGAAGTGAGAAATACTATGTGGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV3-72*01
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCCAGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTGACCACTACATGGACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTGGCCGTACTAGAAACAAAGCTAACAGTTACACCACAGAATACGCCGCGTCTGTGAAAGGCAGATTCACCATCTCAAGAGATGATTCAAAGAACTCACTGTATCTGCAAATGAACAGCCTGAAAACCGAGGACACGGCCGTGTATTACTGTGCTAGAGA
->IGHV3-73*01
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCCAGCCTGGGGGGTCCCTGAAACTCTCCTGTGCAGCCTCTGGGTTCACCTTCAGTGGCTCTGCTATGCACTGGGTCCGCCAGGCTTCCGGGAAAGGGCTGGAGTGGGTTGGCCGTATTAGAAGCAAAGCTAACAGTTACGCGACAGCATATGCTGCGTCGGTGAAAGGCAGGTTCACCATCTCCAGAGATGATTCAAAGAACACGGCGTATCTGCAAATGAACAGCCTGAAAACCGAGGACACGGCCGTGTATTACTGTACTAGACA
->IGHV3-73*02
-GAGGTGCAGCTGGTGGAGTCCGGGGGAGGCTTGGTCCAGCCTGGGGGGTCCCTGAAACTCTCCTGTGCAGCCTCTGGGTTCACCTTCAGTGGCTCTGCTATGCACTGGGTCCGCCAGGCTTCCGGGAAAGGGCTGGAGTGGGTTGGCCGTATTAGAAGCAAAGCTAACAGTTACGCGACAGCATATGCTGCGTCGGTGAAAGGCAGGTTCACCATCTCCAGAGATGATTCAAAGAACACGGCGTATCTGCAAATGAACAGCCTGAAAACCGAGGACACGGCCGTGTATTACTGTACTAGACA
->IGHV3-74*01
-GAGGTGCAGCTGGTGGAGTCCGGGGGAGGCTTAGTTCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTACTGGATGCACTGGGTCCGCCAAGCTCCAGGGAAGGGGCTGGTGTGGGTCTCACGTATTAATAGTGATGGGAGTAGCACAAGCTACGCGGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACACGCTGTATCTGCAAATGAACAGTCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCAAGAGA
->IGHV3-74*02
-GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTAGTTCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTACTGGATGCACTGGGTCCGCCAAGCTCCAGGGAAGGGGCTGGTGTGGGTCTCACGTATTAATAGTGATGGGAGTAGCACAAGCTACGCGGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACACGCTGTATCTGCAAATGAACAGTCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCAAGA
->IGHV3-74*03
-GAGGTGCAGCTGGTGGAGTCCGGGGGAGGCTTAGTTCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTACTGGATGCACTGGGTCCGCCAAGCTCCAGGGAAGGGGCTGGTGTGGGTCTCACGTATTAATAGTGATGGGAGTAGCACAACGTACGCGGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACACGCTGTATCTGCAAATGAACAGTCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCAAGAGA
->IGHV3-9*01
-GAAGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGCAGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTGATGATTATGCCATGCACTGGGTCCGGCAAGCTCCAGGGAAGGGCCTGGAGTGGGTCTCAGGTATTAGTTGGAATAGTGGTAGCATAGGCTATGCGGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCCCTGTATCTGCAAATGAACAGTCTGAGAGCTGAGGACACGGCCTTGTATTACTGTGCAAAAGATA
->IGHV3-9*03
-GAAGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGCAGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTGATGATTATGCCATGCACTGGGTCCGGCAAGCTCCAGGGAAGGGCCTGGAGTGGGTCTCAGGTATTAGTTGGAATAGTGGTAGCATAGGCTATGCGGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCCCTGTATCTGCAAATGAACAGTCTGAGAGCTGAGGACATGGCCTTGTATTACTGTGCAAAAGATA
->IGHV4-28*01
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGACACCCTGTCCCTCACCTGCGCTGTCTCTGGTTACTCCATCAGCAGTAGTAACTGGTGGGGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTACATCTATTATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATGTCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGTGGACACGGCCGTGTATTACTGTGCGAGAAA
->IGHV4-28*03
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGACACCCTGTCCCTCACCTGCGCTGTCTCTGGTTACTCCATCAGCAGTAGTAACTGGTGGGGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTACATCTATTATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATGTCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGTGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV4-31*01
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCACAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTGGTGGTTACTACTGGAGCTGGATCCGCCAGCACCCAGGGAAGGGCCTGGAGTGGATTGGGTACATCTATTACAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCTAGTTACCATATCAGTAGACACGTCTAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACTGCCGCGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV4-31*03
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCACAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTGGTGGTTACTACTGGAGCTGGATCCGCCAGCACCCAGGGAAGGGCCTGGAGTGGATTGGGTACATCTATTACAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTTACCATATCAGTAGACACGTCTAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACTGCCGCGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV4-34*01
-CAGGTGCAGCTACAGCAGTGGGGCGCAGGACTGTTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCGCTGTCTATGGTGGGTCCTTCAGTGGTTACTACTGGAGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGGAAATCAATCATAGTGGAAGCACCAACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCTGTGTATTACTGTGCGAGAGG
->IGHV4-34*02
-CAGGTGCAGCTACAACAGTGGGGCGCAGGACTGTTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCGCTGTCTATGGTGGGTCCTTCAGTGGTTACTACTGGAGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGGAAATCAATCATAGTGGAAGCACCAACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCTGTGTATTACTGTGCGAGAGG
->IGHV4-34*04
-CAGGTGCAGCTACAGCAGTGGGGCGCAGGACTGTTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCGCTGTCTATGGTGGGTCCTTCAGTGGTTACTACTGGAGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGGAAATCAATCATAGTGGAAGCACCAACAACAACCCGTCCCTCAAGAGTCGAGCCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCTGTGTATTACTGTGCGAGAGG
->IGHV4-34*08
-CAGGTGCAGCTACAGCAGTGGGGCGCAGGACTGTTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCGCTGTCTATGGTGGGACCTTCAGTGGTTACTACTGGAGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGGAAATCAATCATAGTGGAAGCACCAACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCTGTGTATTACTGTGCG
->IGHV4-34*10
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCGCTGTCTATGGTGGGTCCTTCAGTGGTTACTACTGGAGCTGGATCCGCCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGGAAATCAATCATAGTGGAAGCACCAACTACAACCCGTCCCTCAAGAGTCGAATCACCATGTCAGTAGACACGTCCAAGAACCAGTTCTACCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCCGTGTATTACTGTGCGAGATA
->IGHV4-34*11
-CAGGTGCAGCTACAGCAGTGGGGCGCAGGACTGTTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCGCTGTCTATGGTGGGTCCGTCAGTGGTTACTACTGGAGCTGGATCCGGCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGTATATCTATTATAGTGGGAGCACCAACAACAACCCCTCCCTCAAGAGTCGAGCCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAACCTGAGCTCTGTGACCGCCGCGGACACGGCCGTGTATTGCTGTGCGAGAGA
->IGHV4-34*12
-CAGGTGCAGCTACAGCAGTGGGGCGCAGGACTGTTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCGCTGTCTATGGTGGGTCCTTCAGTGGTTACTACTGGAGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGGAAATCATTCATAGTGGAAGCACCAACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCTGTGTATTACTGTGCGAGA
->IGHV4-38-2*01
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCGCTGTCTCTGGTTACTCCATCAGCAGTGGTTACTACTGGGGCTGGATCCGGCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGAGTATCTATCATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACACGGCCGTGTATTACTGTGCGAGA
->IGHV4-38-2*02
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTTACTCCATCAGCAGTGGTTACTACTGGGGCTGGATCCGGCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGAGTATCTATCATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV4-39*01
-CAGCTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTAGTAGTTACTACTGGGGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGAGTATCTATTATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCCGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACACGGCTGTGTATTACTGTGCGAGACA
->IGHV4-39*02
-CAGCTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTAGTAGTTACTACTGGGGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGAGTATCTATTATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCCGTAGACACGTCCAAGAACCACTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACACGGCTGTGTATTACTGTGCGAGAGA
->IGHV4-39*06
-CGGCTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTAGTAGTTACTACTGGGGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGAGTATCTATTATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCCCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV4-39*07
-CAGCTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTAGTAGTTACTACTGGGGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGAGTATCTATTATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV4-4*01
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTCCGGGGACCCTGTCCCTCACCTGCGCTGTCTCTGGTGGCTCCATCAGCAGTAGTAACTGGTGGAGTTGGGTCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGGAAATCTATCATAGTGGGAGCACCAACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGACAAGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCCGTGTATTGCTGTGCGAGAGA
->IGHV4-4*02
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGGGACCCTGTCCCTCACCTGCGCTGTCTCTGGTGGCTCCATCAGCAGTAGTAACTGGTGGAGTTGGGTCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGGAAATCTATCATAGTGGGAGCACCAACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGACAAGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV4-4*07
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGTAGTTACTACTGGAGCTGGATCCGGCAGCCCGCCGGGAAGGGACTGGAGTGGATTGGGCGTATCTATACCAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATGTCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV4-59*01
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGTAGTTACTACTGGAGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTATATCTATTACAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCTGCGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV4-59*02
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCGTCAGTAGTTACTACTGGAGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTATATCTATTACAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCTGCGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV4-59*03
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGTAGTTACTACTGGAGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTATATCTATTACAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAATTCTCCCTGAAGCTGAGCTCTGTGACCGCTGCGGACACGGCCGTGTATTACTGTGCG
->IGHV4-59*05
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGTAGTTACTACTGGAGCTGGATCCGGCAGCCGCCGGGGAAGGGACTGGAGTGGATTGGGCGTATCTATTATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCCGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACACGGCTGTGTATTACTGTGCG
->IGHV4-59*07
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGACACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGTAGTTACTACTGGAGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTATATCTATTACAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCTGCGGACACGGCCGTGTATTACTGTGCGAGA
->IGHV4-59*08
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGTAGTTACTACTGGAGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTATATCTATTACAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACACGGCCGTGTATTACTGTGCGAGACA
->IGHV4-59*10
-CAGGTGCAGCTACAGCAGTGGGGCGCAGGACTGTTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCGCTGTCTATGGTGGCTCCATCAGTAGTTACTACTGGAGCTGGATCCGGCAGCCCGCCGGGAAGGGGCTGGAGTGGATTGGGCGTATCTATACCAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATGTCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCCGTGTATTACTGTGCGAGATA
->IGHV4-61*01
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCGTCAGCAGTGGTAGTTACTACTGGAGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTATATCTATTACAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCTGCGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV4-61*02
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCACAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTGGTAGTTACTACTGGAGCTGGATCCGGCAGCCCGCCGGGAAGGGACTGGAGTGGATTGGGCGTATCTATACCAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV4-61*03
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCGTCAGCAGTGGTAGTTACTACTGGAGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTATATCTATTACAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCACTTCTCCCTGAAGCTGAGCTCTGTGACCGCTGCGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV4-61*05
-CAGCTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTAGTAGTTACTACTGGGGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTATATCTATTACAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATATCAGTAGACAAGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCCGTGTATTACTGTGCGAGA
->IGHV4-61*08
-CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCGTCAGCAGTGGTGGTTACTACTGGAGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTATATCTATTACAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCTGCGGACACGGCCGTGTATTACTGTGCGAGAGA
->IGHV5-10-1*03
-GAAGTGCAGCTGGTGCAGTCCGGAGCAGAGGTGAAAAAGCCCGGGGAGTCTCTGAGGATCTCCTGTAAGGGTTCTGGATACAGCTTTACCAGCTACTGGATCAGCTGGGTGCGCCAGATGCCCGGGAAAGGCCTGGAGTGGATGGGGAGGATTGATCCTAGTGACTCTTATACCAACTACAGCCCGTCCTTCCAAGGCCACGTCACCATCTCAGCTGACAAGTCCATCAGCACTGCCTACCTGCAGTGGAGCAGCCTGAAGGCCTCGGACACCGCCATGTATTACTGTGCGAGA
->IGHV5-51*01
-GAGGTGCAGCTGGTGCAGTCTGGAGCAGAGGTGAAAAAGCCCGGGGAGTCTCTGAAGATCTCCTGTAAGGGTTCTGGATACAGCTTTACCAGCTACTGGATCGGCTGGGTGCGCCAGATGCCCGGGAAAGGCCTGGAGTGGATGGGGATCATCTATCCTGGTGACTCTGATACCAGATACAGCCCGTCCTTCCAAGGCCAGGTCACCATCTCAGCCGACAAGTCCATCAGCACCGCCTACCTGCAGTGGAGCAGCCTGAAGGCCTCGGACACCGCCATGTATTACTGTGCGAGACA
->IGHV5-51*03
-GAGGTGCAGCTGGTGCAGTCTGGAGCAGAGGTGAAAAAGCCGGGGGAGTCTCTGAAGATCTCCTGTAAGGGTTCTGGATACAGCTTTACCAGCTACTGGATCGGCTGGGTGCGCCAGATGCCCGGGAAAGGCCTGGAGTGGATGGGGATCATCTATCCTGGTGACTCTGATACCAGATACAGCCCGTCCTTCCAAGGCCAGGTCACCATCTCAGCCGACAAGTCCATCAGCACCGCCTACCTGCAGTGGAGCAGCCTGAAGGCCTCGGACACCGCCATGTATTACTGTGCGAGA
->IGHV6-1*01
-CAGGTACAGCTGCAGCAGTCAGGTCCAGGACTGGTGAAGCCCTCGCAGACCCTCTCACTCACCTGTGCCATCTCCGGGGACAGTGTCTCTAGCAACAGTGCTGCTTGGAACTGGATCAGGCAGTCCCCATCGAGAGGCCTTGAGTGGCTGGGAAGGACATACTACAGGTCCAAGTGGTATAATGATTATGCAGTATCTGTGAAAAGTCGAATAACCATCAACCCAGACACATCCAAGAACCAGTTCTCCCTGCAGCTGAACTCTGTGACTCCCGAGGACACGGCTGTGTATTACTGTGCAAGAGA
->IGHV6-1*02
-CAGGTACAGCTGCAGCAGTCAGGTCCGGGACTGGTGAAGCCCTCGCAGACCCTCTCACTCACCTGTGCCATCTCCGGGGACAGTGTCTCTAGCAACAGTGCTGCTTGGAACTGGATCAGGCAGTCCCCATCGAGAGGCCTTGAGTGGCTGGGAAGGACATACTACAGGTCCAAGTGGTATAATGATTATGCAGTATCTGTGAAAAGTCGAATAACCATCAACCCAGACACATCCAAGAACCAGTTCTCCCTGCAGCTGAACTCTGTGACTCCCGAGGACACGGCTGTGTATTACTGTGCAAGAGA
->IGHV7-4-1*01
-CAGGTGCAGCTGGTGCAATCTGGGTCTGAGTTGAAGAAGCCTGGGGCCTCAGTGAAGGTTTCCTGCAAGGCTTCTGGATACACCTTCACTAGCTATGCTATGAATTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAACACCAACACTGGGAACCCAACGTATGCCCAGGGCTTCACAGGACGGTTTGTCTTCTCCTTGGACACCTCTGTCAGCACGGCATATCTGCAGATCTGCAGCCTAAAGGCTGAGGACACTGCCGTGTATTACTGTGCGAGA
=====================================
tests/data/H1/candidates.tab.gz deleted
=====================================
Binary files a/tests/data/H1/candidates.tab.gz and /dev/null differ
=====================================
tests/data/H1/expected.tab deleted
=====================================
@@ -1,57 +0,0 @@
-name source chain cluster cluster_size Js CDR3s exact Js_exact CDR3s_exact CDR3_exact_ratio database_diff has_stop looks_like_V CDR3_start whitelist_diff closest_whitelist consensus
-IGHV1-18*01 IGHV1-18*01 VH all 38082 12 9605 18063 10 4649 3.9 0 0 1 288 0 IGHV1-18*01 CAGGTTCAGCTGGTGCAGTCTGGAGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGTTACACCTTTACCAGCTATGGTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAGCGCTTACAATGGTAACACAAACTATGCACAGAAGCTCCAGGGCAGAGTCACCATGACCACAGACACATCCACGAGCACAGCCTACATGGAGCTGAGGAGCCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV1-2*02 IGHV1-2*02 VH all 18707 13 4522 8398 10 2133 3.9 0 0 1 288 0 IGHV1-2*02 CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTCACCGGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAACCCTAACAGTGGTGGCACAAACTATGCACAGAAGTTTCAGGGCAGGGTCACCATGACCAGGGACACGTCCATCAGCACAGCCTACATGGAGCTGAGCAGGCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV1-2*04 IGHV1-2*04 VH all 3081 9 731 1015 5 277 3.7 0 0 1 288 0 IGHV1-2*04 CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTCACCGGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAACCCTAACAGTGGTGGCACAAACTATGCACAGAAGTTTCAGGGCTGGGTCACCATGACCAGGGACACGTCCATCAGCACAGCCTACATGGAGCTGAGCAGGCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV1-24*01 IGHV1-24*01 VH all 1303 7 341 735 7 222 3.3 0 0 1 288 0 IGHV1-24*01 CAGGTCCAGCTGGTACAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGTTTCCGGATACACCCTCACTGAATTATCCATGCACTGGGTGCGACAGGCTCCTGGAAAAGGGCTTGAGTGGATGGGAGGTTTTGATCCTGAAGATGGTGAAACAATCTACGCACAGAAGTTCCAGGGCAGAGTCACCATGACCGAGGACACATCTACAGACACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCAACAGA
-IGHV1-3*01 IGHV1-3*01 VH all 14794 9 3281 6096 7 1431 4.3 0 0 1 288 0 IGHV1-3*01 CAGGTCCAGCTTGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTTTCCTGCAAGGCTTCTGGATACACCTTCACTAGCTATGCTATGCATTGGGTGCGCCAGGCCCCCGGACAAAGGCTTGAGTGGATGGGATGGATCAACGCTGGCAATGGTAACACAAAATATTCACAGAAGTTCCAGGGCAGAGTCACCATTACCAGGGACACATCCGCGAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAAGACACGGCTGTGTATTACTGTGCGAGAGA
-IGHV1-46*01 IGHV1-46*01 VH all 30999 11 7474 14728 9 3665 4.0 0 0 1 288 0 IGHV1-46*01 CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTTTCCTGCAAGGCATCTGGATACACCTTCACCAGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGAATAATCAACCCTAGTGGTGGTAGCACAAGCTACGCACAGAAGTTCCAGGGCAGAGTCACCATGACCAGGGACACGTCCACGAGCACAGTCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV1-58*01 IGHV1-58*01 VH all 2038 6 512 1023 6 281 3.6 0 0 1 288 0 IGHV1-58*01 CAAATGCAGCTGGTGCAGTCTGGGCCTGAGGTGAAGAAGCCTGGGACCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATTCACCTTTACTAGCTCTGCTGTGCAGTGGGTGCGACAGGCTCGTGGACAACGCCTTGAGTGGATAGGATGGATCGTCGTTGGCAGTGGTAACACAAACTACGCACAGAAGTTCCAGGAAAGAGTCACCATTACCAGGGACATGTCCACAAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCCGAGGACACGGCCGTGTATTACTGTGCGGCAGA
-IGHV1-69*01 IGHV1-69*01 VH all 41033 12 10647 21253 10 6013 3.5 0 0 1 288 0 IGHV1-69*01 CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGTCCTCGGTGAAGGTCTCCTGCAAGGCTTCTGGAGGCACCTTCAGCAGCTATGCTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGAGGGATCATCCCTATCTTTGGTACAGCAAACTACGCACAGAAGTTCCAGGGCAGAGTCACGATTACCGCGGACGAATCCACGAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV1-69*06 IGHV1-69*06 VH all 13719 11 3718 7062 9 2100 3.4 0 0 1 288 0 IGHV1-69*06 CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGTCCTCGGTGAAGGTCTCCTGCAAGGCTTCTGGAGGCACCTTCAGCAGCTATGCTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGAGGGATCATCCCTATCTTTGGTACAGCAAACTACGCACAGAAGTTCCAGGGCAGAGTCACGATTACCGCGGACAAATCCACGAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV1-69-2*01 IGHV1-69-2*01 VH all 416 5 115 235 5 70 3.4 0 0 1 288 0 IGHV1-69-2*01 GAGGTCCAGCTGGTACAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCTACAGTGAAAATCTCCTGCAAGGTTTCTGGATACACCTTCACCGACTACTACATGCACTGGGTGCAACAGGCCCCTGGAAAAGGGCTTGAGTGGATGGGACTTGTTGATCCTGAAGATGGTGAAACAATATACGCAGAGAAGTTCCAGGGCAGAGTCACCATAACCGCGGACACGTCTACAGACACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCAACAGA
-IGHV1-8*01 IGHV1-8*01 VH all 12011 11 3186 6200 9 1760 3.5 0 0 1 288 0 IGHV1-8*01 CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTCACCAGTTATGATATCAACTGGGTGCGACAGGCCACTGGACAAGGGCTTGAGTGGATGGGATGGATGAACCCTAACAGTGGTAACACAGGCTATGCACAGAAGTTCCAGGGCAGAGTCACCATGACCAGGAACACCTCCATAAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGACACGGCCGTGTATTACTGTGCGAGAGG
-IGHV2-26*01 IGHV2-26*01 VH all 4191 10 968 2112 6 511 4.1 0 0 1 291 0 IGHV2-26*01 CAGGTCACCTTGAAGGAGTCTGGTCCTGTGCTGGTGAAACCCACAGAGACCCTCACGCTGACCTGCACCGTCTCTGGGTTCTCACTCAGCAATGCTAGAATGGGTGTGAGCTGGATCCGTCAGCCCCCAGGGAAGGCCCTGGAGTGGCTTGCACACATTTTTTCGAATGACGAAAAATCCTACAGCACATCTCTGAAGAGCAGGCTCACCATCTCCAAGGACACCTCCAAAAGCCAGGTGGTCCTTACCATGACCAACATGGACCCTGTGGACACAGCCACATATTACTGTGCACGGATAC
-IGHV2-5*01 IGHV2-5*01 VH all 4418 9 1039 1888 6 478 4.0 0 0 1 291 0 IGHV2-5*01 CAGATCACCTTGAAGGAGTCTGGTCCTACGCTGGTGAAACCCACACAGACCCTCACGCTGACCTGCACCTTCTCTGGGTTCTCACTCAGCACTAGTGGAGTGGGTGTGGGCTGGATCCGTCAGCCCCCAGGAAAGGCCCTGGAGTGGCTTGCACTCATTTATTGGAATGATGATAAGCGCTACAGCCCATCTCTGAAGAGCAGGCTCACCATCACCAAGGACACCTCCAAAAACCAGGTGGTCCTTACAATGACCAACATGGACCCTGTGGACACAGCCACATATTACTGTGCACACAGAC
-IGHV2-5*02 IGHV2-5*02 VH all 2028 10 493 744 6 205 3.6 0 0 1 291 0 IGHV2-5*02 CAGATCACCTTGAAGGAGTCTGGTCCTACGCTGGTGAAACCCACACAGACCCTCACGCTGACCTGCACCTTCTCTGGGTTCTCACTCAGCACTAGTGGAGTGGGTGTGGGCTGGATCCGTCAGCCCCCAGGAAAGGCCCTGGAGTGGCTTGCACTCATTTATTGGGATGATGATAAGCGCTACAGCCCATCTCTGAAGAGCAGGCTCACCATCACCAAGGACACCTCCAAAAACCAGGTGGTCCTTACAATGACCAACATGGACCCTGTGGACACAGCCACATATTACTGTGCACACAGAC
-IGHV2-70*01 IGHV2-70*01 VH all 4746 9 1152 2400 7 566 4.2 0 0 1 291 0 IGHV2-70*01 CAGGTCACCTTGAGGGAGTCTGGTCCTGCGCTGGTGAAACCCACACAGACCCTCACACTGACCTGCACCTTCTCTGGGTTCTCACTCAGCACTAGTGGAATGTGTGTGAGCTGGATCCGTCAGCCCCCAGGGAAGGCCCTGGAGTGGCTTGCACTCATTGATTGGGATGATGATAAATACTACAGCACATCTCTGAAGACCAGGCTCACCATCTCCAAGGACACCTCCAAAAACCAGGTGGTCCTTACAATGACCAACATGGACCCTGTGGACACAGCCACGTATTACTGTGCACGGATAC
-IGHV2-70D*04 IGHV2-70D*04 VH all 2957 8 652 1144 7 246 4.7 0 0 1 291 0 IGHV2-70D*04 CAGGTCACCTTGAAGGAGTCTGGTCCTGCGCTGGTGAAACCCACACAGACCCTCACACTGACCTGCACCTTCTCTGGGTTCTCACTCAGCACTAGTGGAATGCGTGTGAGCTGGATCCGTCAGCCCCCAGGGAAGGCCCTGGAGTGGCTTGCACGCATTGATTGGGATGATGATAAATTCTACAGCACATCTCTGAAGACCAGGCTCACCATCTCCAAGGACACCTCCAAAAACCAGGTGGTCCTTACAATGACCAACATGGACCCTGTGGACACAGCCACGTATTACTGTGCACGGATAC
-IGHV3-11*01 IGHV3-11*01 VH all 1787 8 506 822 7 264 3.1 0 0 1 288 0 IGHV3-11*01 CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCAAGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTGACTACTACATGAGCTGGATCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTAGTAGTAGTGGTAGTACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGGGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV3-11*06 IGHV3-11*06 VH all 221 7 80 92 5 38 2.4 0 0 1 288 0 IGHV3-11*06 CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCAAGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTGACTACTACATGAGCTGGATCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTAGTAGTAGTAGTAGTTACACAAACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGA
-IGHV3-13*01 IGHV3-13*01 VH all 821 6 199 380 5 97 3.9 0 0 1 285 0 IGHV3-13*01 GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTACGACATGCACTGGGTCCGCCAAGCTACAGGAAAAGGTCTGGAGTGGGTCTCAGCTATTGGTACTGCTGGTGACACATACTATCCAGGCTCCGTGAAGGGCCGATTCACCATCTCCAGAGAAAATGCCAAGAACTCCTTGTATCTTCAAATGAACAGCCTGAGAGCCGGGGACACGGCTGTGTATTACTGTGCAAGAGA
-IGHV3-13*01_S2321 IGHV3-13*01_S2321 VH all 805 6 192 324 5 86 3.8 0 0 1 285 3 IGHV3-13*01 GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTACGACATGCACTGGGTCCGCCAAGCTACAGGAAAAGGTCTGGAGTGGGTCTCAGCTATTGGTACTGCTGGTGACACATACTATCCAGGCTCCGTGAAGGGCCGATTCACCATCTCCAGAGAAAATGCCAAGAACTCCTTGTATCTTCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTGTATTACTGTGCAAGAG
-IGHV3-15*01 IGHV3-15*01 VH all 13813 11 2773 5650 9 1217 4.6 0 0 1 294 0 IGHV3-15*01 GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTAAAGCCTGGGGGGTCCCTTAGACTCTCCTGTGCAGCCTCTGGATTCACTTTCAGTAACGCCTGGATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTGGCCGTATTAAAAGCAAAACTGATGGTGGGACAACAGACTACGCTGCACCCGTGAAAGGCAGATTCACCATCTCAAGAGATGATTCAAAAAACACGCTGTATCTGCAAATGAACAGCCTGAAAACCGAGGACACAGCCGTGTATTACTGTACCACAGA
-IGHV3-15*07 IGHV3-15*07 VH all 16932 11 3269 6926 8 1394 5.0 0 0 1 294 0 IGHV3-15*07 GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTAAAGCCTGGGGGGTCCCTTAGACTCTCCTGTGCAGCCTCTGGTTTCACTTTCAGTAACGCCTGGATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCGGCCGTATTAAAAGCAAAACTGATGGTGGGACAACAGACTACGCTGCACCCGTGAAAGGCAGATTCACCATCTCAAGAGATGATTCAAAAAACACGCTGTATCTGCAAATGAACAGCCTGAAAACCGAGGACACAGCCGTGTATTACTGTACCACAGA
-IGHV3-20*01 IGHV3-20*01 VH all 529 6 170 200 6 61 3.3 0 0 1 288 0 IGHV3-20*01 GAGGTGCAGCTGGTGGAGTCTGGGGGAGGTGTGGTACGGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTGATGATTATGGCATGAGCTGGGTCCGCCAAGCTCCAGGGAAGGGGCTGGAGTGGGTCTCTGGTATTAATTGGAATGGTGGTAGCACAGGTTATGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCCCTGTATCTGCAAATGAACAGTCTGAGAGCCGAGGACACGGCCTTGTATCACTGTGCGAGAGA
-IGHV3-20*01_S7413 IGHV3-20*01_S7413 VH all 369 6 114 152 5 50 3.0 0 0 1 288 2 IGHV3-20*01 GAGGTGCAGCTGGTGGAGTCTGGGGGAGGTGTGGTACGGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTGATGATTATGGCATGAGCTGGGTCCGCCAAGCTCCAGGGAAGGGGCTGGAGTGGGTCTCTGGTATTAATTGGAATGGTGGTAGCACAGGTTATGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCCCTGTATCTGCAAATGAACAGTCTGAGAGCCGAGGACACGGCCTTGTATTACTGTGCGAGAG
-IGHV3-21*01 IGHV3-21*01 VH all 14645 9 3602 6690 8 1831 3.6 0 0 1 288 0 IGHV3-21*01 GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCCTGGTCAAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCATCCATTAGTAGTAGTAGTAGTTACATATACTACGCAGACTCAGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGA
-IGHV3-23*01 IGHV3-23*01 VH all 4960 10 1433 1736 6 612 2.8 0 0 1 288 0 IGHV3-23*01 GAGGTGCAGCTGTTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTAGCAGCTATGCCATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCAGCTATTAGTGGTAGTGGTGGTAGCACATACTACGCAGACTCCGTGAAGGGCCGGTTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTATATTACTGTGCGAAAGA
-IGHV3-30*03_S9223 IGHV3-30*03_S9223 VH all 379 3 53 175 3 16 10.9 0 0 1 288 16 IGHV3-30*03 CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGTAGTCTCTGGATTCACCTTCAGTAGTTATGGCATACACTGGGTCCGTCAGGCTCCAGTCAAGGGGCTGGAGTGGGTGGCAGTTATATCACATGATGGAAGTACTAAGTACTATGCAGACTCCGTGAAGGGCCGATTCACCATCTCCCGAGACAATTCCAAGAACACATTGTATCTGCAAATGAACAGCCTGACATTTGAGGACACGGCTGTGTATTACTGTGCGAGGGA
-IGHV3-30-5*01 IGHV3-30-5*01 VH all 25649 11 5549 12485 9 2841 4.4 0 0 1 288 0 IGHV3-30-5*01 CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATGGCATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTGGCAGTTATATCATATGATGGAAGTAATAAATACTATGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCTGAGGACACGGCTGTGTATTACTGTGCGAAAGA
-IGHV3-33*01 IGHV3-33*01 VH all 19630 9 4104 8200 9 1900 4.3 0 0 1 288 0 IGHV3-33*01 CAGGTGCAGCTGGTGGAGTCTGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGCAGCGTCTGGATTCACCTTCAGTAGCTATGGCATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTGGCAGTTATATGGTATGATGGAAGTAATAAATACTATGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGA
-IGHV3-43*01 IGHV3-43*01 VH all 1738 8 467 862 7 249 3.5 0 0 1 288 0 IGHV3-43*01 GAAGTGCAGCTGGTGGAGTCTGGGGGAGTCGTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTGATGATTATACCATGCACTGGGTCCGTCAAGCTCCGGGGAAGGGTCTGGAGTGGGTCTCTCTTATTAGTTGGGATGGTGGTAGCACATACTATGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACAGCAAAAACTCCCTGTATCTGCAAATGAACAGTCTGAGAACTGAGGACACCGCCTTGTATTACTGTGCAAAAGATA
-IGHV3-43D*01 IGHV3-43D*01 VH all 1228 6 318 553 6 143 3.9 0 0 1 288 0 IGHV3-43D*01 GAAGTGCAGCTGGTGGAGTCTGGGGGAGTCGTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTGATGATTATGCCATGCACTGGGTCCGTCAAGCTCCGGGGAAGGGTCTGGAGTGGGTCTCTCTTATTAGTTGGGATGGTGGTAGCACCTACTATGCAGACTCTGTGAAGGGTCGATTCACCATCTCCAGAGACAACAGCAAAAACTCCCTGTATCTGCAAATGAACAGTCTGAGAGCTGAGGACACCGCCTTGTATTACTGTGCAAAAGATA
-IGHV3-48*02 IGHV3-48*02 VH all 6851 11 1741 2591 8 763 3.4 0 0 1 288 0 IGHV3-48*02 GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTAGTAGTAGTAGTAGTACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAATGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGACGAGGACACGGCTGTGTATTACTGTGCGAGAGA
-IGHV3-48*04 IGHV3-48*04 VH all 8560 11 2086 3710 8 1004 3.7 0 0 1 288 0 IGHV3-48*04 GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTAGTAGTAGTAGTAGTACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGA
-IGHV3-49*03 IGHV3-49*03 VH all 3686 8 983 1798 7 482 3.7 0 0 1 294 0 IGHV3-49*03 GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCAGGGCGGTCCCTGAGACTCTCCTGTACAGCTTCTGGATTCACCTTTGGTGATTATGCTATGAGCTGGTTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTAGGTTTCATTAGAAGCAAAGCTTATGGTGGGACAACAGAATACGCCGCGTCTGTGAAAGGCAGATTCACCATCTCAAGAGATGATTCCAAAAGCATCGCCTATCTGCAAATGAACAGCCTGAAAACCGAGGACACAGCCGTGTATTACTGTACTAGAGA
-IGHV3-49*05 IGHV3-49*05 VH all 2237 9 584 1149 7 327 3.5 0 0 1 294 0 IGHV3-49*05 GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTAAAGCCAGGGCGGTCCCTGAGACTCTCCTGTACAGCTTCTGGATTCACCTTTGGTGATTATGCTATGAGCTGGTTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTAGGTTTCATTAGAAGCAAAGCTTATGGTGGGACAACAGAATACGCCGCGTCTGTGAAAGGCAGATTCACCATCTCAAGAGATGATTCCAAAAGCATCGCCTATCTGCAAATGAACAGCCTGAAAACCGAGGACACAGCCGTGTATTACTGTACTAGAGA
-IGHV3-53*01 IGHV3-53*01 VH all 16832 10 3372 5862 9 1384 4.2 0 0 1 285 0 IGHV3-53*01 GAGGTGCAGCTGGTGGAGTCTGGAGGAGGCTTGATCCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGGTTCACCGTCAGTAGCAACTACATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCAGTTATTTATAGCGGTGGTAGCACATACTACGCAGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTTCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV3-64*02 IGHV3-64*02 VH all 133 5 33 41 4 12 3.4 0 0 1 288 0 IGHV3-64*02 GAGGTGCAGCTGGTGGAGTCTGGGGAAGGCTTGGTCCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATGCTATGCACTGGGTCCGCCAGGCTCCAGGGAAGGGACTGGAATATGTTTCAGCTATTAGTAGTAATGGGGGTAGCACATATTATGCAGACTCTGTGAAGGGCAGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTTCAAATGGGCAGCCTGAGAGCTGAGGACATGGCTGTGTATTACTGTGCGAGAGA
-IGHV3-7*01 IGHV3-7*01 VH all 9716 11 2189 2560 7 729 3.5 0 0 1 288 0 IGHV3-7*01 GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTAGTAGCTATTGGATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTGGCCAACATAAAGCAAGATGGAAGTGAGAAATACTATGTGGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGA
-IGHV3-7*03 IGHV3-7*03 VH all 6298 9 1498 2114 6 623 3.4 0 0 1 288 0 IGHV3-7*03 GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTAGTAGCTATTGGATGAGCTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTGGCCAACATAAAGCAAGATGGAAGTGAGAAATACTATGTGGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV3-73*02 IGHV3-73*02 VH all 6251 10 1185 1736 6 402 4.3 0 0 1 294 0 IGHV3-73*02 GAGGTGCAGCTGGTGGAGTCCGGGGGAGGCTTGGTCCAGCCTGGGGGGTCCCTGAAACTCTCCTGTGCAGCCTCTGGGTTCACCTTCAGTGGCTCTGCTATGCACTGGGTCCGCCAGGCTTCCGGGAAAGGGCTGGAGTGGGTTGGCCGTATTAGAAGCAAAGCTAACAGTTACGCGACAGCATATGCTGCGTCGGTGAAAGGCAGGTTCACCATCTCCAGAGATGATTCAAAGAACACGGCGTATCTGCAAATGAACAGCCTGAAAACCGAGGACACGGCCGTGTATTACTGTACTAGACA
-IGHV3-74*01 IGHV3-74*01 VH all 7208 9 1584 2390 8 586 4.1 0 0 1 288 0 IGHV3-74*01 GAGGTGCAGCTGGTGGAGTCCGGGGGAGGCTTAGTTCAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTACTGGATGCACTGGGTCCGCCAAGCTCCAGGGAAGGGGCTGGTGTGGGTCTCACGTATTAATAGTGATGGGAGTAGCACAAGCTACGCGGACTCCGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACACGCTGTATCTGCAAATGAACAGTCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCAAGAGA
-IGHV3-9*01 IGHV3-9*01 VH all 805 6 261 373 6 126 3.0 0 0 1 288 0 IGHV3-9*01 GAAGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAGCCTGGCAGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTTGATGATTATGCCATGCACTGGGTCCGGCAAGCTCCAGGGAAGGGCCTGGAGTGGGTCTCAGGTATTAGTTGGAATAGTGGTAGCATAGGCTATGCGGACTCTGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCCCTGTATCTGCAAATGAACAGTCTGAGAGCTGAGGACACGGCCTTGTATTACTGTGCAAAAGATA
-IGHV4-28*01 IGHV4-28*01 VH all 55 5 23 24 4 11 2.2 0 0 1 288 0 IGHV4-28*01 CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGACACCCTGTCCCTCACCTGCGCTGTCTCTGGTTACTCCATCAGCAGTAGTAACTGGTGGGGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTACATCTATTATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATGTCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGTGGACACGGCCGTGTATTACTGTGCGAGAAA
-IGHV4-31*03 IGHV4-31*03 VH all 5511 11 1727 2519 8 905 2.8 0 0 1 291 0 IGHV4-31*03 CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCACAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTGGTGGTTACTACTGGAGCTGGATCCGCCAGCACCCAGGGAAGGGCCTGGAGTGGATTGGGTACATCTATTACAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTTACCATATCAGTAGACACGTCTAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACTGCCGCGGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV4-34*01 IGHV4-34*01 VH all 38338 12 11990 18928 10 6988 2.7 0 0 1 285 0 IGHV4-34*01 CAGGTGCAGCTACAGCAGTGGGGCGCAGGACTGTTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCGCTGTCTATGGTGGGTCCTTCAGTGGTTACTACTGGAGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGGAAATCAATCATAGTGGAAGCACCAACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCTGTGTATTACTGTGCGAGAGG
-IGHV4-38-2*02 IGHV4-38-2*02 VH all 5678 9 1579 2099 7 720 2.9 0 0 1 288 0 IGHV4-38-2*02 CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTTACTCCATCAGCAGTGGTTACTACTGGGGCTGGATCCGGCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGAGTATCTATCATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV4-39*01 IGHV4-39*01 VH all 12250 10 3346 5180 7 1708 3.0 0 0 1 291 0 IGHV4-39*01 CAGCTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTAGTAGTTACTACTGGGGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGAGTATCTATTATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCCGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCAGACACGGCTGTGTATTACTGTGCGAGACA
-IGHV4-39*07 IGHV4-39*07 VH all 19396 11 5780 7910 9 2829 2.8 0 0 1 291 0 IGHV4-39*07 CAGCTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGCAGTAGTAGTTACTACTGGGGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGAGTATCTATTATAGTGGGAGCACCTACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV4-4*02 IGHV4-4*02 VH all 4465 8 1382 2030 7 734 2.8 0 0 1 288 0 IGHV4-4*02 CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGGGACCCTGTCCCTCACCTGCGCTGTCTCTGGTGGCTCCATCAGCAGTAGTAACTGGTGGAGTTGGGTCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGGAAATCTATCATAGTGGGAGCACCAACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGACAAGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV4-4*07 IGHV4-4*07 VH all 5043 10 1391 1886 7 678 2.8 0 0 1 285 0 IGHV4-4*07 CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGTAGTTACTACTGGAGCTGGATCCGGCAGCCCGCCGGGAAGGGACTGGAGTGGATTGGGCGTATCTATACCAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATGTCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV4-59*01 IGHV4-59*01 VH all 16798 10 5022 7023 9 2549 2.8 0 0 1 285 0 IGHV4-59*01 CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCATCAGTAGTTACTACTGGAGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTATATCTATTACAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCTGCGGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV4-61*01 IGHV4-61*01 VH all 4946 8 1515 1862 8 709 2.6 0 0 1 291 0 IGHV4-61*01 CAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCACTGTCTCTGGTGGCTCCGTCAGCAGTGGTAGTTACTACTGGAGCTGGATCCGGCAGCCCCCAGGGAAGGGACTGGAGTGGATTGGGTATATCTATTACAGTGGGAGCACCAACTACAACCCCTCCCTCAAGAGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCTGCGGACACGGCCGTGTATTACTGTGCGAGAGA
-IGHV5-10-1*03 IGHV5-10-1*03 VH all 4317 7 1046 1720 7 476 3.6 0 0 1 288 0 IGHV5-10-1*03 GAAGTGCAGCTGGTGCAGTCCGGAGCAGAGGTGAAAAAGCCCGGGGAGTCTCTGAGGATCTCCTGTAAGGGTTCTGGATACAGCTTTACCAGCTACTGGATCAGCTGGGTGCGCCAGATGCCCGGGAAAGGCCTGGAGTGGATGGGGAGGATTGATCCTAGTGACTCTTATACCAACTACAGCCCGTCCTTCCAAGGCCACGTCACCATCTCAGCTGACAAGTCCATCAGCACTGCCTACCTGCAGTGGAGCAGCCTGAAGGCCTCGGACACCGCCATGTATTACTGTGCGAGA
-IGHV5-51*01 IGHV5-51*01 VH all 30679 10 7649 11953 9 3386 3.5 0 0 1 288 0 IGHV5-51*01 GAGGTGCAGCTGGTGCAGTCTGGAGCAGAGGTGAAAAAGCCCGGGGAGTCTCTGAAGATCTCCTGTAAGGGTTCTGGATACAGCTTTACCAGCTACTGGATCGGCTGGGTGCGCCAGATGCCCGGGAAAGGCCTGGAGTGGATGGGGATCATCTATCCTGGTGACTCTGATACCAGATACAGCCCGTCCTTCCAAGGCCAGGTCACCATCTCAGCCGACAAGTCCATCAGCACCGCCTACCTGCAGTGGAGCAGCCTGAAGGCCTCGGACACCGCCATGTATTACTGTGCGAGACA
-IGHV6-1*01 IGHV6-1*01 VH all 18773 11 3603 6639 8 1316 5.0 0 0 1 297 0 IGHV6-1*01 CAGGTACAGCTGCAGCAGTCAGGTCCAGGACTGGTGAAGCCCTCGCAGACCCTCTCACTCACCTGTGCCATCTCCGGGGACAGTGTCTCTAGCAACAGTGCTGCTTGGAACTGGATCAGGCAGTCCCCATCGAGAGGCCTTGAGTGGCTGGGAAGGACATACTACAGGTCCAAGTGGTATAATGATTATGCAGTATCTGTGAAAAGTCGAATAACCATCAACCCAGACACATCCAAGAACCAGTTCTCCCTGCAGCTGAACTCTGTGACTCCCGAGGACACGGCTGTGTATTACTGTGCAAGAGA
-IGHV7-4-1*01 IGHV7-4-1*01 VH all 140 5 42 44 3 14 3.1 0 0 1 288 0 IGHV7-4-1*01 CAGGTGCAGCTGGTGCAATCTGGGTCTGAGTTGAAGAAGCCTGGGGCCTCAGTGAAGGTTTCCTGCAAGGCTTCTGGATACACCTTCACTAGCTATGCTATGAATTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAACACCAACACTGGGAACCCAACGTATGCCCAGGGCTTCACAGGACGGTTTGTCTTCTCCTTGGACACCTCTGTCAGCACGGCATATCTGCAGATCTGCAGCCTAAAGGCTGAGGACACTGCCGTGTATTACTGTGCGAGA
=====================================
tests/data/H1/test.sh deleted
=====================================
@@ -1,5 +0,0 @@
-#!/bin/bash
-set -euo pipefail
-
-igdiscover germlinefilter --whitelist=V.fasta --max-differences=0 --unique-CDR3=5 --cluster-size=100 --unique-J=3 --cross-mapping-ratio=0.02 --allele-ratio=0.1 candidates.tab.gz > new_V_germline.tab
-diff -U 0 <(cut -f1 expected.tab) <(cut -f1 new_V_germline.tab) | grep '^[+-]' | sed 1,2d
=====================================
tests/run.sh deleted
=====================================
@@ -1,56 +0,0 @@
-#!/bin/bash
-# Run this within an activated igdiscover environment
-set -euo pipefail
-set -x
-unset DISPLAY
-
-pytest
-
-rm -rf testrun
-mkdir testrun
-[[ -L testdata ]] || ln -s igdiscover-testdata testdata
-
-
-# Test whether specifying primer sequences leads to a SyntaxError
-igdiscover init --db=testdata/database --reads=testdata/reads.1.fastq.gz testrun/primers
-pushd testrun/primers
-igdiscover config \
- --set forward_primers "['CGTGA']" \
- --set reverse_primers "['TTCAC']"
-igdiscover run -n stats/reads.json
-popd
-
-# Test using FLASH and parsing its log output
-igdiscover init --db=testdata/database --reads=testdata/reads.1.fastq.gz testrun/flash
-pushd testrun/flash
-igdiscover config --set merge_program flash
-igdiscover run stats/reads.json
-popd
-
-
-igdiscover init --db=testdata/database --reads=testdata/reads.1.fastq.gz testrun/paired
-pushd testrun/paired
-igdiscover config --set barcode_length_3prime 21
-
-igdiscover run nofinal
-if [[ -d final/ ]]; then
- echo "ERROR: nofinal failed"
- exit 1
-fi
-
-# run final iteration
-igdiscover run
-
-igdiscover run iteration-01/exact.tab
-popd
-
-# Use the merged file from above as input again
-igdiscover init --db=testdata/database --single-reads=testrun/paired/reads/2-merged.fastq.gz testrun/singlefastq
-cp -p testrun/paired/igdiscover.yaml testrun/singlefastq/
-( cd testrun/singlefastq && igdiscover run stats/reads.json )
-
-# Test FASTA input
-cutadapt --quiet -o testrun/reads.fasta testrun/paired/reads/2-merged.fastq.gz
-igdiscover init --db=testdata/database --single-reads=testrun/reads.fasta testrun/singlefasta
-cp -p testrun/paired/igdiscover.yaml testrun/singlefastq/
-( cd testrun/singlefasta && igdiscover run stats/reads.json )
=====================================
tests/test_commands.py
=====================================
@@ -1,9 +1,16 @@
import os
import sys
import pytest
+import contextlib
+import shutil
+
from igdiscover.__main__ import main
from .utils import datapath, resultpath, files_equal
+from igdiscover.cli.init import run_init
+from igdiscover.cli.config import print_configuration, modify_configuration
+from igdiscover.cli.run import run_snakemake
+from igdiscover.cli.clonotypes import run_clonotypes
@pytest.fixture
@@ -25,6 +32,59 @@ def run(tmpdir):
return _run
+@pytest.fixture
+def pipeline_dir(tmp_path):
+    """An initialized pipeline directory"""
+    pipeline_path = tmp_path / "initializedpipeline"
+    init_testdata(pipeline_path)
+    return pipeline_path
+
+
+def init_testdata(directory):
+    run_init(
+        database="testdata/database",
+        reads1="testdata/reads.1.fastq.gz",
+        directory=str(directory),
+    )
+    with chdir(directory):
+        modify_configuration([("barcode_length_3prime", "21")])
+
+
+@contextlib.contextmanager
+def chdir(path):
+    previous_path = os.getcwd()
+    os.chdir(path)
+    yield
+    os.chdir(previous_path)
+
+
+@pytest.fixture(scope="session")
+def filtered_tab_session(tmp_path_factory):
+    """Generate iteration-01/filtered.tab.gz"""
+
+    pipeline_dir = tmp_path_factory.mktemp("pipedir") / "pipedir"
+    init_testdata(pipeline_dir)
+    with chdir(pipeline_dir):
+        run_snakemake(targets=["iteration-01/filtered.tab.gz"])
+    return pipeline_dir
+
+
+@pytest.fixture
+def has_filtered_tab(filtered_tab_session, tmp_path):
+    """
+    Give a fresh copy of a pipeline dir in which iteration-01/filtered.tab.gz
+    is guaranteed to exist
+    """
+    pipeline_dir = tmp_path / "has_filtered_tab"
+    shutil.copytree(
+        filtered_tab_session,
+        pipeline_dir,
+        symlinks=True,
+        ignore=shutil.ignore_patterns((".snakemake")),
+    )
+    return pipeline_dir
+
+
def test_main():
with pytest.raises(SystemExit) as exc:
main(['--version'])
@@ -54,3 +114,115 @@ def test_clusterplot(tmpdir):
def test_igblast(run):
args = ['igblast', '--threads=1', datapath('database/'), datapath('igblast.fasta')]
run(args, resultpath('assigned.tab'))
+
+
+def test_run_init(pipeline_dir):
+    assert pipeline_dir.is_dir()
+    assert (pipeline_dir / "igdiscover.yaml").exists()
+
+
+def test_print_configuration(pipeline_dir):
+    print_configuration(path=pipeline_dir / "igdiscover.yaml")
+
+
+def test_modify_configuration(pipeline_dir):
+    modify_configuration(
+        settings=[("d_coverage", "12"), ("j_discovery.allele_ratio", "0.37")],
+        path=str(pipeline_dir / "igdiscover.yaml"),
+    )
+    import ruamel.yaml
+    with open(pipeline_dir / "igdiscover.yaml") as f:
+        config = ruamel.yaml.safe_load(f)
+    assert config["d_coverage"] == 12
+    assert config["j_discovery"]["allele_ratio"] == 0.37
+
+
+def test_dryrun(pipeline_dir):
+    with chdir(pipeline_dir):
+        run_snakemake(dryrun=True)
+
+
+def test_primers(pipeline_dir):
+    # Test whether specifying primer sequences leads to a SyntaxError
+    with chdir(pipeline_dir):
+        modify_configuration(
+            settings=[
+                ("forward_primers", "['CGTGA']"),
+                ("reverse_primers", "['TTCAC']"),
+            ],
+        )
+        run_snakemake(dryrun=True)
+
+
+def test_flash(pipeline_dir):
+    # Test using FLASH and parsing its log output
+    with chdir(pipeline_dir):
+        modify_configuration(settings=[("merge_program", "flash")])
+        run_snakemake(targets=["stats/reads.json"])
+    # Ensure FLASH was actually run
+    assert (pipeline_dir / "reads/2-flash.log").exists()
+
+
+def test_snakemake_assigned_tab(has_filtered_tab):
+    assert (has_filtered_tab / "iteration-01/filtered.tab.gz").exists()
+    assert not (has_filtered_tab / "iteration-01/new_V_germline.tab").exists()
+
+
+def test_snakemake_exact_tab(has_filtered_tab):
+    with chdir(has_filtered_tab):
+        run_snakemake(targets=["iteration-01/exact.tab"])
+    assert (has_filtered_tab / "iteration-01/exact.tab").exists()
+
+
+def test_snakemake_final(has_filtered_tab):
+    with chdir(has_filtered_tab):
+        run_snakemake(targets=["nofinal"])
+    assert (has_filtered_tab / "iteration-01/new_V_germline.tab").exists()
+    assert not (has_filtered_tab / "final/assigned.tab.gz").exists()
+
+    with chdir(has_filtered_tab):
+        run_snakemake()
+    assert (has_filtered_tab / "final/assigned.tab.gz").exists()
+
+
+def test_clonotypes(has_filtered_tab):
+    run_clonotypes(has_filtered_tab / "iteration-01/assigned.tab.gz", limit=5)
+
+
+def test_fastq_input(has_filtered_tab, tmp_path):
+    # Use merged reads from already-run pipeline as input for a new run
+    single_reads = has_filtered_tab / "reads" / "2-merged.fastq.gz"
+    directory = tmp_path / "singleend-fastq"
+    run_init(
+        database="testdata/database",
+        single_reads=str(single_reads),
+        directory=str(directory),
+    )
+    with chdir(directory):
+        modify_configuration([("barcode_length_3prime", "21")])
+        run_snakemake(targets=["stats/reads.json"])
+
+
+def test_fasta_input(has_filtered_tab, tmp_path):
+    fasta_path = tmp_path / "justfasta.fasta"
+    convert_fastq_to_fasta(
+        has_filtered_tab / "reads" / "2-merged.fastq.gz",
+        fasta_path,
+    )
+    directory = tmp_path / "singleend-fasta"
+    run_init(
+        database="testdata/database",
+        single_reads=str(fasta_path),
+        directory=str(directory),
+    )
+    with chdir(directory):
+        modify_configuration([("barcode_length_3prime", "21")])
+        run_snakemake(targets=["stats/reads.json"])
+
+
+def convert_fastq_to_fasta(fastq, fasta):
+    import dnaio
+    with dnaio.open(fastq) as inf:
+        with dnaio.open(fasta, mode="w") as outf:
+            for record in inf:
+                outf.write(record)
View it on GitLab: https://salsa.debian.org/med-team/igdiscover/-/compare/1182895c2727a9d13216b8ee068504472bdbc1e4...f69cd6756a7c988593763c95e3dda9a8114e854d