[med-svn] [Git][med-team/changeo][master] 6 commits: New upstream version 1.3.0
Andreas Tille (@tille)
gitlab at salsa.debian.org
Wed Dec 21 07:50:41 GMT 2022
Andreas Tille pushed to branch master at Debian Med / changeo
Commits:
a93417c6 by Andreas Tille at 2022-12-21T08:42:48+01:00
New upstream version 1.3.0
- - - - -
fb155f4a by Andreas Tille at 2022-12-21T08:42:50+01:00
Update upstream source from tag 'upstream/1.3.0'
Update to upstream version '1.3.0'
with Debian dir 677b4c28436c4e271e35d9cd3b52c69f6d5c2efb
- - - - -
94570d6d by Andreas Tille at 2022-12-21T08:43:25+01:00
New upstream version
- - - - -
c657b257 by Andreas Tille at 2022-12-21T08:43:45+01:00
routine-update: Standards-Version: 4.6.2
- - - - -
aa644ddd by Andreas Tille at 2022-12-21T08:43:48+01:00
Update lintian override info format in d/changeo.lintian-overrides on line 3-10.
Changes-By: lintian-brush
Fixes: lintian: mismatched-override
See-also: https://lintian.debian.org/tags/mismatched-override.html
- - - - -
e754f958 by Andreas Tille at 2022-12-21T08:49:40+01:00
Check autopkgtest which fails with this new version
- - - - -
14 changed files:
- NEWS.rst
- PKG-INFO
- bin/AssignGenes.py
- bin/ConvertDb.py
- bin/MakeDb.py
- changeo.egg-info/PKG-INFO
- changeo/Applications.py
- changeo/Gene.py
- changeo/IO.py
- changeo/Receptor.py
- changeo/Version.py
- debian/changelog
- debian/changeo.lintian-overrides
- debian/control
Changes:
=====================================
NEWS.rst
=====================================
@@ -1,6 +1,22 @@
Release Notes
===============================================================================
+Version 1.3.0: December 11, 2022
+-------------------------------------------------------------------------------
+
++ Various updates to internals and error messages.
+
+AssignGenes:
+
++ Added support for ``.fastq`` files. If a ``.fastq`` file is input, then a
+ corresponding ``.fasta`` file will be created in the output directory.
++ Added support for C region alignment calls provided by IgBLAST v1.18+.
+
+MakeDb:
+
++ Added support for C region alignment calls provided by IgBLAST v1.18+.
+
+
Version 1.2.0: October 29, 2021
-------------------------------------------------------------------------------
@@ -184,12 +200,12 @@ CreateGermlines:
DefineClones:
-+ Fixed a bug that caused a missing junction column to cluster sequences
++ Fixed a bug that caused a missing junction column to cluster sequences
together.
MakeDb:
-+ Fixed a bug that caused failed germline reconstructions to be recorded as
++ Fixed a bug that caused failed germline reconstructions to be recorded as
``None``, rather than an empty string, in the ``GERMLINE_IMGT`` column.
@@ -507,7 +523,7 @@ MakeDb:
reading frame. This provides the following additional output fields:
``D_FRAME``, ``N1_LENGTH``, ``N2_LENGTH``, ``P3V_LENGTH``, ``P5D_LENGTH``,
``P3D_LENGTH``, ``P5J_LENGTH``.
-+ The fields ``N1_LENGTH`` and ``N2_LENGTH`` have been renamed to accommodate
++ The fields ``N1_LENGTH`` and ``N2_LENGTH`` have been renamed to accommodate
adding additional output from IMGT under the ``--junction`` flag. The new
names are ``NP1_LENGTH`` and ``NP2_LENGTH``.
+ Fixed a bug that caused the ``IN_FRAME``, ``MUTATED_INVARIANT`` and
@@ -538,9 +554,9 @@ CreateGermlines:
MakeDb:
-+ Updated igblast subcommand to correctly parse records with indels. Now
++ Updated igblast subcommand to correctly parse records with indels. Now
igblast must be run with the argument ``outfmt "7 std qseq sseq btop"``.
-+ Changed the names of the FWR and CDR output columns added with
++ Changed the names of the FWR and CDR output columns added with
``--regions`` to ``<region>_IMGT``.
+ Added ``V_BTOP`` and ``J_BTOP`` output when the ``--scores`` flag is
specified to the igblast subcommand.
@@ -551,14 +567,14 @@ Version 0.3.1: December 18, 2015
MakeDb:
-+ Fixed bug wherein the imgt subcommand was not properly recognizing an
++ Fixed bug wherein the imgt subcommand was not properly recognizing an
extracted folder as input to the ``-i`` argument.
Version 0.3.0: December 4, 2015
-------------------------------------------------------------------------------
-Conversion to a proper Python package which uses pip and setuptools for
+Conversion to a proper Python package which uses pip and setuptools for
installation.
The package now requires Python 3.4. Python 2.7 is no longer supported.
@@ -577,7 +593,7 @@ IgCore:
AnalyzeAa:
-+ This tool was removed. This functionality has been migrated to the alakazam
++ This tool was removed. This functionality has been migrated to the alakazam
R package.
DefineClones:
@@ -585,13 +601,13 @@ DefineClones:
+ Added ``--sf`` flag to specify sequence field to be used to calculate
distance between sequences.
+ Fixed a bug wherein sequences with missing data in grouping columns
- were being assigned into a single group and clustered. Sequences with
+ were being assigned into a single group and clustered. Sequences with
missing grouping variables will now be failed.
+ Fixed bug where sequences with "None" junctions were grouped together.
-
+
GapRecords:
-+ This tool was removed in favor of adding IMGT gapping support to igblast
++ This tool was removed in favor of adding IMGT gapping support to igblast
subcommand of MakeDb.
MakeDb:
@@ -600,7 +616,7 @@ MakeDb:
junction region as defined by IMGT.
+ Added the ``--regions`` flag which adds extra columns containing FWR and CDR
regions as defined by IMGT.
-+ Added support to imgt subcommand for the new IMGT/HighV-QUEST compression
++ Added support to imgt subcommand for the new IMGT/HighV-QUEST compression
scheme (.txz files).
@@ -609,11 +625,11 @@ Version 0.2.5: August 25, 2015
CreateGermlines:
-+ Removed default '-r' repository and added informative error messages when
++ Removed default '-r' repository and added informative error messages when
invalid germline repositories are provided.
+ Updated '-r' flag to take list of folders and/or fasta files with germlines.
-
-
+
+
Version 0.2.4: August 19, 2015
-------------------------------------------------------------------------------
@@ -624,46 +640,46 @@ MakeDb:
ParseDb:
-+ Fixed a bug wherein specifying the ``-f`` argument to the index subcommand
++ Fixed a bug wherein specifying the ``-f`` argument to the index subcommand
would cause an error.
-
+
Version 0.2.3: July 22, 2015
-------------------------------------------------------------------------------
DefineClones:
-+ Fixed a typo in the default normalization setting of the bygroup subcommand,
++ Fixed a typo in the default normalization setting of the bygroup subcommand,
which was being interpreted as 'none' rather than 'len'.
-+ Changed the 'hs5f' model of the bygroup subcommand to be centered -log10 of
++ Changed the 'hs5f' model of the bygroup subcommand to be centered -log10 of
the targeting probability.
-+ Added the ``--sym`` argument to the bygroup subcommand which determines how
++ Added the ``--sym`` argument to the bygroup subcommand which determines how
asymmetric distances are handled.
-
+
Version 0.2.2: July 8, 2015
-------------------------------------------------------------------------------
CreateGermlines:
-+ Germline creation now works for IgBLAST output parsed with MakeDb. The
- argument ``--sf SEQUENCE_VDJ`` must be provided to generate germlines from
++ Germline creation now works for IgBLAST output parsed with MakeDb. The
+ argument ``--sf SEQUENCE_VDJ`` must be provided to generate germlines from
IgBLAST output. The same reference database used for the IgBLAST alignment
must be specified with the ``-r`` flag.
+ Fixed a bug with determination of N1 and N2 region positions.
MakeDb:
-+ Combined the ``-z`` and ``-f`` flags of the imgt subcommand into a single flag,
++ Combined the ``-z`` and ``-f`` flags of the imgt subcommand into a single flag,
``-i``, which autodetects the input type.
-+ Added requirement that IgBLAST input be generated using the
++ Added requirement that IgBLAST input be generated using the
``-outfmt "7 std qseq"`` argument to igblastn.
-+ Modified SEQUENCE_VDJ output from IgBLAST parser to include gaps inserted
++ Modified SEQUENCE_VDJ output from IgBLAST parser to include gaps inserted
during alignment.
+ Added correction for IgBLAST alignments where V/D, D/J or V/J segments are
assigned overlapping positions.
+ Corrected N1_LENGTH and N2_LENGTH calculation from IgBLAST output.
-+ Added the ``--scores`` flag which adds extra columns containing alignment
++ Added the ``--scores`` flag which adds extra columns containing alignment
scores from IMGT and IgBLAST output.
@@ -672,17 +688,17 @@ Version 0.2.1: June 18, 2015
DefineClones:
-+ Removed mouse 3-mer model, 'm3n'.
++ Removed mouse 3-mer model, 'm3n'.
Version 0.2.0: June 17, 2015
-------------------------------------------------------------------------------
-Initial public prerelease.
+Initial public prerelease.
-Output files were added to the usage documentation of all scripts.
+Output files were added to the usage documentation of all scripts.
-General code cleanup.
+General code cleanup.
DbCore:
@@ -690,7 +706,7 @@ DbCore:
AnalyzeAa:
-+ Fixed a bug where junctions less than one codon long would lead to a
++ Fixed a bug where junctions less than one codon long would lead to a
division by zero error.
+ Added ``--failed`` flag to create database with records that fail analysis.
+ Added ``--sf`` flag to specify sequence field to be analyzed.
@@ -701,16 +717,16 @@ CreateGermlines:
DefineClones:
-+ Added a human 1-mer model, 'hs1f', which uses the substitution rates from
++ Added a human 1-mer model, 'hs1f', which uses the substitution rates from
Yaari et al., 2013.
-+ Changed default model to 'hs1f' and default normalization to length for
++ Changed default model to 'hs1f' and default normalization to length for
bygroup subcommand.
+ Added ``--link`` argument which allows for specification of single, complete,
or average linkage during clonal clustering (default single).
GapRecords:
-+ Fixed a bug wherein non-standard sequence fields could not be aligned.
++ Fixed a bug wherein non-standard sequence fields could not be aligned.
MakeDb:
=====================================
PKG-INFO
=====================================
@@ -1,35 +1,13 @@
-Metadata-Version: 1.1
+Metadata-Version: 2.1
Name: changeo
-Version: 1.2.0
+Version: 1.3.0
Summary: A bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.
Home-page: http://changeo.readthedocs.io
+Download-URL: https://bitbucket.org/kleinstein/changeo/downloads
Author: Namita Gupta, Jason Anthony Vander Heiden
Author-email: immcantation at googlegroups.com
License: GNU Affero General Public License 3 (AGPL-3)
-Download-URL: https://bitbucket.org/kleinstein/changeo/downloads
-Description: .. image:: https://img.shields.io/pypi/dm/changeo
- :target: https://pypi.org/project/changeo
- .. image:: https://img.shields.io/static/v1?label=AIRR-C%20sw-tools%20v1&message=compliant&color=008AFF&labelColor=000000&style=plastic
- :target: https://docs.airr-community.org/en/stable/swtools/airr_swtools_standard.html
-
- Change-O - Repertoire clonal assignment toolkit
- ================================================================================
-
- Change-O is a collection of tools for processing the output of V(D)J alignment
- tools, assigning clonal clusters to immunoglobulin (Ig) sequences, and
- reconstructing germline sequences.
-
- Dramatic improvements in high-throughput sequencing technologies now enable
- large-scale characterization of Ig repertoires, defined as the collection of
- trans-membrane antigen-receptor proteins located on the surface of B cells and
- T cells. Change-O is a suite of utilities to facilitate advanced analysis of
- Ig and TCR sequences following germline segment assignment. Change-O
- handles output from IMGT/HighV-QUEST and IgBLAST, and provides a wide variety of
- clustering methods for assigning clonal groups to Ig sequences. Record sorting,
- grouping, and various database manipulation operations are also included.
-
Keywords: bioinformatics,sequencing,immunology,adaptive immunity,immunoglobulin,AIRR-seq,Rep-Seq,B cell repertoire analysis,adaptive immune receptor repertoires
-Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
@@ -37,3 +15,25 @@ Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.4
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
+License-File: LICENSE
+
+.. image:: https://img.shields.io/pypi/dm/changeo
+ :target: https://pypi.org/project/changeo
+.. image:: https://img.shields.io/static/v1?label=AIRR-C%20sw-tools%20v1&message=compliant&color=008AFF&labelColor=000000&style=plastic
+ :target: https://docs.airr-community.org/en/stable/swtools/airr_swtools_standard.html
+
+Change-O - Repertoire clonal assignment toolkit
+================================================================================
+
+Change-O is a collection of tools for processing the output of V(D)J alignment
+tools, assigning clonal clusters to immunoglobulin (Ig) sequences, and
+reconstructing germline sequences.
+
+Dramatic improvements in high-throughput sequencing technologies now enable
+large-scale characterization of Ig repertoires, defined as the collection of
+trans-membrane antigen-receptor proteins located on the surface of B cells and
+T cells. Change-O is a suite of utilities to facilitate advanced analysis of
+Ig and TCR sequences following germline segment assignment. Change-O
+handles output from IMGT/HighV-QUEST and IgBLAST, and provides a wide variety of
+clustering methods for assigning clonal groups to Ig sequences. Record sorting,
+grouping, and various database manipulation operations are also included.
=====================================
bin/AssignGenes.py
=====================================
@@ -15,6 +15,8 @@ from pkg_resources import parse_version
from textwrap import dedent
from time import time
import re
+import Bio
+from Bio import SeqIO
# Presto imports
from presto.IO import printLog, printMessage, printError, printWarning
@@ -34,7 +36,7 @@ default_igdata = '~/share/igblast'
def assignIgBLAST(seq_file, amino_acid=False, igdata=default_igdata, loci='ig', organism='human',
- vdb=None, ddb=None, jdb=None, format=default_format,
+ vdb=None, ddb=None, jdb=None, cdb=None, format=default_format,
igblast_exec=default_igblastn_exec, out_file=None,
out_args=default_out_args, nproc=None):
"""
@@ -49,6 +51,7 @@ def assignIgBLAST(seq_file, amino_acid=False, igdata=default_igdata, loci='ig',
vdb (str): name of a custom V reference in the database folder to use.
ddb (str): name of a custom D reference in the database folder to use.
jdb (str): name of a custom J reference in the database folder to use.
+ cdb (str): name of a custom C reference in the database folder to use.
format (str): output format. One of 'blast' or 'airr'.
exec (str): the path to the igblastn executable.
out_file (str): output file name. Automatically generated from the input file if None.
@@ -87,26 +90,44 @@ def assignIgBLAST(seq_file, amino_acid=False, igdata=default_igdata, loci='ig',
if out_file is None:
out_file = getOutputName(seq_file, out_label='igblast', out_dir=out_args['out_dir'],
out_name=out_args['out_name'], out_type=out_type)
+ # convert to FASTA if needed
+ with open(seq_file, 'r') as infile:
+ test = infile.read(1)
+ if test == "@":
+ printMessage("Running conversion from FASTQ to FASTA")
+ fasta_out_dir, filename = os.path.split(out_file)
+ out_fasta_file = os.path.split(seq_file)[1]
+ out_fasta_file = os.path.join(fasta_out_dir,'%s.fasta' % os.path.splitext(out_fasta_file)[0])
+ with open(out_fasta_file, "w") as out_handle:
+ records = SeqIO.parse(seq_file, 'fastq')
+ if parse_version(Bio.__version__) >= parse_version('1.71'):
+ # Biopython >= v1.71
+ SeqIO.write(records, out_handle, format='fasta-2line')
+ else:
+ # Biopython < v1.71
+ writer = SeqIO.FastaIO.FastaWriter(out_handle, wrap=None)
+ writer.write_file(records)
+ seq_file = out_fasta_file
# Run IgBLAST clustering
start_time = time()
printMessage('Running IgBLAST', start_time=start_time, width=25)
if not amino_acid:
console_out = runIgBLASTN(seq_file, igdata, loci=loci, organism=organism,
- vdb=vdb, ddb=ddb, jdb=jdb, output=out_file,
+ vdb=vdb, ddb=ddb, jdb=jdb, cdb=cdb, output=out_file,
format=format, threads=nproc, exec=igblast_exec)
else:
console_out = runIgBLASTP(seq_file, igdata, loci=loci, organism=organism,
vdb=vdb, output=out_file,
threads=nproc, exec=igblast_exec)
printMessage('Done', start_time=start_time, end=True, width=25)
-
+
# Get number of processed sequences
if (format == 'blast'):
- with open(out_file, 'rb') as f:
- f.seek(-2, os.SEEK_END)
- while f.read(1) != b'\n':
- f.seek(-2, os.SEEK_CUR)
+ with open(out_file, 'rb') as f:
+ f.seek(-2, os.SEEK_END)
+ while f.read(1) != b'\n':
+ f.seek(-2, os.SEEK_CUR)
pass_info = f.readline().decode()
num_seqs_match = re.search('(# BLAST processed )(\d+)( .*)', pass_info)
num_sequences = num_seqs_match.group(2)
@@ -120,7 +141,7 @@ def assignIgBLAST(seq_file, amino_acid=False, igdata=default_igdata, loci='ig',
lines += buf.count(b'\n')
buf = read_f(buf_size)
num_sequences = lines - 1
-
+
# Print log
log = OrderedDict()
log['PASS'] = num_sequences
@@ -137,8 +158,8 @@ def getArgParser():
Arguments:
None
-
- Returns:
+
+ Returns:
an ArgumentParser object
"""
# Define output file names and header fields
@@ -180,16 +201,21 @@ def getArgParser():
choices=choices_loci, help='The receptor type.')
group_igblast.add_argument('--vdb', action='store', dest='vdb', default=None,
help='''Name of the custom V reference in the IgBLAST database folder.
- If not specified, then a default database name with the form
+ If not specified, then a default database name with the form
imgt_<organism>_<loci>_v will be used.''')
group_igblast.add_argument('--ddb', action='store', dest='ddb', default=None,
help='''Name of the custom D reference in the IgBLAST database folder.
- If not specified, then a default database name with the form
+ If not specified, then a default database name with the form
imgt_<organism>_<loci>_d will be used.''')
group_igblast.add_argument('--jdb', action='store', dest='jdb', default=None,
help='''Name of the custom J reference in the IgBLAST database folder.
- If not specified, then a default database name with the form
+ If not specified, then a default database name with the form
imgt_<organism>_<loci>_j will be used.''')
+ group_igblast.add_argument('--cdb', action='store', dest='cdb', default=None,
+ help='''Name of the custom C reference in the IgBLAST database folder.
+ If not specified, then a default database name with the form
+ imgt_<organism>_<loci>_c will be used. Note, this argument will be
+ ignored for IgBLAST versions below 1.18.0.''')
group_igblast.add_argument('--format', action='store', dest='format', default=default_format,
choices=choices_format,
help='''Specify the output format. The "blast" will result in
@@ -217,7 +243,7 @@ def getArgParser():
choices=choices_loci, help='The receptor type.')
group_igblast_aa.add_argument('--vdb', action='store', dest='vdb', default=None,
help='''Name of the custom V reference in the IgBLAST database folder.
- If not specified, then a default database name with the form
+ If not specified, then a default database name with the form
imgt_aa_<organism>_<loci>_v will be used.''')
group_igblast_aa.add_argument('--exec', action='store', dest='igblast_exec',
default=default_igblastp_exec,
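
The FASTQ handling added to assignIgBLAST above can be illustrated with a small standalone sketch. This is not the upstream implementation, which relies on Biopython's SeqIO; it is a standard-library approximation that assumes strict four-line FASTQ records (no wrapped sequence lines). The function names `is_fastq` and `fastq_to_fasta` are hypothetical and exist only for this example.

```python
import os
import tempfile


def is_fastq(path):
    """Return True if the file's first character is '@', the FASTQ header marker."""
    # Read a single character rather than the whole file.
    with open(path, "r") as handle:
        return handle.read(1) == "@"


def fastq_to_fasta(fastq_path, fasta_path):
    """Write each four-line FASTQ record as a two-line FASTA record.

    Assumes unwrapped, four-line FASTQ records. Returns the record count.
    """
    count = 0
    with open(fastq_path, "r") as src, open(fasta_path, "w") as dst:
        while True:
            header = src.readline().rstrip("\n")
            if not header:
                break  # end of file (or a trailing blank line)
            sequence = src.readline().rstrip("\n")
            src.readline()  # '+' separator line, discarded
            src.readline()  # quality string, discarded
            dst.write(">%s\n%s\n" % (header[1:], sequence))
            count += 1
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fastq = os.path.join(tmp, "reads.fastq")
        fasta = os.path.join(tmp, "reads.fasta")
        with open(fastq, "w") as handle:
            handle.write("@read1\nACGT\n+\nIIII\n")
        if is_fastq(fastq):
            print(fastq_to_fasta(fastq, fasta), "record(s) converted")
```

Reading one character is enough to tell the two formats apart here, since FASTA headers start with `>` and FASTQ headers start with `@`.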
=====================================
bin/ConvertDb.py
=====================================
@@ -24,12 +24,11 @@ from Bio.SeqRecord import SeqRecord
# Presto and changeo imports
from presto.Annotation import flattenAnnotation
from presto.IO import printLog, printMessage, printProgress, printError, printWarning
-from changeo.Alignment import gapV
from changeo.Applications import default_tbl2asn_exec, runASN
from changeo.Defaults import default_id_field, default_seq_field, default_germ_field, \
default_csv_size, default_format, default_out_args
from changeo.Commandline import CommonHelpFormatter, checkArgs, getCommonArgParser, parseCommonArgs
-from changeo.Gene import getCGene, buildGermline
+from changeo.Gene import getCGene
from changeo.IO import countDbFile, getFormatOperators, getOutputHandle, AIRRReader, AIRRWriter, \
ChangeoReader, ChangeoWriter, TSVReader, ReceptorData, readGermlines, \
checkFields, yamlDict
@@ -75,152 +74,6 @@ def buildSeqRecord(db_record, id_field, seq_field, meta_fields=None):
return seq_record
-def correctIMGTFields(receptor, references):
- """
- Add IMGT-gaps to IMGT fields in a Receptor object
-
- Arguments:
- receptor (changeo.Receptor.Receptor): Receptor object to modify.
- references (dict): dictionary of IMGT-gapped references sequences.
-
- Returns:
- changeo.Receptor.Receptor: modified Receptor with IMGT-gapped fields.
- """
- # Initialize update object
- imgt_dict = {'sequence_imgt': None,
- 'v_germ_start_imgt': None,
- 'v_germ_length_imgt': None,
- 'germline_imgt': None}
-
- try:
- if not all([receptor.sequence_imgt,
- receptor.v_germ_start_imgt,
- receptor.v_germ_length_imgt,
- receptor.v_call]):
- raise AttributeError
- except AttributeError:
- return None
-
- # Update IMGT fields
- try:
- gapped = gapV(receptor.sequence_imgt,
- receptor.v_germ_start_imgt,
- receptor.v_germ_length_imgt,
- receptor.v_call,
- references)
- except KeyError as e:
- printWarning(e)
- return None
-
- # Verify IMGT-gapped sequence and junction concur
- try:
- check = (receptor.junction == gapped['sequence_imgt'][309:(309 + receptor.junction_length)])
- except TypeError:
- check = False
- if not check:
- return None
-
- # Rebuild germline sequence
- __, germlines, __ = buildGermline(receptor, references)
- if germlines is None:
- return None
- else:
- gapped['germline_imgt'] = germlines['full']
-
- # Update return object
- imgt_dict.update(gapped)
-
- return imgt_dict
-
-
-def insertGaps(db_file, references=None, format=default_format,
- out_file=None, out_args=default_out_args):
- """
- Inserts IMGT numbering into V fields
-
- Arguments:
- db_file : the database file name.
- references : folder with germline repertoire files. If None, do not updated alignment columns wtih IMGT gaps.
- format : input format.
- out_file : output file name. Automatically generated from the input file if None.
- out_args : common output argument dictionary from parseCommonArgs.
-
- Returns:
- str : output file name
- """
- log = OrderedDict()
- log['START'] = 'ConvertDb'
- log['COMMAND'] = 'imgt'
- log['FILE'] = os.path.basename(db_file)
- printLog(log)
-
- # Define format operators
- try:
- reader, writer, schema = getFormatOperators(format)
- except ValueError:
- printError('Invalid format %s.' % format)
-
- # Open input
- db_handle = open(db_file, 'rt')
- db_iter = reader(db_handle)
-
- # Check for required columns
- try:
- required = ['sequence_imgt', 'v_germ_start_imgt']
- checkFields(required, db_iter.fields, schema=schema)
- except LookupError as e:
- printError(e)
-
- # Load references
- reference_dict = readGermlines(references)
-
- # Check for IMGT-gaps in germlines
- if all('...' not in x for x in reference_dict.values()):
- printWarning('Germline reference sequences do not appear to contain IMGT-numbering spacers. Results may be incorrect.')
-
- # Open output writer
- if out_file is not None:
- pass_handle = open(out_file, 'w')
- else:
- pass_handle = getOutputHandle(db_file, out_label='gap', out_dir=out_args['out_dir'],
- out_name=out_args['out_name'], out_type=schema.out_type)
- pass_writer = writer(pass_handle, fields=db_iter.fields)
-
- # Count records
- result_count = countDbFile(db_file)
-
- # Iterate over records
- start_time = time()
- rec_count = pass_count = 0
- for rec in db_iter:
- # Print progress for previous iteration
- printProgress(rec_count, result_count, 0.05, start_time=start_time)
- rec_count += 1
- # Update IMGT fields
- imgt_dict = correctIMGTFields(rec, reference_dict)
- # Write records
- if imgt_dict is not None:
- pass_count += 1
- rec.setDict(imgt_dict, parse=False)
- pass_writer.writeReceptor(rec)
-
- # Print counts
- printProgress(rec_count, result_count, 0.05, start_time=start_time)
- log = OrderedDict()
- log['OUTPUT'] = os.path.basename(pass_handle.name)
- log['RECORDS'] = rec_count
- log['PASS'] = pass_count
- log['FAIL'] = rec_count - pass_count
- log['END'] = 'ConvertDb'
- printLog(log)
-
- # Close file handles
- pass_handle.close()
- db_handle.close()
-
- return pass_handle.name
-
-
def convertToAIRR(db_file, format=default_format,
out_file=None, out_args=default_out_args):
"""
@@ -990,25 +843,6 @@ def getArgParser():
description='Converts input into a Change-O TSV file.')
parser_changeo.set_defaults(func=convertToChangeo)
- # Subparser to insert IMGT-gaps
- # desc_gap = dedent('''
- # Inserts IMGT numbering spacers into the observed sequence
- # (SEQUENCE_IMGT, sequence_alignment) and rebuilds the germline sequence
- # (GERMLINE_IMGT, germline_alignment) if present. Also adjusts the values
- # in the V germline coordinate fields (V_GERM_START_IMGT, V_GERM_LENGTH_IMGT;
- # v_germline_end, v_germline_start), which are required.
- # ''')
- # parser_gap = subparsers.add_parser('gap', parents=[format_parent],
- # formatter_class=CommonHelpFormatter, add_help=False,
- # help='Inserts IMGT numbering spacers into the V region.',
- # description=desc_gap)
- # group_gap = parser_gap.add_argument_group('conversion arguments')
- # group_gap.add_argument('-r', nargs='+', action='store', dest='references', required=False,
- # help='''List of folders and/or fasta files containing
- # IMGT-gapped germline sequences corresponding to the
- # set of germlines used for the alignment.''')
- # parser_gap.set_defaults(func=insertGaps)
-
# Subparser to convert database entries to sequence file
parser_fasta = subparsers.add_parser('fasta', parents=[default_parent],
formatter_class=CommonHelpFormatter, add_help=False,
=====================================
bin/MakeDb.py
=====================================
@@ -22,10 +22,11 @@ from presto.Annotation import parseAnnotation
from presto.IO import countSeqFile, printLog, printMessage, printProgress, printError, printWarning, readSeqFile
from changeo.Defaults import default_format, default_out_args, default_imgt_id_len
from changeo.Commandline import CommonHelpFormatter, checkArgs, getCommonArgParser, parseCommonArgs
-from changeo.Alignment import RegionDefinition
+from changeo.Alignment import RegionDefinition, gapV
from changeo.Gene import buildGermline
from changeo.IO import countDbFile, extractIMGT, readGermlines, getFormatOperators, getOutputHandle, \
- AIRRWriter, ChangeoWriter, IgBLASTReader, IgBLASTReaderAA, IMGTReader, IHMMuneReader
+ AIRRWriter, ChangeoWriter, IgBLASTReader, IgBLASTReaderAA, IMGTReader, IHMMuneReader, \
+ checkFields
from changeo.Receptor import ChangeoSchema, AIRRSchema
# 10X Receptor attributes
@@ -107,10 +108,10 @@ def getIDforIMGT(seq_file, imgt_id_len=default_imgt_id_len):
Create a sequence ID translation using IMGT truncation.
Arguments:
- seq_file : a fasta file of sequences input to IMGT.
+ seq_file (str): a fasta file of sequences input to IMGT.
Returns:
- dict : a dictionary of with the IMGT truncated ID as the key and the full sequence description as the value.
+ dict: a dictionary with the IMGT truncated ID as the key and the full sequence description as the value.
"""
# Create a sequence ID translation using IDs truncated up to a space or 49 chars
@@ -128,15 +129,77 @@ def getIDforIMGT(seq_file, imgt_id_len=default_imgt_id_len):
return ids
+def correctIMGTFields(receptor, references):
+ """
+ Add IMGT-numbering to IMGT fields in a Receptor object
+
+ Arguments:
+ receptor (changeo.Receptor.Receptor): Receptor object to modify.
+ references (dict): dictionary of IMGT-gapped references sequences.
+
+ Returns:
+ dict: IMGT-gapped field values to apply to the Receptor, or None if the record cannot be gapped.
+ """
+ # Initialize update object
+ imgt_dict = {'sequence_imgt': None,
+ 'v_germ_start_imgt': None,
+ 'v_germ_length_imgt': None,
+ 'germline_imgt': None}
+
+ # Check for necessary fields
+ try:
+ if not all([receptor.sequence_imgt,
+ receptor.v_germ_start_imgt,
+ receptor.v_germ_length_imgt,
+ receptor.v_call]):
+ raise AttributeError
+ except AttributeError:
+ return None
+
+ # Gap V region
+ try:
+ gapped = gapV(receptor.sequence_imgt,
+ receptor.v_germ_start_imgt,
+ receptor.v_germ_length_imgt,
+ receptor.v_call,
+ references)
+ except KeyError as e:
+ printWarning(e)
+ return None
+
+ # Verify IMGT-gapped sequence and junction concur
+ # try:
+ # check = (receptor.junction == gapped['sequence_imgt'][309:(309 + receptor.junction_length)])
+ # except TypeError:
+ # check = False
+ # if not check:
+ # return None
+
+ # Rebuild germline sequence
+ receptor.setDict(gapped, parse=False)
+ __, germlines, __ = buildGermline(receptor, references)
+ # log, germlines, genes = buildGermline(receptor, references)
+ # print(log)
+ if germlines is not None:
+ gapped['germline_imgt'] = germlines['full']
+ else:
+ return None
+
+ # Update return object
+ imgt_dict.update(gapped)
+
+ return imgt_dict
+
+
def getSeqDict(seq_file):
"""
Create a dictionary from a sequence file.
Arguments:
- seq_file : sequence file.
+ seq_file (str): sequence file.
Returns:
- dict : sequence description as keys with Bio.SeqRecords as values.
+ dict: sequence description as keys with Bio.SeqRecords as values.
"""
seq_dict = SeqIO.to_dict(readSeqFile(seq_file), key_function=lambda x: x.description)
@@ -144,25 +207,25 @@ def getSeqDict(seq_file):
def writeDb(records, fields, aligner_file, total_count, id_dict=None, annotations=None,
- amino_acid=False, partial=False, asis_id=True, regions='default',
+ amino_acid=False, validate='strict', asis_id=True, regions='default',
writer=AIRRWriter, out_file=None, out_args=default_out_args):
"""
Writes parsed records to an output file
Arguments:
- records : a iterator of Receptor objects containing alignment data.
- fields : a list of ordered field names to write.
- aligner_file : input file name.
- total_count : number of records (for progress bar).
- id_dict : a dictionary of the truncated sequence ID mapped to the full sequence ID.
- annotations : additional annotation dictionary.
- amino_acid : if True do verification on amino acid fields.
- partial : if True put incomplete alignments in the pass file.
- asis_id : if ID is to be parsed for pRESTO output with default delimiters.
+ records (iter): an iterator of Receptor objects containing alignment data.
+ fields (list): a list of ordered field names to write.
+ aligner_file (str): input file name.
+ total_count (int): number of records (for progress bar).
+ id_dict (dict): a dictionary of the truncated sequence ID mapped to the full sequence ID.
+ annotations (dict): additional annotation dictionary.
+ amino_acid (bool): if True do verification on amino acid fields.
+ validate (str): validation criteria for passing records; one of 'strict', 'gentle', or 'partial'.
+ asis_id (bool): if ID is to be parsed for pRESTO output with default delimiters.
regions (str): name of the IMGT FWR/CDR region definitions to use.
- writer : writer class.
- out_file : output file name. Automatically generated from the input file if None.
- out_args : common output argument dictionary from parseCommonArgs.
+ writer (changeo.IO.TSVWriter): writer class.
+ out_file (str): output file name. Automatically generated from the input file if None.
+ out_args (dict): common output argument dictionary from parseCommonArgs.
Returns:
None
@@ -205,6 +268,22 @@ def writeDb(records, fields, aligner_file, total_count, id_dict=None, annotation
check = False
return check
+ # Default function to check for valid records
+ def _gentle(rec):
+ if amino_acid:
+ valid = [rec.v_call and rec.v_call != 'None',
+ rec.j_call and rec.j_call != 'None',
+ rec.functional is not None,
+ rec.sequence_aa_imgt,
+ rec.junction_aa]
+ else:
+ valid = [rec.v_call and rec.v_call != 'None',
+ rec.j_call and rec.j_call != 'None',
+ rec.functional is not None,
+ rec.sequence_imgt,
+ rec.junction]
+ return all(valid)
+
# Function to check for valid records strictly
def _strict(rec):
if amino_acid:
@@ -224,7 +303,7 @@ def writeDb(records, fields, aligner_file, total_count, id_dict=None, annotation
return all(valid)
# Function to check for valid records loosely
- def _gentle(rec):
+ def _partial(rec):
valid = [rec.v_call and rec.v_call != 'None',
rec.d_call and rec.d_call != 'None',
rec.j_call and rec.j_call != 'None']
@@ -238,24 +317,9 @@ def writeDb(records, fields, aligner_file, total_count, id_dict=None, annotation
else:
printError('Invalid output writer.')
- # Additional annotation (e.g. 10X cell calls)
- # _append_table = None
- # if cellranger_file is not None:
- # with open(cellranger_file) as csv_file:
- # # Read in annotation file (use Sniffer to discover file delimiters)
- # dialect = csv.Sniffer().sniff(csv_file.readline())
- # csv_file.seek(0)
- # csv_reader = csv.DictReader(csv_file, dialect = dialect)
- #
- # # Generate annotation dictionary
- # anntab_dict = {entry['contig_id']: {cellranger_map[field]: entry[field] \
- # for field in cellranger_map.keys()} for entry in csv_reader}
- #
- # fields = _annotate(fields, cellranger_map.values())
- # _append_table = lambda sequence_id: anntab_dict[sequence_id]
-
# Set pass criteria
- _pass = _gentle if partial else _strict
+ validate_map = {'strict': _strict, 'gentle': _gentle, 'partial': _partial}
+ _pass = validate_map.get(validate, _gentle)
# Define log handle
if out_args['log_file'] is None:
@@ -357,26 +421,26 @@ def writeDb(records, fields, aligner_file, total_count, id_dict=None, annotation
return output
-def parseIMGT(aligner_file, seq_file=None, repo=None, cellranger_file=None, partial=False, asis_id=True,
+def parseIMGT(aligner_file, seq_file=None, repo=None, cellranger_file=None, validate='strict', asis_id=True,
extended=False, format=default_format, out_file=None, out_args=default_out_args,
imgt_id_len=default_imgt_id_len):
"""
Main for IMGT aligned sample sequences.
Arguments:
- aligner_file : zipped file or unzipped folder output by IMGT.
- seq_file : FASTA file input to IMGT (from which to get seqID).
- repo : folder with germline repertoire files.
- partial : If True put incomplete alignments in the pass file.
- asis_id : if ID is to be parsed for pRESTO output with default delimiters.
- extended : if True add alignment score, FWR, CDR and junction fields to output file.
- format : output format. one of 'changeo' or 'airr'.
- out_file : output file name. Automatically generated from the input file if None.
- out_args : common output argument dictionary from parseCommonArgs.
- imgt_id_len: maximum character length of sequence identifiers reported by IMGT/HighV-QUEST.
+ aligner_file (str): zipped file or unzipped folder output by IMGT.
+ seq_file (str): FASTA file input to IMGT (from which to get seqID).
+ repo (str): folder with germline repertoire files.
+ validate (str): validation criteria for passing records; one of 'strict', 'gentle', or 'partial'.
+ asis_id (bool): if ID is to be parsed for pRESTO output with default delimiters.
+ extended (bool): if True add alignment score, FWR, CDR and junction fields to output file.
+ format (str): output format. one of 'changeo' or 'airr'.
+ out_file (str): output file name. Automatically generated from the input file if None.
+ out_args (dict): common output argument dictionary from parseCommonArgs.
+ imgt_id_len (int): maximum character length of sequence identifiers reported by IMGT/HighV-QUEST.
Returns:
- dict : names of the 'pass' and 'fail' output files.
+ dict: names of the 'pass' and 'fail' output files.
"""
# Print parameter info
log = OrderedDict()
@@ -385,7 +449,7 @@ def parseIMGT(aligner_file, seq_file=None, repo=None, cellranger_file=None, part
log['ALIGNER_FILE'] = aligner_file
log['SEQ_FILE'] = os.path.basename(seq_file) if seq_file else ''
log['ASIS_ID'] = asis_id
- log['PARTIAL'] = partial
+ log['VALIDATE'] = validate
log['EXTENDED'] = extended
printLog(log)
@@ -443,7 +507,7 @@ def parseIMGT(aligner_file, seq_file=None, repo=None, cellranger_file=None, part
germ_iter = (addGermline(x, references) for x in parse_iter)
# Write db
output = writeDb(germ_iter, fields=fields, aligner_file=aligner_file, total_count=total_count,
- annotations=annotations, id_dict=id_dict, asis_id=asis_id, partial=partial,
+ annotations=annotations, id_dict=id_dict, asis_id=asis_id, validate=validate,
writer=writer, out_file=out_file, out_args=out_args)
# Cleanup temp directory
@@ -452,7 +516,7 @@ def parseIMGT(aligner_file, seq_file=None, repo=None, cellranger_file=None, part
return output
-def parseIgBLAST(aligner_file, seq_file, repo, amino_acid=False, cellranger_file=None, partial=False,
+def parseIgBLAST(aligner_file, seq_file, repo, amino_acid=False, cellranger_file=None, validate='strict',
asis_id=True, asis_calls=False, extended=False, regions='default', infer_junction=False,
format='changeo', out_file=None, out_args=default_out_args):
"""
@@ -463,7 +527,7 @@ def parseIgBLAST(aligner_file, seq_file, repo, amino_acid=False, cellranger_file
seq_file (str): fasta file input to IgBlast (from which to get sequence).
repo (str): folder with germline repertoire files.
amino_acid (bool): if True then the IgBLAST output files are results from igblastp. igblastn is assumed if False.
- partial : If True put incomplete alignments in the pass file.
+ validate (str): validation criteria for passing records; one of 'strict', 'gentle', or 'partial'.
asis_id (bool): if ID is to be parsed for pRESTO output with default delimiters.
asis_calls (bool): if True do not parse gene calls for allele names.
extended (bool): if True add alignment scores, FWR regions, and CDR regions to the output.
@@ -474,7 +538,7 @@ def parseIgBLAST(aligner_file, seq_file, repo, amino_acid=False, cellranger_file
out_args (dict): common output argument dictionary from parseCommonArgs.
Returns:
- dict : names of the 'pass' and 'fail' output files.
+ dict: names of the 'pass' and 'fail' output files.
"""
# Print parameter info
log = OrderedDict()
@@ -484,7 +548,7 @@ def parseIgBLAST(aligner_file, seq_file, repo, amino_acid=False, cellranger_file
log['SEQ_FILE'] = os.path.basename(seq_file)
log['ASIS_ID'] = asis_id
log['ASIS_CALLS'] = asis_calls
- log['PARTIAL'] = partial
+ log['VALIDATE'] = validate
log['EXTENDED'] = extended
log['INFER_JUNCTION'] = infer_junction
printLog(log)
@@ -540,30 +604,30 @@ def parseIgBLAST(aligner_file, seq_file, repo, amino_acid=False, cellranger_file
parse_iter = parser(f, seq_dict, references, regions=regions, asis_calls=asis_calls, infer_junction=infer_junction)
germ_iter = (addGermline(x, references, amino_acid=amino_acid) for x in parse_iter)
output = writeDb(germ_iter, fields=fields, aligner_file=aligner_file, total_count=total_count,
- annotations=annotations, amino_acid=amino_acid, partial=partial, asis_id=asis_id,
+ annotations=annotations, amino_acid=amino_acid, validate=validate, asis_id=asis_id,
regions=regions, writer=writer, out_file=out_file, out_args=out_args)
return output
-def parseIHMM(aligner_file, seq_file, repo, cellranger_file=None, partial=False, asis_id=True,
+def parseIHMM(aligner_file, seq_file, repo, cellranger_file=None, validate='strict', asis_id=True,
extended=False, format=default_format, out_file=None, out_args=default_out_args):
"""
Main for iHMMuneAlign aligned sample sequences.
Arguments:
- aligner_file : iHMMune-Align output file to process.
- seq_file : fasta file input to iHMMuneAlign (from which to get sequence).
- repo : folder with germline repertoire files.
- partial : If True put incomplete alignments in the pass file.
- asis_id : if ID is to be parsed for pRESTO output with default delimiters.
- extended : if True parse alignment scores, FWR and CDR region fields.
- format : output format. One of 'changeo' or 'airr'.
- out_file : output file name. Automatically generated from the input file if None.
- out_args : common output argument dictionary from parseCommonArgs.
+ aligner_file (str): iHMMune-Align output file to process.
+ seq_file (str): fasta file input to iHMMuneAlign (from which to get sequence).
+ repo (str): folder with germline repertoire files.
+ validate (str): validation criteria for passing records; one of 'strict', 'gentle', or 'partial'.
+ asis_id (bool): if ID is to be parsed for pRESTO output with default delimiters.
+ extended (bool): if True parse alignment scores, FWR and CDR region fields.
+ format (str): output format. One of 'changeo' or 'airr'.
+ out_file (str): output file name. Automatically generated from the input file if None.
+ out_args (dict): common output argument dictionary from parseCommonArgs.
Returns:
- dict : names of the 'pass' and 'fail' output files.
+ dict: names of the 'pass' and 'fail' output files.
"""
# Print parameter info
log = OrderedDict()
@@ -572,7 +636,7 @@ def parseIHMM(aligner_file, seq_file, repo, cellranger_file=None, partial=False,
log['ALIGNER_FILE'] = os.path.basename(aligner_file)
log['SEQ_FILE'] = os.path.basename(seq_file)
log['ASIS_ID'] = asis_id
- log['PARTIAL'] = partial
+ log['VALIDATE'] = validate
log['EXTENDED'] = extended
printLog(log)
@@ -619,12 +683,125 @@ def parseIHMM(aligner_file, seq_file, repo, cellranger_file=None, partial=False,
parse_iter = IHMMuneReader(f, seq_dict, references)
germ_iter = (addGermline(x, references) for x in parse_iter)
output = writeDb(germ_iter, fields=fields, aligner_file=aligner_file, total_count=total_count,
- annotations=annotations, asis_id=asis_id, partial=partial,
+ annotations=annotations, asis_id=asis_id, validate=validate,
writer=writer, out_file=out_file, out_args=out_args)
return output
+
+def numberAIRR(aligner_file, repo=None, format=default_format,
+ out_file=None, out_args=default_out_args):
+ """
+ Inserts IMGT numbering into V fields
+
+ Arguments:
+ aligner_file (str): AIRR Rearrangement file from the alignment tool.
+ repo (str): folder with germline repertoire files. If None, do not update alignment columns with IMGT gaps.
+ format (str): output format.
+ out_file (str): output file name. Automatically generated from the input file if None.
+ out_args (dict): common output argument dictionary from parseCommonArgs.
+
+ Returns:
+ str: output file name.
+ """
+ log = OrderedDict()
+ log['START'] = 'MakeDb'
+ log['COMMAND'] = 'number'
+ log['ALIGNER_FILE'] = os.path.basename(aligner_file)
+ printLog(log)
+
+ # Define format operators
+ try:
+ reader, writer, schema = getFormatOperators(format)
+ except ValueError:
+ printError('Invalid format %s.' % format)
+
+ # Open input
+ db_handle = open(aligner_file, 'rt')
+ db_iter = reader(db_handle)
+
+ # Define log handle
+ if out_args['log_file'] is None:
+ log_handle = None
+ else:
+ log_handle = open(out_args['log_file'], 'w')
+
+ # Check for required columns
+ try:
+ required = ['sequence_imgt', 'v_germ_start_imgt']
+ checkFields(required, db_iter.fields, schema=schema)
+ except LookupError as e:
+ printError(e)
+
+ # Load references
+ reference_dict = readGermlines(repo)
+
+ # Check for IMGT-gaps in germlines
+ if all('...' not in x for x in reference_dict.values()):
+ printWarning('Germline reference sequences do not appear to contain IMGT-numbering spacers. Results may be incorrect.')
+
+ # Open output writer
+ if out_file is not None:
+ pass_handle = open(out_file, 'w')
+ else:
+ pass_handle = getOutputHandle(aligner_file, out_label='db-pass', out_dir=out_args['out_dir'],
+ out_name=out_args['out_name'], out_type=schema.out_type)
+ pass_writer = writer(pass_handle, fields=db_iter.fields)
+
+ if out_args['failed']:
+ fail_handle = getOutputHandle(aligner_file, out_label='db-fail', out_dir=out_args['out_dir'],
+ out_name=out_args['out_name'], out_type=schema.out_type)
+ fail_writer = writer(fail_handle, fields=db_iter.fields)
+
+ # Count records
+ result_count = countDbFile(aligner_file)
+
+ # Iterate over records
+ start_time = time()
+ rec_count = pass_count = fail_count = 0
+ for rec in db_iter:
+ # Print progress for previous iteration
+ printProgress(rec_count, result_count, 0.05, start_time=start_time)
+ rec_count += 1
+ # Update IMGT fields
+ imgt_dict = correctIMGTFields(rec, reference_dict)
+ # Write records
+ if imgt_dict is not None:
+ pass_count += 1
+ rec.setDict(imgt_dict, parse=False)
+ pass_writer.writeReceptor(rec)
+ else:
+ fail_count += 1
+ # Write row to fail file if specified
+ if out_args['failed']:
+ fail_writer.writeReceptor(rec)
+ # Write log
+ if log_handle is not None:
+ log = OrderedDict([('ID', rec.sequence_id),
+ ('V_CALL', rec.v_call),
+ ('D_CALL', rec.d_call),
+ ('J_CALL', rec.j_call),
+ ('PRODUCTIVE', rec.functional)])
+ printLog(log, log_handle)
+
+ # Print counts
+ printProgress(rec_count, result_count, 0.05, start_time=start_time)
+ log = OrderedDict()
+ log['OUTPUT'] = os.path.basename(pass_handle.name)
+ log['RECORDS'] = rec_count
+ log['PASS'] = pass_count
+ log['FAIL'] = rec_count - pass_count
+ log['END'] = 'MakeDb'
+ printLog(log)
+
+ # Close file handles
+ pass_handle.close()
+ db_handle.close()
+
+ return pass_handle.name
+
+
def getArgParser():
"""
Defines the ArgumentParser.
@@ -645,7 +822,7 @@ def getArgParser():
universal output fields:
sequence_id, sequence, sequence_alignment, germline_alignment,
rev_comp, productive, stop_codon, vj_in_frame, locus,
- v_call, d_call, j_call, junction, junction_length, junction_aa,
+ v_call, d_call, j_call, c_call, junction, junction_length, junction_aa,
v_sequence_start, v_sequence_end, v_germline_start, v_germline_end,
d_sequence_start, d_sequence_end, d_germline_start, d_germline_end,
j_sequence_start, j_sequence_end, j_germline_start, j_germline_end,
@@ -663,8 +840,8 @@ def getArgParser():
ihmm specific output fields:
vdj_score
- 10X specific output fields:
- cell_id, c_call, consensus_count, umi_count,
+ 10x specific output fields:
+ cell_id, consensus_count, umi_count,
v_call_10x, d_call_10x, j_call_10x,
junction_10x, junction_10x_aa
''')
@@ -702,7 +879,7 @@ def getArgParser():
required=True,
help='''List of input FASTA files (with .fasta, .fna or .fa
extension), containing sequences.''')
- group_igblast.add_argument('--10x', action='store', nargs='+', dest='cellranger_file',
+ group_igblast.add_argument('--10x', action='store', nargs='+', dest='cellranger_files',
help='''Table file containing 10X annotations (with .csv or .tsv
extension).''')
group_igblast.add_argument('--asis-id', action='store_true', dest='asis_id',
@@ -716,12 +893,6 @@ def getArgParser():
in both the IgBLAST output and reference database. Note, this requires
the sequence identifiers in the reference sequence set and the IgBLAST
database to be exact string matches.''')
- group_igblast.add_argument('--partial', action='store_true', dest='partial',
- help='''If specified, include incomplete V(D)J alignments in
- the pass file instead of the fail file. An incomplete alignment
- is defined as a record for which a valid IMGT-gapped sequence
- cannot be built or that is missing a V gene assignment,
- J gene assignment, junction region, or productivity call.''')
group_igblast.add_argument('--extended', action='store_true', dest='extended',
help='''Specify to include additional aligner specific fields in the output.
Adds <vdj>_score, <vdj>_identity, <vdj>_support, <vdj>_cigar,
@@ -730,9 +901,20 @@ def getArgParser():
choices=('default', 'rhesus-igl'), default='default',
help='''IMGT CDR and FWR boundary definition to use.''')
group_igblast.add_argument('--infer-junction', action='store_true', dest='infer_junction',
- help='''Infer the junction sequence. For use with IgBLAST v1.6.0 or older,
- prior to the addition of IMGT-CDR3 inference.''')
- parser_igblast.set_defaults(func=parseIgBLAST, amino_acid=False)
+ help='''Infer the junction sequence. For use with IgBLAST v1.6.0 or older,
+ prior to the addition of IMGT-CDR3 inference.''')
+ group_igblast_validate = group_igblast.add_mutually_exclusive_group(required=False)
+ # group_igblast_validate.add_argument('--strict', action='store_const', const='strict', dest='validate',
+ # help='''By default, passing records must contain valid values for the
+ # V gene, J gene, junction region, and productivity call. If specified,
+ # this argument adds the additional requirement that the junction region must
+ # start at position 310 in the IMGT-numbered sequence.''')
+ group_igblast_validate.add_argument('--partial', action='store_const', const='partial', dest='validate',
+ help='''If specified, include incomplete V(D)J alignments in
+ the pass file instead of the fail file. An incomplete alignment
+ is defined as a record that is missing a V gene assignment,
+ J gene assignment, junction region, or productivity call.''')
+ parser_igblast.set_defaults(func=parseIgBLAST, amino_acid=False, validate='strict')
# igblastp output parser
parser_igblast_aa = subparsers.add_parser('igblast-aa', parents=[parser_parent],
@@ -751,7 +933,7 @@ def getArgParser():
group_igblast_aa.add_argument('-s', action='store', nargs='+', dest='seq_files', required=True,
help='''List of input FASTA files (with .fasta, .fna or .fa
extension), containing sequences.''')
- group_igblast_aa.add_argument('--10x', action='store', nargs='+', dest='cellranger_file',
+ group_igblast_aa.add_argument('--10x', action='store', nargs='+', dest='cellranger_files',
help='''Table file containing 10X annotations (with .csv or .tsv extension).''')
group_igblast_aa.add_argument('--asis-id', action='store_true', dest='asis_id',
help='''Specify to prevent input sequence headers from being parsed
@@ -770,7 +952,7 @@ def getArgParser():
group_igblast_aa.add_argument('--regions', action='store', dest='regions',
choices=('default', 'rhesus-igl'), default='default',
help='''IMGT CDR and FWR boundary definition to use.''')
- parser_igblast_aa.set_defaults(func=parseIgBLAST, partial=True, amino_acid=True)
+ parser_igblast_aa.set_defaults(func=parseIgBLAST, amino_acid=True, validate='strict')
# IMGT aligner
@@ -797,31 +979,37 @@ def getArgParser():
These reference sequences must contain IMGT-numbering spacers (gaps)
in the V segment. If unspecified, the germline sequence reconstruction
will not be included in the output.''')
- group_imgt.add_argument('--10x', action='store', nargs='+', dest='cellranger_file',
+ group_imgt.add_argument('--10x', action='store', nargs='+', dest='cellranger_files',
help='''Table file containing 10X annotations (with .csv or .tsv
extension).''')
+ group_imgt.add_argument('--extended', action='store_true', dest='extended',
+ help='''Specify to include additional aligner specific fields in the output.
+ Adds <vdj>_score, <vdj>_identity, fwr1, fwr2, fwr3, fwr4,
+ cdr1, cdr2, cdr3, n1_length, n2_length, p3v_length, p5d_length,
+ p3d_length, p5j_length and d_frame.''')
group_imgt.add_argument('--asis-id', action='store_true', dest='asis_id',
help='''Specify to prevent input sequence headers from being parsed
to add new columns to database. Parsing of sequence headers requires
headers to be in the pRESTO annotation format, so this should be specified
when sequence headers are incompatible with the pRESTO annotation scheme.
Note, unrecognized header formats will default to this behavior.''')
- group_imgt.add_argument('--partial', action='store_true', dest='partial',
- help='''If specified, include incomplete V(D)J alignments in
- the pass file instead of the fail file. An incomplete alignment
- is defined as a record that is missing a V gene assignment,
- J gene assignment, junction region, or productivity call.''')
- group_imgt.add_argument('--extended', action='store_true', dest='extended',
- help='''Specify to include additional aligner specific fields in the output.
- Adds <vdj>_score, <vdj>_identity>, fwr1, fwr2, fwr3, fwr4,
- cdr1, cdr2, cdr3, n1_length, n2_length, p3v_length, p5d_length,
- p3d_length, p5j_length and d_frame.''')
group_imgt.add_argument('--imgt-id-len', action='store', dest='imgt_id_len', type=int,
default=default_imgt_id_len,
help='''The maximum character length of sequence identifiers reported by IMGT/HighV-QUEST.
- Specify 50 if the IMGT files (-i) were generated with an IMGT/HighV-QUEST version older
- than 1.8.3 (May 7, 2021).''')
- parser_imgt.set_defaults(func=parseIMGT)
+ Specify 50 if the IMGT files (-i) were generated with an IMGT/HighV-QUEST version older
+ than 1.8.3 (May 7, 2021).''')
+ group_imgt_validate = group_imgt.add_mutually_exclusive_group(required=False)
+ # group_imgt_validate.add_argument('--strict', action='store_const', const='strict', dest='validate',
+ # help='''By default, passing records must contain valid values for the
+ # V gene, J gene, junction region, and productivity call. If specified,
+ # this argument adds the additional requirement that the junction region must
+ # start at position 310 in the IMGT-numbered sequence.''')
+ group_imgt_validate.add_argument('--partial', action='store_const', const='partial', dest='validate',
+ help='''If specified, include incomplete V(D)J alignments in
+ the pass file instead of the fail file. An incomplete alignment
+ is defined as a record that is missing a V gene assignment,
+ J gene assignment, junction region, or productivity call.''')
+ parser_imgt.set_defaults(func=parseIMGT, validate='strict')
# iHMMuneAlign Aligner
parser_ihmm = subparsers.add_parser('ihmm', parents=[parser_parent],
@@ -841,7 +1029,7 @@ def getArgParser():
required=True,
help='''List of input FASTA files (with .fasta, .fna or .fa
extension) containing sequences.''')
- group_ihmm.add_argument('--10x', action='store', nargs='+', dest='cellranger_file',
+ group_ihmm.add_argument('--10x', action='store', nargs='+', dest='cellranger_files',
help='''Table file containing 10X annotations (with .csv or .tsv
extension).''')
group_ihmm.add_argument('--asis-id', action='store_true', dest='asis_id',
@@ -850,17 +1038,41 @@ def getArgParser():
headers to be in the pRESTO annotation format, so this should be specified
when sequence headers are incompatible with the pRESTO annotation scheme.
Note, unrecognized header formats will default to this behavior.''')
- group_ihmm.add_argument('--partial', action='store_true', dest='partial',
- help='''If specified, include incomplete V(D)J alignments in
- the pass file instead of the fail file. An incomplete alignment
- is defined as a record for which a valid IMGT-gapped sequence
- cannot be built or that is missing a V gene assignment,
- J gene assignment, junction region, or productivity call.''')
group_ihmm.add_argument('--extended', action='store_true', dest='extended',
help='''Specify to include additional aligner specific fields in the output.
Adds the path score of the iHMMune-Align hidden Markov model as vdj_score;
adds fwr1, fwr2, fwr3, fwr4, cdr1, cdr2 and cdr3.''')
- parser_ihmm.set_defaults(func=parseIHMM)
+ group_ihmm_validate = group_ihmm.add_mutually_exclusive_group(required=False)
+ # group_ihmm_validate.add_argument('--strict', action='store_const', const='strict', dest='validate',
+ # help='''By default, passing records must contain valid values for the
+ # V gene, J gene, junction region, and productivity call. If specified,
+ # this argument adds the additional requirement that the junction region must
+ # start at position 310 in the IMGT-numbered sequence.''')
+ group_ihmm_validate.add_argument('--partial', action='store_const', const='partial', dest='validate',
+ help='''If specified, include incomplete V(D)J alignments in
+ the pass file instead of the fail file. An incomplete alignment
+ is defined as a record that is missing a V gene assignment,
+ J gene assignment, junction region, or productivity call.''')
+ parser_ihmm.set_defaults(func=parseIHMM, validate='strict')
+
+ # Subparser to normalize AIRR file with IMGT-numbering
+ # desc_number = dedent('''
+ # Inserts IMGT numbering spacers into sequence_alignment, rebuilds the germline sequence
+ # in germline_alignment, and adjusts the values in the coordinate fields v_germline_start
+ # and v_germline_end accordingly.
+ # ''')
+ # parser_number = subparsers.add_parser('number', parents=[parser_parent],
+ # formatter_class=CommonHelpFormatter, add_help=False,
+ # help='Add IMGT-numbering to an AIRR Rearrangement TSV.',
+ # description=desc_number)
+ # group_number = parser_number.add_argument_group('aligner parsing arguments')
+ # group_number.add_argument('-i', nargs='+', action='store', dest='aligner_files', required=True,
+ # help='''AIRR Rearrangement TSV files.''')
+ # group_number.add_argument('-r', nargs='+', action='store', dest='repo', required=False,
+ # help='''List of folders and/or fasta files containing
+ # IMGT-numbered germline sequences corresponding to the
+ # set of germlines used for the alignment.''')
+ # parser_number.set_defaults(func=numberAIRR)
return parser
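
The --partial changes in the hunk above replace a boolean store_true flag with a store_const flag that writes into a shared 'validate' destination, with 'strict' supplied through set_defaults. A minimal sketch of that argparse pattern follows. The parser name and help text are illustrative assumptions, not the upstream parser.

```python
import argparse


def build_parser():
    """Builds a toy parser that mirrors the validate/--partial wiring in the diff."""
    # 'makedb-sketch' is a hypothetical program name used only for this example.
    parser = argparse.ArgumentParser(prog='makedb-sketch')
    group = parser.add_argument_group('aligner parsing arguments')
    # Mutually exclusive group, so a future --strict flag could share dest='validate'.
    validate_group = group.add_mutually_exclusive_group(required=False)
    validate_group.add_argument('--partial', action='store_const', const='partial',
                                dest='validate',
                                help='Include incomplete alignments in the pass file.')
    # Value used when no validation flag is given on the command line.
    parser.set_defaults(validate='strict')
    return parser


if __name__ == '__main__':
    parser = build_parser()
    print(parser.parse_args([]).validate)
    print(parser.parse_args(['--partial']).validate)
```

With this wiring the parsed namespace always carries a string in 'validate', which is what lets the parse functions take validate='strict' as a keyword instead of the old partial=False.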
@@ -881,6 +1093,7 @@ if __name__ == "__main__":
# Delete
if 'aligner_files' in args_dict: del args_dict['aligner_files']
if 'seq_files' in args_dict: del args_dict['seq_files']
+ if 'cellranger_files' in args_dict: del args_dict['cellranger_files']
if 'out_files' in args_dict: del args_dict['out_files']
if 'command' in args_dict: del args_dict['command']
if 'func' in args_dict: del args_dict['func']
@@ -888,10 +1101,12 @@ if __name__ == "__main__":
# Call main
for i, f in enumerate(args.__dict__['aligner_files']):
args_dict['aligner_file'] = f
- args_dict['seq_file'] = args.__dict__['seq_files'][i] \
- if args.__dict__['seq_files'] else None
args_dict['out_file'] = args.__dict__['out_files'][i] \
if args.__dict__['out_files'] else None
- args_dict['cellranger_file'] = args.__dict__['cellranger_file'][i] \
- if args.__dict__['cellranger_file'] else None
+ if 'seq_files' in args.__dict__:
+ args_dict['seq_file'] = args.__dict__['seq_files'][i] \
+ if args.__dict__['seq_files'] else None
+ if 'cellranger_files' in args.__dict__:
+ args_dict['cellranger_file'] = args.__dict__['cellranger_files'][i] \
+ if args.__dict__['cellranger_files'] else None
args.func(**args_dict)
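
The writeDb changes in this file select the pass criteria by name through a dictionary lookup that falls back to the gentle check. The sketch below illustrates that dispatch under stated assumptions: Rec is a hypothetical stand-in for a changeo Receptor, gentle() copies the nucleotide branch of _gentle shown in the diff, and the strict and partial functions are placeholders because their full bodies are not visible in these hunks.

```python
from collections import namedtuple

# Hypothetical minimal record; the real code operates on changeo Receptor objects.
Rec = namedtuple('Rec', ['v_call', 'j_call', 'functional', 'sequence_imgt', 'junction'])


def gentle(rec):
    """Nucleotide branch of the _gentle check as it appears in the diff."""
    valid = [rec.v_call and rec.v_call != 'None',
             rec.j_call and rec.j_call != 'None',
             rec.functional is not None,
             rec.sequence_imgt,
             rec.junction]
    return all(valid)


def strict_placeholder(rec):
    """Placeholder only: the body of _strict is not shown in the diff."""
    raise NotImplementedError('strict criteria are not reproduced in this sketch')


def partial_placeholder(rec):
    """Placeholder only: the return logic of _partial is not shown in the diff."""
    raise NotImplementedError('partial criteria are not reproduced in this sketch')


def select_validator(validate):
    """Returns the check registered under 'validate', or gentle for unknown names."""
    validate_map = {'strict': strict_placeholder,
                    'gentle': gentle,
                    'partial': partial_placeholder}
    return validate_map.get(validate, gentle)


if __name__ == '__main__':
    rec = Rec('IGHV1-2*01', 'IGHJ4*02', True, 'CAGGTGCAG', 'TGTGCGAGA')
    print(select_validator('gentle')(rec))
    print(select_validator('unknown-name') is gentle)
```

As far as this diff shows, the command line only ever passes 'strict' (the default) or 'partial', so the 'gentle' entry is reached by calling the parse functions directly or by passing an unrecognized name.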
=====================================
changeo.egg-info/PKG-INFO
=====================================
@@ -1,35 +1,13 @@
-Metadata-Version: 1.1
+Metadata-Version: 2.1
Name: changeo
-Version: 1.2.0
+Version: 1.3.0
Summary: A bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.
Home-page: http://changeo.readthedocs.io
+Download-URL: https://bitbucket.org/kleinstein/changeo/downloads
Author: Namita Gupta, Jason Anthony Vander Heiden
Author-email: immcantation at googlegroups.com
License: GNU Affero General Public License 3 (AGPL-3)
-Download-URL: https://bitbucket.org/kleinstein/changeo/downloads
-Description: .. image:: https://img.shields.io/pypi/dm/changeo
- :target: https://pypi.org/project/changeo
- .. image:: https://img.shields.io/static/v1?label=AIRR-C%20sw-tools%20v1&message=compliant&color=008AFF&labelColor=000000&style=plastic
- :target: https://docs.airr-community.org/en/stable/swtools/airr_swtools_standard.html
-
- Change-O - Repertoire clonal assignment toolkit
- ================================================================================
-
- Change-O is a collection of tools for processing the output of V(D)J alignment
- tools, assigning clonal clusters to immunoglobulin (Ig) sequences, and
- reconstructing germline sequences.
-
- Dramatic improvements in high-throughput sequencing technologies now enable
- large-scale characterization of Ig repertoires, defined as the collection of
- trans-membrane antigen-receptor proteins located on the surface of B cells and
- T cells. Change-O is a suite of utilities to facilitate advanced analysis of
- Ig and TCR sequences following germline segment assignment. Change-O
- handles output from IMGT/HighV-QUEST and IgBLAST, and provides a wide variety of
- clustering methods for assigning clonal groups to Ig sequences. Record sorting,
- grouping, and various database manipulation operations are also included.
-
Keywords: bioinformatics,sequencing,immunology,adaptive immunity,immunoglobulin,AIRR-seq,Rep-Seq,B cell repertoire analysis,adaptive immune receptor repertoires
-Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
@@ -37,3 +15,25 @@ Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.4
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
+License-File: LICENSE
+
+.. image:: https://img.shields.io/pypi/dm/changeo
+ :target: https://pypi.org/project/changeo
+.. image:: https://img.shields.io/static/v1?label=AIRR-C%20sw-tools%20v1&message=compliant&color=008AFF&labelColor=000000&style=plastic
+ :target: https://docs.airr-community.org/en/stable/swtools/airr_swtools_standard.html
+
+Change-O - Repertoire clonal assignment toolkit
+================================================================================
+
+Change-O is a collection of tools for processing the output of V(D)J alignment
+tools, assigning clonal clusters to immunoglobulin (Ig) sequences, and
+reconstructing germline sequences.
+
+Dramatic improvements in high-throughput sequencing technologies now enable
+large-scale characterization of Ig repertoires, defined as the collection of
+trans-membrane antigen-receptor proteins located on the surface of B cells and
+T cells. Change-O is a suite of utilities to facilitate advanced analysis of
+Ig and TCR sequences following germline segment assignment. Change-O
+handles output from IMGT/HighV-QUEST and IgBLAST, and provides a wide variety of
+clustering methods for assigning clonal groups to Ig sequences. Record sorting,
+grouping, and various database manipulation operations are also included.
=====================================
changeo/Applications.py
=====================================
@@ -8,6 +8,7 @@ __author__ = 'Jason Anthony Vander Heiden'
# Imports
import os
import re
+from pkg_resources import parse_version
from subprocess import check_output, STDOUT, CalledProcessError
# Presto and changeo imports
@@ -94,8 +95,8 @@ def runIgPhyML(rep_file, rep_dir, model='HLP17', motifs='FCH',
return None
-def runIgBLASTN(fasta, igdata, loci='ig', organism='human', vdb=None, ddb=None, jdb=None, output=None,
- format=default_igblast_output, threads=1, exec=default_igblastn_exec):
+def runIgBLASTN(fasta, igdata, loci='ig', organism='human', vdb=None, ddb=None, jdb=None, cdb=None,
+ output=None, format=default_igblast_output, threads=1, exec=default_igblastn_exec):
"""
Runs igblastn on a sequence file
@@ -107,6 +108,7 @@ def runIgBLASTN(fasta, igdata, loci='ig', organism='human', vdb=None, ddb=None,
vdb (str): name of a custom V reference in the database folder to use.
ddb (str): name of a custom D reference in the database folder to use.
jdb (str): name of a custom J reference in the database folder to use.
+ cdb (str): name of a custom C reference in the database folder to use.
output (str): output file name. If None, automatically generate from the fasta file name.
format (str): output format. One of 'blast' or 'airr'.
threads (int): number of threads for igblastn.
@@ -123,12 +125,14 @@ def runIgBLASTN(fasta, igdata, loci='ig', organism='human', vdb=None, ddb=None,
# GERMLINE_V = "imgt_${SPECIES}_${RECEPTOR}_v"
# GERMLINE_D = "imgt_${SPECIES}_${RECEPTOR}_d"
# GERMLINE_J = "imgt_${SPECIES}_${RECEPTOR}_j"
+ # GERMLINE_C = "imgt_${SPECIES}_${RECEPTOR}_c"
# AUXILIARY = "${SPECIES}_gl.aux"
# IGBLAST_DB = "${IGDATA}/database"
# IGBLAST_CMD = "igblastn \
# -germline_db_V ${IGBLAST_DB}/${GERMLINE_V} \
# -germline_db_D ${IGBLAST_DB}/${GERMLINE_D} \
# -germline_db_J ${IGBLAST_DB}/${GERMLINE_J} \
+ # -c_region_db ${IGBLAST_DB}/${GERMLINE_C} \
# -auxiliary_data ${IGDATA}/optional_file/${AUXILIARY} \
# -ig_seqtype ${SEQTYPE[${RECEPTOR}]} -organism ${SPECIES} \
# -domain_system imgt -outfmt '7 std qseq sseq btop'"
@@ -139,6 +143,7 @@ def runIgBLASTN(fasta, igdata, loci='ig', organism='human', vdb=None, ddb=None,
# IGBLAST_VER =$(${IGBLAST_CMD} -version | grep 'Package' | sed s / 'Package: ' //)
# IGBLAST_RUN = "${IGBLAST_CMD} -query ${READFILE} -out ${OUTFILE} -num_threads ${NPROC}"
+ # Define arguments
try:
outfmt = {'blast': '7 std qseq sseq btop', 'airr': '19'}[format]
except KeyError:
@@ -149,8 +154,8 @@ def runIgBLASTN(fasta, igdata, loci='ig', organism='human', vdb=None, ddb=None,
except KeyError:
printError('Invalid receptor type %s.' % loci)
- # Set auxilary data
- auxilary = os.path.join(igdata, 'optional_file', '%s_gl.aux' % organism)
+ # Set auxiliary data
+ auxiliary = os.path.join(igdata, 'optional_file', '%s_gl.aux' % organism)
# Set V database
if vdb is not None: v_germ = os.path.join(igdata, 'database', vdb)
else: v_germ = os.path.join(igdata, 'database', 'imgt_%s_%s_v' % (organism, loci))
@@ -160,6 +165,9 @@ def runIgBLASTN(fasta, igdata, loci='ig', organism='human', vdb=None, ddb=None,
# Set J database
if jdb is not None: j_germ = os.path.join(igdata, 'database', jdb)
else: j_germ = os.path.join(igdata, 'database', 'imgt_%s_%s_j' % (organism, loci))
+ # Set C database
+ if cdb is not None: c_germ = os.path.join(igdata, 'database', cdb)
+ else: c_germ = os.path.join(igdata, 'database', 'imgt_%s_%s_c' % (organism, loci))
# Define IgBLAST command
cmd = [exec,
@@ -168,13 +176,19 @@ def runIgBLASTN(fasta, igdata, loci='ig', organism='human', vdb=None, ddb=None,
'-num_threads', str(threads),
'-ig_seqtype', seqtype,
'-organism', organism,
- '-auxiliary_data', str(auxilary),
+ '-auxiliary_data', str(auxiliary),
'-germline_db_V', str(v_germ),
'-germline_db_D', str(d_germ),
'-germline_db_J', str(j_germ),
'-outfmt', outfmt,
'-domain_system', 'imgt']
+ # Add C-region arguments for igblastn v1.18.0
+ version = getIgBLASTVersion(exec=exec)
+ if parse_version(version) >= parse_version('1.18.0'):
+ cmd = cmd + ['-c_region_db', str(c_germ)]
+ #print(cmd)
+
# Execute IgBLAST
env = os.environ.copy()
env['IGDATA'] = igdata
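
Note on the hunk above: the -c_region_db flag is only appended when the detected igblastn version is at least 1.18.0. The following is a minimal standalone sketch of that gating, assuming a plain dotted numeric version string. The helpers version_tuple and add_c_region_args are hypothetical names used for illustration only; changeo itself uses getIgBLASTVersion and parse_version as shown in the diff, and the database path below is a placeholder.

```python
def version_tuple(version):
    """Convert a dotted numeric version such as '1.18.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split('.'))


def add_c_region_args(cmd, version, c_germ, minimum='1.18.0'):
    """Return a copy of cmd with -c_region_db appended when version >= minimum."""
    if version_tuple(version) >= version_tuple(minimum):
        return cmd + ['-c_region_db', str(c_germ)]
    return list(cmd)


# Placeholder command and database path, for illustration only.
base = ['igblastn', '-organism', 'human']
new_cmd = add_c_region_args(base, '1.18.0', '/data/database/imgt_human_ig_c')
old_cmd = add_c_region_args(base, '1.17.1', '/data/database/imgt_human_ig_c')
```

Comparing parsed versions rather than raw strings matters here: as plain strings, '1.9.0' sorts after '1.18.0', so a string comparison would wrongly treat 1.9.0 as new enough.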
=====================================
changeo/Gene.py
=====================================
@@ -22,6 +22,7 @@ locus_regex = re.compile(r'(IG[HLK]|TR[ABGD])')
v_allele_regex = re.compile(r'((IG[HLK]|TR[ABGD])V[A-R0-9]+[-/\w]*[-\*][\.\w]+)')
d_allele_regex = re.compile(r'((IG[HLK]|TR[ABGD])D[A-R0-9]+[-/\w]*[-\*][\.\w]+)')
j_allele_regex = re.compile(r'((IG[HLK]|TR[ABGD])J[A-R0-9]+[-/\w]*[-\*][\.\w]+)')
+c_allele_regex = re.compile(r'((IG[HLK]|TR[ABGD])([DMAGEC][P0-9]?[A-Z]?)[\*][\.\w]+)')
c_gene_regex = re.compile(r'((IG[HLK]|TR[ABGD])([DMAGEC][P0-9]?[A-Z]?))')
@@ -133,22 +134,6 @@ def getAlleleNumber(gene, action='first'):
return parseGeneCall(gene, allele_number_regex, action=action)
-def getCGene(gene, action='first'):
- """
- Extract C-region gene from gene call string
-
- Arguments:
- gene (str): string with C-region gene calls
- action (str): action to perform for multiple alleles;
- one of ('first', 'set', 'list').
-
- Returns:
- str: String of the first C-region gene call when action is 'first'.
- tuple: Tuple of gene calls for 'set' or 'list' actions.
- """
- return parseGeneCall(gene, c_gene_regex, action=action)
-
-
def getVAllele(gene, action='first'):
"""
Extract V allele gene from gene call string
@@ -197,6 +182,38 @@ def getJAllele(gene, action='first'):
return parseGeneCall(gene, j_allele_regex, action=action)
+def getCAllele(gene, action='first'):
+ """
+ Extract C-region allele from gene call string
+
+ Arguments:
+ gene (str): string with C-region gene calls
+ action (str): action to perform for multiple alleles;
+ one of ('first', 'set', 'list').
+
+ Returns:
+ str: String of the first C-region allele call when action is 'first'.
+ tuple: Tuple of allele calls for 'set' or 'list' actions.
+ """
+ return parseGeneCall(gene, c_allele_regex, action=action)
+
+
+def getCGene(gene, action='first'):
+ """
+ Extract C-region gene from gene call string
+
+ Arguments:
+ gene (str): string with C-region gene calls
+ action (str): action to perform for multiple alleles;
+ one of ('first', 'set', 'list').
+
+ Returns:
+ str: String of the first C-region gene call when action is 'first'.
+ tuple: Tuple of gene calls for 'set' or 'list' actions.
+ """
+ return parseGeneCall(gene, c_gene_regex, action=action)
+
+
# TODO: this is not generalized for non-IMGT gapped sequences!
def getVGermline(receptor, references, v_field=v_attr, amino_acid=False):
"""
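
Note on the Gene.py hunks above: the new c_allele_regex differs from c_gene_regex only in requiring a trailing "*" plus allele suffix. The sketch below copies both patterns from the diff and applies them to sample call strings chosen for illustration. The helper all_matches is hypothetical and is not changeo's parseGeneCall, whose behaviour is not shown in this diff.

```python
import re

# Patterns copied verbatim from the Gene.py hunk above.
c_allele_regex = re.compile(r'((IG[HLK]|TR[ABGD])([DMAGEC][P0-9]?[A-Z]?)[\*][\.\w]+)')
c_gene_regex = re.compile(r'((IG[HLK]|TR[ABGD])([DMAGEC][P0-9]?[A-Z]?))')


def all_matches(regex, text):
    """Hypothetical helper: return every full match of regex in text, in order."""
    return [m.group(0) for m in regex.finditer(text)]


# Sample strings are illustrative, not taken from IgBLAST output.
calls = 'IGHG1*01,IGHG1*02'
alleles = all_matches(c_allele_regex, calls)   # allele-level calls
genes = all_matches(c_gene_regex, calls)       # gene-level calls, allele suffix dropped
v_only = all_matches(c_allele_regex, 'IGHV1-69*01')  # a V call is not a C allele
```

With these inputs the allele pattern keeps the full "IGHG1*01" style names, the gene pattern truncates each to "IGHG1", and the V call yields no C-region match.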
=====================================
changeo/IO.py
=====================================
@@ -20,7 +20,7 @@ from Bio.Seq import Seq
# Presto and changeo imports
from presto.IO import getFileType, printError, printWarning, printDebug
from changeo.Defaults import default_csv_size
-from changeo.Gene import getAllele, getLocus, getVAllele, getDAllele, getJAllele
+from changeo.Gene import getAllele, getLocus, getVAllele, getDAllele, getJAllele, getCAllele
from changeo.Receptor import AIRRSchema, AIRRSchemaAA, ChangeoSchema, ChangeoSchemaAA, Receptor, ReceptorData
from changeo.Alignment import decodeBTOP, encodeCIGAR, padAlignment, gapV, inferJunction, \
RegionDefinition, getRegions
@@ -943,6 +943,7 @@ class IgBLASTReader:
summary_map = {'Top V gene match': 'v_match',
'Top D gene match': 'd_match',
'Top J gene match': 'j_match',
+ 'Top C gene match': 'c_match',
'Chain type': 'chain_type',
'stop codon': 'stop_codon',
'V-J frame': 'vj_frame',
@@ -1041,13 +1042,16 @@ class IgBLASTReader:
v_call = getVAllele(summary['v_match'], action='list')
d_call = getDAllele(summary['d_match'], action='list')
j_call = getJAllele(summary['j_match'], action='list')
+ c_call = getCAllele(summary['c_match'], action='list')
result['v_call'] = ','.join(v_call) if v_call else None
result['d_call'] = ','.join(d_call) if d_call else None
result['j_call'] = ','.join(j_call) if j_call else None
+ result['c_call'] = ','.join(c_call) if c_call else None
else:
result['v_call'] = None if summary['v_match'] == 'N/A' else summary['v_match']
result['d_call'] = None if summary['d_match'] == 'N/A' else summary['d_match']
result['j_call'] = None if summary['j_match'] == 'N/A' else summary['j_match']
+ result['c_call'] = None if summary['c_match'] == 'N/A' else summary['c_match']
# Parse locus
locus = None if summary['chain_type'] == 'N/A' else summary['chain_type']
@@ -2198,7 +2202,7 @@ def readGermlines(references, asis=False, warn=False):
repo_dict = {}
duplicates = []
for file_name in repo_files:
- with open(file_name, 'rU') as file_handle:
+ with open(file_name, 'r') as file_handle:
germlines = SeqIO.parse(file_handle, 'fasta')
for g in germlines:
germ_key = getAllele(g.description, 'first') if not asis else g.id
@@ -2237,6 +2241,10 @@ def extractIMGT(imgt_output):
# Extract required files
imgt_files = sorted([n for n in imgt_zip.namelist() \
if os.path.basename(n).startswith(imgt_names)])
+ if len(imgt_files) < len(imgt_names):
+ print("")
+ printError('Missing necessary file(s) in IMGT output %s.' % imgt_output + ' Expecting:' + ', '.join(imgt_names))
+
imgt_zip.extractall(temp_dir.name, imgt_files)
# Define file dictionary
imgt_dict = {k: os.path.join(temp_dir.name, f) for k, f in zip_longest(imgt_keys, imgt_files)}
@@ -2248,6 +2256,10 @@ def extractIMGT(imgt_output):
# Define file dictionary
imgt_files = sorted([n for n in folder_files \
if os.path.basename(n).startswith(imgt_names)])
+ if len(imgt_files) < len(imgt_names):
+ print("")
+ printError('Missing necessary file(s) in IMGT output %s.' % imgt_output + ' Expecting:' + ', '.join(imgt_names))
+
imgt_dict = {k: f for k, f in zip_longest(imgt_keys, imgt_files)}
# Tarball input
elif tarfile.is_tarfile(imgt_output):
@@ -2255,6 +2267,10 @@ def extractIMGT(imgt_output):
# Extract required files
imgt_files = sorted([n for n in imgt_tar.getnames() \
if os.path.basename(n).startswith(imgt_names)])
+ if len(imgt_files) < len(imgt_names):
+ print("")
+ printError('Missing necessary file(s) in IMGT output %s.' % imgt_output + ' Expecting:' + ', '.join(imgt_names))
+
imgt_tar.extractall(temp_dir.name, [imgt_tar.getmember(n) for n in imgt_files])
# Define file dictionary
imgt_dict = {k: os.path.join(temp_dir.name, f) for k, f in zip_longest(imgt_keys, imgt_files)}
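
Note on the three extractIMGT hunks above: each one compares the number of matched files against the number of expected name prefixes and reports the full expected list on failure. As a sketch of a possible variant (not upstream behaviour), the check could instead name exactly which prefixes are missing. The function missing_prefixes and the sample prefixes and member names are assumptions for illustration; the real imgt_names tuple is not shown in this diff.

```python
import os


def missing_prefixes(file_names, required_prefixes):
    """Return the required prefixes that no file base name starts with."""
    base_names = [os.path.basename(n) for n in file_names]
    return [p for p in required_prefixes
            if not any(b.startswith(p) for b in base_names)]


# Illustrative prefixes and archive members only; not taken from the diff.
required = ('1_Summary', '3_Nt-sequences')
members = ['run/1_Summary.txt', 'run/README.txt']
missing = missing_prefixes(members, required)
```

A per-prefix check also avoids the case where two files share one prefix and make the count look complete while another required file is absent.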
=====================================
changeo/Receptor.py
=====================================
@@ -94,6 +94,7 @@ class AIRRSchema:
'v_call',
'd_call',
'j_call',
+ 'c_call',
'junction',
'junction_length',
'junction_aa',
@@ -129,6 +130,7 @@ class AIRRSchema:
('v_call', 'v_call'),
('d_call', 'd_call'),
('j_call', 'j_call'),
+ ('c_call', 'c_call'),
('junction', 'junction'),
('junction_start', 'junction_start'),
('junction_end', 'junction_end'),
@@ -175,7 +177,6 @@ class AIRRSchema:
('j_germline_aa_start', 'j_germ_aa_start'),
('j_germline_aa_end', 'j_germ_aa_end'),
('j_germline_aa_length', 'j_germ_aa_length'),
- ('c_call', 'c_call'),
('germline_alignment_d_mask', 'germline_imgt_d_mask'),
('v_score', 'v_score'),
('v_identity', 'v_identity'),
@@ -286,6 +287,7 @@ class AIRRSchemaAA(AIRRSchema):
'v_call',
'd_call',
'j_call',
+ 'c_call',
'junction',
'junction_length',
'junction_aa',
@@ -350,6 +352,7 @@ class ChangeoSchema:
('V_CALL', 'v_call'),
('D_CALL', 'd_call'),
('J_CALL', 'j_call'),
+ ('C_CALL', 'c_call'),
('SEQUENCE_VDJ', 'sequence_vdj'),
('SEQUENCE_IMGT', 'sequence_imgt'),
('SEQUENCE_AA_VDJ', 'sequence_aa_vdj'),
@@ -428,7 +431,6 @@ class ChangeoSchema:
('P3D_LENGTH', 'p3d_length'),
('P5J_LENGTH', 'p5j_length'),
('D_FRAME', 'd_frame'),
- ('C_CALL', 'c_call'),
('CDR3_IGBLAST', 'cdr3_igblast'),
('CDR3_IGBLAST_AA', 'cdr3_igblast_aa'),
('CONSCOUNT', 'conscount'),
=====================================
changeo/Version.py
=====================================
@@ -5,5 +5,5 @@ Version and authorship information
__author__ = 'Namita Gupta, Jason Anthony Vander Heiden'
__copyright__ = 'Copyright 2021 Kleinstein Lab, Yale University. All rights reserved.'
__license__ = 'GNU Affero General Public License 3 (AGPL-3)'
-__version__ = '1.2.0'
-__date__ = '2021.10.29'
+__version__ = '1.3.0'
+__date__ = '2022.12.11'
=====================================
debian/changelog
=====================================
@@ -1,3 +1,12 @@
+changeo (1.3.0-1) UNRELEASED; urgency=medium
+
+ * Team upload.
+ * New upstream version
+ * Standards-Version: 4.6.2 (routine-update)
+ TODO: Check autopkgtest which fails with this new version
+
+ -- Andreas Tille <tille at debian.org> Wed, 21 Dec 2022 08:43:09 +0100
+
changeo (1.2.0-1) unstable; urgency=medium
* New upstream version 1.2.0
=====================================
debian/changeo.lintian-overrides
=====================================
@@ -1,10 +1,10 @@
# Ignoring .py endings to preserve compatibility with
# scripts exchanged in the community.
-changeo: script-with-language-extension usr/bin/AlignRecords.py
-changeo: script-with-language-extension usr/bin/AssignGenes.py
-changeo: script-with-language-extension usr/bin/BuildTrees.py
-changeo: script-with-language-extension usr/bin/ConvertDb.py
-changeo: script-with-language-extension usr/bin/CreateGermlines.py
-changeo: script-with-language-extension usr/bin/DefineClones.py
-changeo: script-with-language-extension usr/bin/MakeDb.py
-changeo: script-with-language-extension usr/bin/ParseDb.py
+changeo: script-with-language-extension [usr/bin/AlignRecords.py]
+changeo: script-with-language-extension [usr/bin/AssignGenes.py]
+changeo: script-with-language-extension [usr/bin/BuildTrees.py]
+changeo: script-with-language-extension [usr/bin/ConvertDb.py]
+changeo: script-with-language-extension [usr/bin/CreateGermlines.py]
+changeo: script-with-language-extension [usr/bin/DefineClones.py]
+changeo: script-with-language-extension [usr/bin/MakeDb.py]
+changeo: script-with-language-extension [usr/bin/ParseDb.py]
=====================================
debian/control
=====================================
@@ -8,7 +8,7 @@ Build-Depends: debhelper-compat (= 13),
dh-python,
python3-all,
python3-setuptools
-Standards-Version: 4.6.0
+Standards-Version: 4.6.2
Vcs-Browser: https://salsa.debian.org/med-team/changeo
Vcs-Git: https://salsa.debian.org/med-team/changeo.git
Homepage: https://changeo.readthedocs.io
View it on GitLab: https://salsa.debian.org/med-team/changeo/-/compare/2ffc5fef15a54f45999b985f78c41440d26841f8...e754f958659327961b28ef5bab3637aeb60b6b6d