[med-svn] [Git][med-team/changeo][master] 3 commits: New upstream version 1.1.0
Nilesh Patra (@nilesh)
gitlab at salsa.debian.org
Fri Jul 9 18:40:23 BST 2021
Nilesh Patra pushed to branch master at Debian Med / changeo
Commits:
46594f9a by Nilesh Patra at 2021-07-09T23:08:07+05:30
New upstream version 1.1.0
- - - - -
05bc29b8 by Nilesh Patra at 2021-07-09T23:08:07+05:30
d/tests/run-unit-tests: Change checksums in output
- - - - -
1215c917 by Nilesh Patra at 2021-07-09T23:08:07+05:30
Interim changelog entry
- - - - -
15 changed files:
- INSTALL.rst
- NEWS.rst
- PKG-INFO
- README.rst
- bin/MakeDb.py
- changeo.egg-info/PKG-INFO
- changeo.egg-info/requires.txt
- changeo/Commandline.py
- changeo/Defaults.py
- changeo/Gene.py
- changeo/IO.py
- changeo/Version.py
- debian/changelog
- debian/tests/run-unit-test
- requirements.txt
Changes:
=====================================
INSTALL.rst
=====================================
@@ -23,9 +23,9 @@ The minimum dependencies for installation are:
+ `NumPy 1.8 <http://numpy.org>`__
+ `SciPy 0.14 <http://scipy.org>`__
+ `pandas 0.24 <http://pandas.pydata.org>`__
-+ `Biopython 1.71 <http://biopython.org>`__
++ `Biopython 1.77 <http://biopython.org>`__
+ `presto 0.6.2 <http://presto.readthedocs.io>`__
-+ `airr 1.2.1 <https://docs.airr-community.org>`__
++ `airr 1.3.1 <https://docs.airr-community.org>`__
Some tools wrap external applications that are not required for installation.
Those tools require minimum versions of:
@@ -45,7 +45,7 @@ Linux
Biopython according to its
`instructions <http://biopython.org/DIST/docs/install/Installation.html>`__.
-2. Install `presto 0.5.10 <http://presto.readthedocs.io>`__ or greater.
+2. Install `presto 0.6.2 <http://presto.readthedocs.io>`__ or greater.
3. Download the Change-O bundle and run::
@@ -86,7 +86,7 @@ Mac OS X
> pip3 install numpy scipy pandas biopython
-8. Install `presto 0.5.10 <http://presto.readthedocs.io>`__ or greater.
+8. Install `presto 0.6.2 <http://presto.readthedocs.io>`__ or greater.
9. Download the Change-O bundle, open a terminal window, change directories
to the download folder, and run::
@@ -104,7 +104,7 @@ Windows
`Unofficial Windows binary <http://www.lfd.uci.edu/~gohlke/pythonlibs>`__
collection.
-3. Install `presto 0.5.10 <http://presto.readthedocs.io>`__ or greater.
+3. Install `presto 0.6.2 <http://presto.readthedocs.io>`__ or greater.
4. Download the Change-O bundle, open a Command Prompt, change directories to
the download folder, and run::
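The hunks above raise the documented minimums to Biopython 1.77, airr 1.3.1 and presto 0.6.2. Below is a minimal sketch for checking an environment against such minimums. It assumes purely numeric, dot-separated version strings, so pre-release tags such as `1.3.1rc1` are not handled. `meets_minimum` and `check_installed` are hypothetical helpers, not part of Change-O. The installed version is read with the standard library's `importlib.metadata.version` (Python 3.8+).

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(v):
    """Convert '1.3.1' to (1, 3, 1). Assumes numeric, dot-separated parts."""
    return tuple(int(part) for part in v.split('.'))


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum` (numeric comparison)."""
    a, b = version_tuple(installed), version_tuple(minimum)
    # Pad the shorter tuple with zeros so that '1.8' compares equal to '1.8.0'
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_installed(minimums):
    """Map each distribution name to True/False, or None if it is not installed."""
    report = {}
    for name, minimum in minimums.items():
        try:
            report[name] = meets_minimum(version(name), minimum)
        except PackageNotFoundError:
            report[name] = None
    return report
```

For example, `check_installed({'biopython': '1.77', 'airr': '1.3.1', 'presto': '0.6.2'})` reports each dependency. The comparison is numeric rather than lexicographic, so `1.10` correctly counts as newer than `1.9`.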
=====================================
NEWS.rst
=====================================
@@ -1,13 +1,32 @@
Release Notes
===============================================================================
+Version 1.1.0: June 21, 2021
+-------------------------------------------------------------------------------
+
++ Fixed gene parsing for IMGT temporary designation nomenclature.
++ Updated dependencies to biopython >= v1.77, airr >= v1.3.1, PyYAML>=5.1.
+
+MakeDb:
+
++ Added the ``--imgt-id-len`` argument to accommodate changes introduced in how
+ IMGT/HighV-QUEST truncates sequence identifiers as of version 1.8.3 (May 7, 2021).
+ The header lines in the fasta files are now truncated to 49 characters. In
+ IMGT/HighV-QUEST versions older than 1.8.3, they were truncated to 50 characters.
+ ``--imgt-id-len`` default value is 49. Users should specify ``--imgt-id-len 50``
+ to analyze IMGT results generated with IMGT/HighV-QUEST versions older than 1.8.3.
++ Added the ``--infer-junction`` argument to ``MakeDb igblast``, to enable the inference
+ of the junction sequence when not reported by IgBLAST. Should be used with data from
+ IgBLAST v1.6.0 or older, before IgBLAST added IMGT-CDR3 inference.
+
+
Version 1.0.2: January 18, 2021
-------------------------------------------------------------------------------
AlignRecords:
+ Fixed a bug caused the program to exit when encountering missing sequence
-data. It will now fail the row or group with missing data and continue.
+ data. It will now fail the row or group with missing data and continue.
MakeDb:
@@ -69,7 +88,7 @@ MakeDb:
+ Add --regions argument to the ``igblast`` and ``igblast-aa`` subcommands
to allow specification of the IMGT CDR/FWR region boundaries. Currently,
the supported specifications are ``default`` (human, mouse) and
- ``rhesus-igl``.
+ ``rhesus-igl``.
Version 0.4.6: July 19, 2019
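According to the release note above, the correct `--imgt-id-len` value depends on which IMGT/HighV-QUEST release produced the results. It is 49 for 1.8.3 and later, and 50 for anything older. The helpers below sketch that rule. Both are hypothetical and not part of Change-O. The sketch assumes a numeric version string and builds a `MakeDb.py imgt` argument list, where the input file names are placeholders.

```python
def imgt_id_len_for(highvquest_version):
    """Hypothetical helper: pick --imgt-id-len from the IMGT/HighV-QUEST version."""
    parts = tuple(int(p) for p in highvquest_version.split('.'))
    # Tuple comparison: (1, 8, 2) < (1, 8, 3) and (1, 8) < (1, 8, 3)
    return 50 if parts < (1, 8, 3) else 49


def makedb_imgt_args(imgt_archive, fasta, highvquest_version):
    """Build a MakeDb.py imgt command line (file names are placeholders)."""
    return ['MakeDb.py', 'imgt', '-i', imgt_archive, '-s', fasta,
            '--imgt-id-len', str(imgt_id_len_for(highvquest_version))]
```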
=====================================
PKG-INFO
=====================================
@@ -1,13 +1,18 @@
Metadata-Version: 1.1
Name: changeo
-Version: 1.0.2
+Version: 1.1.0
Summary: A bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.
Home-page: http://changeo.readthedocs.io
Author: Namita Gupta, Jason Anthony Vander Heiden
Author-email: immcantation at googlegroups.com
License: GNU Affero General Public License 3 (AGPL-3)
Download-URL: https://bitbucket.org/kleinstein/changeo/downloads
-Description: Change-O - Repertoire clonal assignment toolkit
+Description: .. image:: https://img.shields.io/pypi/dm/changeo
+ :target: https://pypi.org/project/changeo
+ .. image:: https://img.shields.io/static/v1?label=AIRR-C%20sw-tools%20v1&message=compliant&color=008AFF&labelColor=000000&style=plastic
+ :target: https://docs.airr-community.org/en/stable/swtools/airr_swtools_standard.html
+
+ Change-O - Repertoire clonal assignment toolkit
================================================================================
Change-O is a collection of tools for processing the output of V(D)J alignment
=====================================
README.rst
=====================================
@@ -1,3 +1,8 @@
+.. image:: https://img.shields.io/pypi/dm/changeo
+ :target: https://pypi.org/project/changeo
+.. image:: https://img.shields.io/static/v1?label=AIRR-C%20sw-tools%20v1&message=compliant&color=008AFF&labelColor=000000&style=plastic
+ :target: https://docs.airr-community.org/en/stable/swtools/airr_swtools_standard.html
+
Change-O - Repertoire clonal assignment toolkit
================================================================================
=====================================
bin/MakeDb.py
=====================================
@@ -20,7 +20,7 @@ from Bio import SeqIO
# Presto and changeo imports
from presto.Annotation import parseAnnotation
from presto.IO import countSeqFile, printLog, printMessage, printProgress, printError, printWarning, readSeqFile
-from changeo.Defaults import default_format, default_out_args
+from changeo.Defaults import default_format, default_out_args, default_imgt_id_len
from changeo.Commandline import CommonHelpFormatter, checkArgs, getCommonArgParser, parseCommonArgs
from changeo.Alignment import RegionDefinition
from changeo.Gene import buildGermline
@@ -102,7 +102,7 @@ def addGermline(receptor, references, amino_acid=False):
return receptor
-def getIDforIMGT(seq_file):
+def getIDforIMGT(seq_file, imgt_id_len=default_imgt_id_len):
"""
Create a sequence ID translation using IMGT truncation.
@@ -113,13 +113,13 @@ def getIDforIMGT(seq_file):
dict : a dictionary of with the IMGT truncated ID as the key and the full sequence description as the value.
"""
- # Create a sequence ID translation using IDs truncate up to space or 50 chars
+ # Create a sequence ID translation using IDs truncate up to space or 49 chars
ids = {}
for rec in readSeqFile(seq_file):
- if len(rec.description) <= 50:
+ if len(rec.description) <= imgt_id_len:
id_key = rec.description
else:
- id_key = re.sub('\||\s|!|&|\*|<|>|\?', '_', rec.description[:50])
+ id_key = re.sub('\||\s|!|&|\*|<|>|\?', '_', rec.description[:imgt_id_len])
ids.update({id_key: rec.description})
return ids
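The translation table built by `getIDforIMGT` can be reproduced without presto's `readSeqFile` by passing plain FASTA description strings. This standalone sketch assumes the descriptions have already been extracted. It mirrors the hunk's logic, including one detail: descriptions at or under the limit are used verbatim, with no character substitution. `imgt_id_map` is a hypothetical name.

```python
import re

# Same character class the hunk replaces with '_' when truncating
IMGT_SUB_CHARS = re.compile(r'\||\s|!|&|\*|<|>|\?')


def imgt_id_map(descriptions, imgt_id_len=49):
    """Map IMGT-truncated identifiers to the full FASTA descriptions."""
    ids = {}
    for desc in descriptions:
        if len(desc) <= imgt_id_len:
            key = desc  # short descriptions are kept as-is
        else:
            # Truncate first, then replace |, whitespace, !, &, *, <, >, ?
            key = IMGT_SUB_CHARS.sub('_', desc[:imgt_id_len])
        ids[key] = desc
    return ids
```

Consider the 63-character description `'seq1|COUNT=5 ' + 'B' * 50`. With the default of 49, it is keyed as `'seq1_COUNT=5_' + 'B' * 36`. With `imgt_id_len=50`, the key gains one more `B`. This explains why a setting that does not match the IMGT/HighV-QUEST version causes the ID lookup to miss.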
@@ -355,7 +355,7 @@ def writeDb(records, fields, aligner_file, total_count, id_dict=None, annotation
def parseIMGT(aligner_file, seq_file=None, repo=None, cellranger_file=None, partial=False, asis_id=True,
- extended=False, format=default_format, out_file=None, out_args=default_out_args):
+ extended=False, format=default_format, out_file=None, out_args=default_out_args, imgt_id_len=default_imgt_id_len):
"""
Main for IMGT aligned sample sequences.
@@ -369,6 +369,7 @@ def parseIMGT(aligner_file, seq_file=None, repo=None, cellranger_file=None, part
format : output format. one of 'changeo' or 'airr'.
out_file : output file name. Automatically generated from the input file if None.
out_args : common output argument dictionary from parseCommonArgs.
+ imgt_id_len: maximum character length of sequence identifiers reported by IMGT/HighV-QUEST.
Returns:
dict : names of the 'pass' and 'fail' output files.
@@ -394,8 +395,8 @@ def parseIMGT(aligner_file, seq_file=None, repo=None, cellranger_file=None, part
total_count = countDbFile(imgt_files['summary'])
# Get (parsed) IDs from fasta file submitted to IMGT
- id_dict = getIDforIMGT(seq_file) if seq_file else {}
-
+ id_dict = getIDforIMGT(seq_file, imgt_id_len) if seq_file else {}
+
# Load supplementary annotation table
if cellranger_file is not None:
f = cellranger_extended if extended else cellranger_base
@@ -436,7 +437,6 @@ def parseIMGT(aligner_file, seq_file=None, repo=None, cellranger_file=None, part
if all('...' not in x for x in references.values()):
printWarning('Germline reference sequences do not appear to contain IMGT-numbering spacers. Results may be incorrect.')
germ_iter = (addGermline(x, references) for x in parse_iter)
-
# Write db
output = writeDb(germ_iter, fields=fields, aligner_file=aligner_file, total_count=total_count,
annotations=annotations, id_dict=id_dict, asis_id=asis_id, partial=partial,
@@ -449,7 +449,7 @@ def parseIMGT(aligner_file, seq_file=None, repo=None, cellranger_file=None, part
def parseIgBLAST(aligner_file, seq_file, repo, amino_acid=False, cellranger_file=None, partial=False,
- asis_id=True, asis_calls=False, extended=False, regions='default',
+ asis_id=True, asis_calls=False, extended=False, regions='default', infer_junction=False,
format='changeo', out_file=None, out_args=default_out_args):
"""
Main for IgBLAST aligned sample sequences.
@@ -464,6 +464,7 @@ def parseIgBLAST(aligner_file, seq_file, repo, amino_acid=False, cellranger_file
asis_calls (bool): if True do not parse gene calls for allele names.
extended (bool): if True add alignment scores, FWR regions, and CDR regions to the output.
regions (str): name of the IMGT FWR/CDR region definitions to use.
+ infer_junction (bool): if True, infer the junction sequence, if not reported by IgBLAST.
format (str): output format. one of 'changeo' or 'airr'.
out_file (str): output file name. Automatically generated from the input file if None.
out_args (dict): common output argument dictionary from parseCommonArgs.
@@ -481,6 +482,7 @@ def parseIgBLAST(aligner_file, seq_file, repo, amino_acid=False, cellranger_file
log['ASIS_CALLS'] = asis_calls
log['PARTIAL'] = partial
log['EXTENDED'] = extended
+ log['INFER_JUNCTION'] = infer_junction
printLog(log)
# Set amino acid conditions
@@ -531,7 +533,7 @@ def parseIgBLAST(aligner_file, seq_file, repo, amino_acid=False, cellranger_file
# Parse and write output
with open(aligner_file, 'r') as f:
- parse_iter = parser(f, seq_dict, references, regions=regions, asis_calls=asis_calls)
+ parse_iter = parser(f, seq_dict, references, regions=regions, asis_calls=asis_calls, infer_junction=infer_junction)
germ_iter = (addGermline(x, references, amino_acid=amino_acid) for x in parse_iter)
output = writeDb(germ_iter, fields=fields, aligner_file=aligner_file, total_count=total_count,
annotations=annotations, amino_acid=amino_acid, partial=partial, asis_id=asis_id,
@@ -724,6 +726,9 @@ def getArgParser():
group_igblast.add_argument('--regions', action='store', dest='regions',
choices=('default', 'rhesus-igl'), default='default',
help='''IMGT CDR and FWR boundary definition to use.''')
+ group_igblast.add_argument('--infer-junction', action='store_true', dest='infer_junction',
+ help='''Infer the junction sequence. For use with IgBLAST v1.6.0 or older,
+ prior to the addition of IMGT-CDR3 inference.''')
parser_igblast.set_defaults(func=parseIgBLAST, amino_acid=False)
# igblastp output parser
@@ -809,6 +814,11 @@ def getArgParser():
Adds <vdj>_score, <vdj>_identity>, fwr1, fwr2, fwr3, fwr4,
cdr1, cdr2, cdr3, n1_length, n2_length, p3v_length, p5d_length,
p3d_length, p5j_length and d_frame.''')
+ group_imgt.add_argument('--imgt-id-len', action='store', dest='imgt_id_len', type=int,
+ default=default_imgt_id_len,
+ help='''The maximum character length of sequence identifiers reported by IMGT/HighV-QUEST.
+ Specify 50 if the IMGT files (-i) were generated with an IMGT/HighV-QUEST version older
+ than 1.8.3 (May 7, 2021).''')
parser_imgt.set_defaults(func=parseIMGT)
# iHMMuneAlign Aligner
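Both new options follow standard `argparse` patterns. `--infer-junction` is a boolean `store_true` flag on the `igblast` subcommand. `--imgt-id-len` is an integer option on the `imgt` subcommand with a default of 49. The following minimal stand-alone sketch defines only those two options. The program name and the reduced parser are assumptions; the real `getArgParser` defines many more arguments.

```python
import argparse

DEFAULT_IMGT_ID_LEN = 49  # value of changeo.Defaults.default_imgt_id_len in 1.1.0


def build_parser():
    """Reduced MakeDb-like parser with only the two options added in 1.1.0."""
    parser = argparse.ArgumentParser(prog='makedb-sketch')
    subparsers = parser.add_subparsers(dest='command', required=True)

    igblast = subparsers.add_parser('igblast')
    # Boolean switch: False unless the flag is given
    igblast.add_argument('--infer-junction', action='store_true', dest='infer_junction')

    imgt = subparsers.add_parser('imgt')
    # Integer option; argparse converts the string with int()
    imgt.add_argument('--imgt-id-len', action='store', dest='imgt_id_len',
                      type=int, default=DEFAULT_IMGT_ID_LEN)
    return parser
```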
=====================================
changeo.egg-info/PKG-INFO
=====================================
@@ -1,13 +1,18 @@
Metadata-Version: 1.1
Name: changeo
-Version: 1.0.2
+Version: 1.1.0
Summary: A bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.
Home-page: http://changeo.readthedocs.io
Author: Namita Gupta, Jason Anthony Vander Heiden
Author-email: immcantation at googlegroups.com
License: GNU Affero General Public License 3 (AGPL-3)
Download-URL: https://bitbucket.org/kleinstein/changeo/downloads
-Description: Change-O - Repertoire clonal assignment toolkit
+Description: .. image:: https://img.shields.io/pypi/dm/changeo
+ :target: https://pypi.org/project/changeo
+ .. image:: https://img.shields.io/static/v1?label=AIRR-C%20sw-tools%20v1&message=compliant&color=008AFF&labelColor=000000&style=plastic
+ :target: https://docs.airr-community.org/en/stable/swtools/airr_swtools_standard.html
+
+ Change-O - Repertoire clonal assignment toolkit
================================================================================
Change-O is a collection of tools for processing the output of V(D)J alignment
=====================================
changeo.egg-info/requires.txt
=====================================
@@ -1,8 +1,8 @@
numpy>=1.8
scipy>=0.14
pandas>=0.24
-biopython>=1.71
-PyYAML>=3.12
+biopython>=1.77
+PyYAML>=5.1
setuptools>=2.0
presto>=0.6.2
-airr>=1.2.1
+airr>=1.3.1
=====================================
changeo/Commandline.py
=====================================
@@ -110,8 +110,10 @@ def getCommonArgParser(db_in=True, db_out=True, out_file=True, failed=True, log=
fail processing.''')
# Format arguments
if format:
- group.add_argument('--format', action='store', dest='format', default=default_format,
- choices=choices_format, help='''Specify input and output format.''')
+ group.add_argument('--format', action='store', dest='format',
+ default=default_format, choices=choices_format,
+ help='''Output format. Also specifies the input format for tools accepting
+ tab delimited AIRR Rearrangement or Change-O files.''')
# Multiprocessing arguments
if multiproc:
=====================================
changeo/Defaults.py
=====================================
@@ -42,3 +42,6 @@ default_out_args = {'log_file': None,
'out_name': None,
'out_type': 'tsv',
'failed': False}
+
+# IMGT
+default_imgt_id_len = 49
=====================================
changeo/Gene.py
=====================================
@@ -14,14 +14,14 @@ from changeo.Defaults import v_attr, d_attr, j_attr, seq_attr
# Ig and TCR Regular expressions
allele_number_regex = re.compile(r'(?<=\*)([\.\w]+)')
-allele_regex = re.compile(r'((IG[HLK]|TR[ABGD])([VDJ][A-Z0-9]+[-/\w]*[-\*][\.\w]+))')
-gene_regex = re.compile(r'((IG[HLK]|TR[ABGD])([VDJ][A-Z0-9]+[-/\w]*))')
-family_regex = re.compile(r'((IG[HLK]|TR[ABGD])([VDJ][A-Z0-9]+))')
+allele_regex = re.compile(r'((IG[HLK]|TR[ABGD])([VDJ][A-R0-9]+[-/\w]*[-\*][\.\w]+))')
+gene_regex = re.compile(r'((IG[HLK]|TR[ABGD])([VDJ][A-R0-9]+[-/\w]*))')
+family_regex = re.compile(r'((IG[HLK]|TR[ABGD])([VDJ][A-R0-9]+))')
locus_regex = re.compile(r'(IG[HLK]|TR[ABGD])')
-v_allele_regex = re.compile(r'((IG[HLK]|TR[ABGD])V[A-Z0-9]+[-/\w]*[-\*][\.\w]+)')
-d_allele_regex = re.compile(r'((IG[HLK]|TR[ABGD])D[A-Z0-9]+[-/\w]*[-\*][\.\w]+)')
-j_allele_regex = re.compile(r'((IG[HLK]|TR[ABGD])J[A-Z0-9]+[-/\w]*[-\*][\.\w]+)')
+v_allele_regex = re.compile(r'((IG[HLK]|TR[ABGD])V[A-R0-9]+[-/\w]*[-\*][\.\w]+)')
+d_allele_regex = re.compile(r'((IG[HLK]|TR[ABGD])D[A-R0-9]+[-/\w]*[-\*][\.\w]+)')
+j_allele_regex = re.compile(r'((IG[HLK]|TR[ABGD])J[A-R0-9]+[-/\w]*[-\*][\.\w]+)')
c_gene_regex = re.compile(r'((IG[HLK]|TR[ABGD])([DMAGEC][P0-9]?[A-Z]?))')
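The narrower character class `[A-R0-9]` stops the family match at any letter after R. The sketch below compares the old and new `family_regex` patterns on a name containing an `S` designation. `IGHV1S2*01` is an assumed example of the temporary-designation style and is not taken from the release. Only the family call changes: the gene and allele patterns still match the full name because `[-/\w]*` absorbs the remaining characters.

```python
import re

# family_regex before and after the 1.1.0 change
OLD_FAMILY = re.compile(r'((IG[HLK]|TR[ABGD])([VDJ][A-Z0-9]+))')
NEW_FAMILY = re.compile(r'((IG[HLK]|TR[ABGD])([VDJ][A-R0-9]+))')
# gene_regex and allele_regex after the change
NEW_GENE = re.compile(r'((IG[HLK]|TR[ABGD])([VDJ][A-R0-9]+[-/\w]*))')
NEW_ALLELE = re.compile(r'((IG[HLK]|TR[ABGD])([VDJ][A-R0-9]+[-/\w]*[-\*][\.\w]+))')


def first_match(regex, call):
    """Return the first match of `regex` in `call`, or None."""
    match = regex.search(call)
    return match.group(0) if match else None
```

`first_match(OLD_FAMILY, 'IGHV1S2*01')` gives `IGHV1S2`, whereas the new pattern gives `IGHV1`. For a conventional name such as `IGHV3-23*01`, both patterns return `IGHV3`.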
=====================================
changeo/IO.py
=====================================
@@ -883,7 +883,8 @@ class IgBLASTReader:
return fields
- def __init__(self, igblast, sequences, references, asis_calls=False, regions='default', receptor=True):
+ def __init__(self, igblast, sequences, references, asis_calls=False, regions='default', receptor=True,
+ infer_junction=False):
"""
Initializer.
@@ -895,6 +896,7 @@ class IgBLASTReader:
asis_calls (bool): if True do not parse gene calls for allele names.
regions (str): name of the IMGT FWR/CDR region definitions to use.
receptor (bool): if True (default) iteration returns an Receptor object, otherwise it returns a dictionary.
+ infer_junction (bool): if True, infer the junction region if not reported by IgBLAST.
Returns:
changeo.IO.IgBLASTReader
@@ -906,6 +908,7 @@ class IgBLASTReader:
self.regions = regions
self.asis_calls = asis_calls
self.receptor = receptor
+ self.infer_junction = infer_junction
# Define parsing blocks
self.groups = groupby(self.igblast, lambda x: not re.match('# IGBLAST', x))
@@ -1472,7 +1475,7 @@ class IgBLASTReader:
if 'subregion' in sections and 'cdr3_igblast_start' in sections['subregion']:
junc_dict = self._parseSubregionSection(sections['subregion'], db['sequence_input'])
db.update(junc_dict)
- elif ('j_call' in db and db['j_call']) and ('sequence_imgt' in db and db['sequence_imgt']):
+ elif self.infer_junction and ('j_call' in db and db['j_call']) and ('sequence_imgt' in db and db['sequence_imgt']):
junc_dict = inferJunction(db['sequence_imgt'],
j_germ_start=db['j_germ_start'],
j_germ_length=db['j_germ_length'],
@@ -2474,4 +2477,4 @@ def yamlDict(file):
except:
printError('YAML file is invalid.')
- return yaml_dict
\ No newline at end of file
+ return yaml_dict
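The behavioural change in `IgBLASTReader` is that junction inference is now opt-in. The branch selection can be sketched as a small pure function. `junction_source` is hypothetical: it only reports which branch the 1.1.0 condition would take and does not call Change-O's `inferJunction`.

```python
def junction_source(db, sections, infer_junction=False):
    """Report which junction branch the 1.1.0 condition selects (hypothetical)."""
    if 'subregion' in sections and 'cdr3_igblast_start' in sections['subregion']:
        return 'subregion'  # IgBLAST reported the CDR3, so parse it
    if infer_junction and db.get('j_call') and db.get('sequence_imgt'):
        return 'inferred'   # only reached when --infer-junction is set
    return None             # neither branch runs
```

Before 1.1.0, the `elif` had no `self.infer_junction` guard. As a result, the second branch ran whenever `j_call` and `sequence_imgt` were non-empty.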
=====================================
changeo/Version.py
=====================================
@@ -3,7 +3,7 @@ Version and authorship information
"""
__author__ = 'Namita Gupta, Jason Anthony Vander Heiden'
-__copyright__ = 'Copyright 2020 Kleinstein Lab, Yale University. All rights reserved.'
+__copyright__ = 'Copyright 2021 Kleinstein Lab, Yale University. All rights reserved.'
__license__ = 'GNU Affero General Public License 3 (AGPL-3)'
-__version__ = '1.0.2'
-__date__ = '2021.01.18'
+__version__ = '1.1.0'
+__date__ = '2021.06.21'
=====================================
debian/changelog
=====================================
@@ -1,3 +1,10 @@
+changeo (1.1.0-1) UNRELEASED; urgency=medium
+
+ * New upstream version 1.1.0
+ * d/tests/run-unit-tests: Change checksums in output
+
+ -- Nilesh Patra <nilesh at debian.org> Fri, 09 Jul 2021 23:07:44 +0530
+
changeo (1.0.2-2~0exp0) experimental; urgency=medium
* Team Upload.
=====================================
debian/tests/run-unit-test
=====================================
@@ -16,13 +16,13 @@ gunzip -r *
MakeDb.py imgt -i S43_atleast-2.txz -s S43_atleast-2.fasta
-echo "025d331569cf3959735d1677ad1532d9 S43_atleast-2_db-pass.tsv" >> checksums
+echo "fca65a99ea3569b99196c50c42269946 S43_atleast-2_db-pass.tsv" >> checksums
CreateGermlines.py -d S43_atleast-2_db-pass.tsv -g dmask -r IMGT_Human_IGHV.fasta IMGT_Human_IGHD.fasta IMGT_Human_IGHJ.fasta
-echo "8a5a1673f5a3e566a0c2a901bdf2a278 S43_atleast-2_db-pass_germ-pass.tsv" >> checksums
+echo "8b76920fd30d640d34af22009621cc2e S43_atleast-2_db-pass_germ-pass.tsv" >> checksums
ParseDb.py select -d S43_atleast-2_db-pass.tsv -f productive -u T
-echo "23ae15ac46bc1d83cfa9e615e8af3703 S43_atleast-2_db-pass_parse-select.tsv" >> checksums
+echo "7cf6dffe3d45414021d214da9ad6dc1e S43_atleast-2_db-pass_parse-select.tsv" >> checksums
md5sum --check checksums
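The autopkgtest script appends `<md5> <file>` lines to a `checksums` file and verifies them with `md5sum --check`. This commit only swaps in new expected digests for the 1.1.0 outputs. For readers who want to reproduce the check outside the shell, a minimal Python sketch follows; it is not part of the Debian package. It assumes whitespace-separated lines and strips GNU's optional leading `*` binary-mode marker. `check_md5_list` is a hypothetical name.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 16):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def check_md5_list(checksum_file, base_dir='.'):
    """Return {file name: True/False} for each '<digest> <name>' line."""
    results = {}
    for line in Path(checksum_file).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(None, 1)
        if name.startswith('*'):  # GNU binary-mode marker
            name = name[1:]
        results[name] = md5_of(Path(base_dir) / name) == expected.lower()
    return results
```

`all(check_md5_list('checksums').values())` then plays the role of the script's final `md5sum --check checksums`.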
=====================================
requirements.txt
=====================================
@@ -1,8 +1,8 @@
numpy>=1.8
scipy>=0.14
pandas>=0.24
-biopython>=1.71
-PyYAML>=3.12
+biopython>=1.77
+PyYAML>=5.1
setuptools>=2.0
presto>=0.6.2
-airr>=1.2.1
+airr>=1.3.1
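`requirements.txt` and `changeo.egg-info/requires.txt` both use plain `name>=version` lines, so the raised minimums can be listed mechanically. The sketch below assumes that only `>=` specifiers appear, as in both files here, and skips any other line. The helper names are hypothetical.

```python
def parse_minimums(lines):
    """Parse 'name>=version' lines into {name: version}; other lines are skipped."""
    minimums = {}
    for line in lines:
        line = line.split('#', 1)[0].strip()  # drop comments and whitespace
        name, sep, version = line.partition('>=')
        if sep:
            minimums[name.strip()] = version.strip()
    return minimums


def raised_minimums(old, new):
    """Return {name: (old_min, new_min)} for packages whose minimum changed."""
    return {name: (old[name], new[name])
            for name in new if name in old and old[name] != new[name]}


# Contents of requirements.txt before and after this commit
OLD = """numpy>=1.8
scipy>=0.14
pandas>=0.24
biopython>=1.71
PyYAML>=3.12
setuptools>=2.0
presto>=0.6.2
airr>=1.2.1""".splitlines()

NEW = """numpy>=1.8
scipy>=0.14
pandas>=0.24
biopython>=1.77
PyYAML>=5.1
setuptools>=2.0
presto>=0.6.2
airr>=1.3.1""".splitlines()
```

`raised_minimums(parse_minimums(OLD), parse_minimums(NEW))` lists the three bumped packages: biopython, PyYAML and airr.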
View it on GitLab: https://salsa.debian.org/med-team/changeo/-/compare/908fd41c08bc8d600e1c082a3b1f4ac4ea5f0a67...1215c9170cffeaebdd365f6fd3fa3962f288c89d
More information about the debian-med-commit mailing list