[med-svn] [Git][med-team/changeo][master] 3 commits: New upstream version 1.1.0

Fri Jul 9 18:40:23 BST 2021


Nilesh Patra pushed to branch master at Debian Med / changeo


Commits:
46594f9a by Nilesh Patra at 2021-07-09T23:08:07+05:30
New upstream version 1.1.0
- - - - -
05bc29b8 by Nilesh Patra at 2021-07-09T23:08:07+05:30
d/tests/run-unit-tests: Change checksums in output

- - - - -
1215c917 by Nilesh Patra at 2021-07-09T23:08:07+05:30
Interim changelog entry

- - - - -


15 changed files:

- INSTALL.rst
- NEWS.rst
- PKG-INFO
- README.rst
- bin/MakeDb.py
- changeo.egg-info/PKG-INFO
- changeo.egg-info/requires.txt
- changeo/Commandline.py
- changeo/Defaults.py
- changeo/Gene.py
- changeo/IO.py
- changeo/Version.py
- debian/changelog
- debian/tests/run-unit-test
- requirements.txt


Changes:

=====================================
INSTALL.rst
=====================================
@@ -23,9 +23,9 @@ The minimum dependencies for installation are:
 + `NumPy 1.8 <http://numpy.org>`__
 + `SciPy 0.14 <http://scipy.org>`__
 + `pandas 0.24 <http://pandas.pydata.org>`__
-+ `Biopython 1.71 <http://biopython.org>`__
++ `Biopython 1.77 <http://biopython.org>`__
 + `presto 0.6.2 <http://presto.readthedocs.io>`__
-+ `airr 1.2.1 <https://docs.airr-community.org>`__
++ `airr 1.3.1 <https://docs.airr-community.org>`__
 
 Some tools wrap external applications that are not required for installation.
 Those tools require minimum versions of:
@@ -45,7 +45,7 @@ Linux
    Biopython according to its
    `instructions <http://biopython.org/DIST/docs/install/Installation.html>`__.
 
-2. Install `presto 0.5.10 <http://presto.readthedocs.io>`__ or greater.
+2. Install `presto 0.6.2 <http://presto.readthedocs.io>`__ or greater.
 
 3. Download the Change-O bundle and run::
 
@@ -86,7 +86,7 @@ Mac OS X
 
    > pip3 install numpy scipy pandas biopython
 
-8. Install `presto 0.5.10 <http://presto.readthedocs.io>`__ or greater.
+8. Install `presto 0.6.2 <http://presto.readthedocs.io>`__ or greater.
 
 9. Download the Change-O bundle, open a terminal window, change directories
    to the download folder, and run::
@@ -104,7 +104,7 @@ Windows
    `Unofficial Windows binary <http://www.lfd.uci.edu/~gohlke/pythonlibs>`__
    collection.
 
-3. Install `presto 0.5.10 <http://presto.readthedocs.io>`__ or greater.
+3. Install `presto 0.6.2 <http://presto.readthedocs.io>`__ or greater.
 
 4. Download the Change-O bundle, open a Command Prompt, change directories to
    the download folder, and run::


=====================================
NEWS.rst
=====================================
@@ -1,13 +1,32 @@
 Release Notes
 ===============================================================================
 
+Version 1.1.0:  June 21, 2021
+-------------------------------------------------------------------------------
+
++ Fixed gene parsing for IMGT temporary designation nomenclature.
++ Updated dependencies to biopython >= v1.77, airr >= v1.3.1, PyYAML>=5.1.
+
+MakeDb:
+
++ Added the ``--imgt-id-len`` argument to accommodate changes introduced in how
+  IMGT/HighV-QUEST truncates sequence identifiers as of version 1.8.3 (May 7, 2021).
+  The header lines in the fasta files are now truncated to 49 characters. In
+  IMGT/HighV-QUEST versions older that 1.8.3, they were truncated to 50 characters.
+  ``--imgt-id-len`` default value is 49. Users should specify ``--imgt-id-len 50``
+  to analyze IMGT results generated with IMGT/HighV-QUEST versions older that 1.8.3.
++ Added the ``--infer-junction`` argument to ``MakeDb igblast``, to enable the inference
+  of the junction sequence when not reported by IgBLAST. Should be used with data from
+  IgBLAST v1.6.0 or older; before igblast added the IMGT-CDR3 inference.
+
+
 Version 1.0.2:  January 18, 2021
 -------------------------------------------------------------------------------
 
 AlignRecords:
 
 + Fixed a bug caused the program to exit when encountering missing sequence
-data. It will now fail the row or group with missing data and continue.
+  data. It will now fail the row or group with missing data and continue.
 
 MakeDb:
 
@@ -69,7 +88,7 @@ MakeDb:
 + Add --regions argument to the ``igblast`` and ``igblast-aa`` subcommands
   to allow specification of the IMGT CDR/FWR region boundaries. Currently,
   the supported specifications are ``default`` (human, mouse) and
-   ``rhesus-igl``.
+  ``rhesus-igl``.
 
 
 Version 0.4.6:  July 19, 2019


=====================================
PKG-INFO
=====================================
@@ -1,13 +1,18 @@
 Metadata-Version: 1.1
 Name: changeo
-Version: 1.0.2
+Version: 1.1.0
 Summary: A bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.
 Home-page: http://changeo.readthedocs.io
 Author: Namita Gupta, Jason Anthony Vander Heiden
 Author-email: immcantation at googlegroups.com
 License: GNU Affero General Public License 3 (AGPL-3)
 Download-URL: https://bitbucket.org/kleinstein/changeo/downloads
-Description: Change-O - Repertoire clonal assignment toolkit
+Description: .. image:: https://img.shields.io/pypi/dm/changeo
+            :target: https://pypi.org/project/changeo
+        .. image:: https://img.shields.io/static/v1?label=AIRR-C%20sw-tools%20v1&message=compliant&color=008AFF&labelColor=000000&style=plastic
+            :target: https://docs.airr-community.org/en/stable/swtools/airr_swtools_standard.html
+        
+        Change-O - Repertoire clonal assignment toolkit
         ================================================================================
         
         Change-O is a collection of tools for processing the output of V(D)J alignment


=====================================
README.rst
=====================================
@@ -1,3 +1,8 @@
+.. image:: https://img.shields.io/pypi/dm/changeo
+    :target: https://pypi.org/project/changeo
+.. image:: https://img.shields.io/static/v1?label=AIRR-C%20sw-tools%20v1&message=compliant&color=008AFF&labelColor=000000&style=plastic
+    :target: https://docs.airr-community.org/en/stable/swtools/airr_swtools_standard.html
+
 Change-O - Repertoire clonal assignment toolkit
 ================================================================================
 


=====================================
bin/MakeDb.py
=====================================
@@ -20,7 +20,7 @@ from Bio import SeqIO
 # Presto and changeo imports
 from presto.Annotation import parseAnnotation
 from presto.IO import countSeqFile, printLog, printMessage, printProgress, printError, printWarning, readSeqFile
-from changeo.Defaults import default_format, default_out_args
+from changeo.Defaults import default_format, default_out_args, default_imgt_id_len
 from changeo.Commandline import CommonHelpFormatter, checkArgs, getCommonArgParser, parseCommonArgs
 from changeo.Alignment import RegionDefinition
 from changeo.Gene import buildGermline
@@ -102,7 +102,7 @@ def addGermline(receptor, references, amino_acid=False):
     return receptor
 
 
-def getIDforIMGT(seq_file):
+def getIDforIMGT(seq_file, imgt_id_len=default_imgt_id_len):
     """
     Create a sequence ID translation using IMGT truncation.
 
@@ -113,13 +113,13 @@ def getIDforIMGT(seq_file):
       dict : a dictionary of with the IMGT truncated ID as the key and the full sequence description as the value.
     """
 
-    # Create a sequence ID translation using IDs truncate up to space or 50 chars
+    # Create a sequence ID translation using IDs truncate up to space or 49 chars
     ids = {}
     for rec in readSeqFile(seq_file):
-        if len(rec.description) <= 50:
+        if len(rec.description) <= imgt_id_len:
             id_key = rec.description
         else:
-            id_key = re.sub('\||\s|!|&|\*|<|>|\?', '_', rec.description[:50])
+            id_key = re.sub('\||\s|!|&|\*|<|>|\?', '_', rec.description[:imgt_id_len])
         ids.update({id_key: rec.description})
 
     return ids
@@ -355,7 +355,7 @@ def writeDb(records, fields, aligner_file, total_count, id_dict=None, annotation
 
 
 def parseIMGT(aligner_file, seq_file=None, repo=None, cellranger_file=None, partial=False, asis_id=True,
-              extended=False, format=default_format, out_file=None, out_args=default_out_args):
+              extended=False, format=default_format, out_file=None, out_args=default_out_args, imgt_id_len=default_imgt_id_len):
     """
     Main for IMGT aligned sample sequences.
 
@@ -369,6 +369,7 @@ def parseIMGT(aligner_file, seq_file=None, repo=None, cellranger_file=None, part
       format : output format. one of 'changeo' or 'airr'.
       out_file : output file name. Automatically generated from the input file if None.
       out_args : common output argument dictionary from parseCommonArgs.
+      imgt_id_len: maximum character length of sequence identifiers reported by IMGT/HighV-QUEST.
 
     Returns:
       dict : names of the 'pass' and 'fail' output files.
@@ -394,8 +395,8 @@ def parseIMGT(aligner_file, seq_file=None, repo=None, cellranger_file=None, part
     total_count = countDbFile(imgt_files['summary'])
 
     # Get (parsed) IDs from fasta file submitted to IMGT
-    id_dict = getIDforIMGT(seq_file) if seq_file else {}
-
+    id_dict = getIDforIMGT(seq_file, imgt_id_len) if seq_file else {}
+    
     # Load supplementary annotation table
     if cellranger_file is not None:
         f = cellranger_extended if extended else cellranger_base
@@ -436,7 +437,6 @@ def parseIMGT(aligner_file, seq_file=None, repo=None, cellranger_file=None, part
             if all('...' not in x for x in references.values()):
                 printWarning('Germline reference sequences do not appear to contain IMGT-numbering spacers. Results may be incorrect.')
             germ_iter = (addGermline(x, references) for x in parse_iter)
-
         # Write db
         output = writeDb(germ_iter, fields=fields, aligner_file=aligner_file, total_count=total_count, 
                          annotations=annotations, id_dict=id_dict, asis_id=asis_id, partial=partial,
@@ -449,7 +449,7 @@ def parseIMGT(aligner_file, seq_file=None, repo=None, cellranger_file=None, part
 
 
 def parseIgBLAST(aligner_file, seq_file, repo, amino_acid=False, cellranger_file=None, partial=False,
-                 asis_id=True, asis_calls=False, extended=False, regions='default',
+                 asis_id=True, asis_calls=False, extended=False, regions='default', infer_junction=False,
                  format='changeo', out_file=None, out_args=default_out_args):
     """
     Main for IgBLAST aligned sample sequences.
@@ -464,6 +464,7 @@ def parseIgBLAST(aligner_file, seq_file, repo, amino_acid=False, cellranger_file
       asis_calls (bool): if True do not parse gene calls for allele names.
       extended (bool): if True add alignment scores, FWR regions, and CDR regions to the output.
       regions (str): name of the IMGT FWR/CDR region definitions to use.
+      infer_junction (bool): if True, infer the junction sequence, if not reported by IgBLAST.
       format (str): output format. one of 'changeo' or 'airr'.
       out_file (str): output file name. Automatically generated from the input file if None.
       out_args (dict): common output argument dictionary from parseCommonArgs.
@@ -481,6 +482,7 @@ def parseIgBLAST(aligner_file, seq_file, repo, amino_acid=False, cellranger_file
     log['ASIS_CALLS'] = asis_calls
     log['PARTIAL'] = partial
     log['EXTENDED'] = extended
+    log['INFER_JUNCTION'] = infer_junction
     printLog(log)
 
     # Set amino acid conditions
@@ -531,7 +533,7 @@ def parseIgBLAST(aligner_file, seq_file, repo, amino_acid=False, cellranger_file
 
     # Parse and write output
     with open(aligner_file, 'r') as f:
-        parse_iter = parser(f, seq_dict, references, regions=regions, asis_calls=asis_calls)
+        parse_iter = parser(f, seq_dict, references, regions=regions, asis_calls=asis_calls, infer_junction=infer_junction)
         germ_iter = (addGermline(x, references, amino_acid=amino_acid) for x in parse_iter)
         output = writeDb(germ_iter, fields=fields, aligner_file=aligner_file, total_count=total_count, 
                          annotations=annotations, amino_acid=amino_acid, partial=partial, asis_id=asis_id,
@@ -724,6 +726,9 @@ def getArgParser():
     group_igblast.add_argument('--regions', action='store', dest='regions',
                                choices=('default', 'rhesus-igl'), default='default',
                                help='''IMGT CDR and FWR boundary definition to use.''')
+    group_igblast.add_argument('--infer-junction', action='store_true', dest='infer_junction',
+                                 help='''Infer the junction sequence. For use with IgBLAST v1.6.0 or older, 
+                                 prior to the addition of IMGT-CDR3 inference.''')
     parser_igblast.set_defaults(func=parseIgBLAST, amino_acid=False)
 
     # igblastp output parser
@@ -809,6 +814,11 @@ def getArgParser():
                                  Adds <vdj>_score, <vdj>_identity>, fwr1, fwr2, fwr3, fwr4,
                                  cdr1, cdr2, cdr3, n1_length, n2_length, p3v_length, p5d_length, 
                                  p3d_length, p5j_length and d_frame.''')
+    group_imgt.add_argument('--imgt-id-len', action='store', dest='imgt_id_len', type=int,
+                            default=default_imgt_id_len,
+                            help='''The maximum character length of sequence identifiers reported by IMGT/HighV-QUEST. 
+                            Specify 50 if the IMGT files (-i) were generated with an IMGT/HighV-QUEST version older 
+                            than 1.8.3 (May 7, 2021).''')
     parser_imgt.set_defaults(func=parseIMGT)
 
     # iHMMuneAlign Aligner
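
Note on the ``--imgt-id-len`` change above: the following is a minimal standalone sketch of the truncation logic in ``getIDforIMGT``, assuming a plain list of description strings in place of ``readSeqFile`` records. The helper name ``imgt_id_map`` and the sample descriptions are hypothetical and not part of changeo.

```python
import re

# Assumed default, mirroring default_imgt_id_len = 49 from changeo/Defaults.py in this diff.
DEFAULT_IMGT_ID_LEN = 49


def imgt_id_map(descriptions, imgt_id_len=DEFAULT_IMGT_ID_LEN):
    """Map IMGT-style truncated IDs back to the full sequence descriptions."""
    ids = {}
    for desc in descriptions:
        if len(desc) <= imgt_id_len:
            # Short descriptions are used unchanged as the key
            key = desc
        else:
            # Truncate, then replace special characters with underscores
            key = re.sub(r'\||\s|!|&|\*|<|>|\?', '_', desc[:imgt_id_len])
        ids[key] = desc
    return ids


# Hypothetical descriptions, for illustration only
long_desc = 'SEQ0001|PRIMER=IGHG|BARCODE=ACGT|CONSCOUNT=12|DUPCOUNT=3 extra'
short_desc = 'SEQ0002'

# Default (49 characters) versus the pre-1.8.3 IMGT/HighV-QUEST length (50)
new_map = imgt_id_map([long_desc, short_desc])
old_map = imgt_id_map([long_desc, short_desc], imgt_id_len=50)
```

With a 49-character limit the key for the long description is one character shorter than with 50, which is why results from older IMGT/HighV-QUEST versions need ``--imgt-id-len 50`` for the IDs to map back correctly.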


=====================================
changeo.egg-info/PKG-INFO
=====================================
@@ -1,13 +1,18 @@
 Metadata-Version: 1.1
 Name: changeo
-Version: 1.0.2
+Version: 1.1.0
 Summary: A bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.
 Home-page: http://changeo.readthedocs.io
 Author: Namita Gupta, Jason Anthony Vander Heiden
 Author-email: immcantation at googlegroups.com
 License: GNU Affero General Public License 3 (AGPL-3)
 Download-URL: https://bitbucket.org/kleinstein/changeo/downloads
-Description: Change-O - Repertoire clonal assignment toolkit
+Description: .. image:: https://img.shields.io/pypi/dm/changeo
+            :target: https://pypi.org/project/changeo
+        .. image:: https://img.shields.io/static/v1?label=AIRR-C%20sw-tools%20v1&message=compliant&color=008AFF&labelColor=000000&style=plastic
+            :target: https://docs.airr-community.org/en/stable/swtools/airr_swtools_standard.html
+        
+        Change-O - Repertoire clonal assignment toolkit
         ================================================================================
         
         Change-O is a collection of tools for processing the output of V(D)J alignment


=====================================
changeo.egg-info/requires.txt
=====================================
@@ -1,8 +1,8 @@
 numpy>=1.8
 scipy>=0.14
 pandas>=0.24
-biopython>=1.71
-PyYAML>=3.12
+biopython>=1.77
+PyYAML>=5.1
 setuptools>=2.0
 presto>=0.6.2
-airr>=1.2.1
+airr>=1.3.1


=====================================
changeo/Commandline.py
=====================================
@@ -110,8 +110,10 @@ def getCommonArgParser(db_in=True, db_out=True, out_file=True, failed=True, log=
                                 fail processing.''')
     # Format arguments
     if format:
-        group.add_argument('--format', action='store', dest='format', default=default_format,
-                           choices=choices_format, help='''Specify input and output format.''')
+        group.add_argument('--format', action='store', dest='format',
+                           default=default_format, choices=choices_format,
+                           help='''Output format. Also specifies the input format for tools accepting 
+                                tab delimited AIRR Rearrangement or Change-O files.''')
 
     # Multiprocessing arguments
     if multiproc:


=====================================
changeo/Defaults.py
=====================================
@@ -42,3 +42,6 @@ default_out_args = {'log_file': None,
                     'out_name': None,
                     'out_type': 'tsv',
                     'failed': False}
+
+# IMGT
+default_imgt_id_len = 49


=====================================
changeo/Gene.py
=====================================
@@ -14,14 +14,14 @@ from changeo.Defaults import v_attr, d_attr, j_attr, seq_attr
 
 # Ig and TCR Regular expressions
 allele_number_regex = re.compile(r'(?<=\*)([\.\w]+)')
-allele_regex = re.compile(r'((IG[HLK]|TR[ABGD])([VDJ][A-Z0-9]+[-/\w]*[-\*][\.\w]+))')
-gene_regex = re.compile(r'((IG[HLK]|TR[ABGD])([VDJ][A-Z0-9]+[-/\w]*))')
-family_regex = re.compile(r'((IG[HLK]|TR[ABGD])([VDJ][A-Z0-9]+))')
+allele_regex = re.compile(r'((IG[HLK]|TR[ABGD])([VDJ][A-R0-9]+[-/\w]*[-\*][\.\w]+))')
+gene_regex = re.compile(r'((IG[HLK]|TR[ABGD])([VDJ][A-R0-9]+[-/\w]*))')
+family_regex = re.compile(r'((IG[HLK]|TR[ABGD])([VDJ][A-R0-9]+))')
 locus_regex = re.compile(r'(IG[HLK]|TR[ABGD])')
 
-v_allele_regex = re.compile(r'((IG[HLK]|TR[ABGD])V[A-Z0-9]+[-/\w]*[-\*][\.\w]+)')
-d_allele_regex = re.compile(r'((IG[HLK]|TR[ABGD])D[A-Z0-9]+[-/\w]*[-\*][\.\w]+)')
-j_allele_regex = re.compile(r'((IG[HLK]|TR[ABGD])J[A-Z0-9]+[-/\w]*[-\*][\.\w]+)')
+v_allele_regex = re.compile(r'((IG[HLK]|TR[ABGD])V[A-R0-9]+[-/\w]*[-\*][\.\w]+)')
+d_allele_regex = re.compile(r'((IG[HLK]|TR[ABGD])D[A-R0-9]+[-/\w]*[-\*][\.\w]+)')
+j_allele_regex = re.compile(r'((IG[HLK]|TR[ABGD])J[A-R0-9]+[-/\w]*[-\*][\.\w]+)')
 c_gene_regex = re.compile(r'((IG[HLK]|TR[ABGD])([DMAGEC][P0-9]?[A-Z]?))')
 
 


=====================================
changeo/IO.py
=====================================
@@ -883,7 +883,8 @@ class IgBLASTReader:
 
         return fields
 
-    def __init__(self, igblast, sequences, references, asis_calls=False, regions='default', receptor=True):
+    def __init__(self, igblast, sequences, references, asis_calls=False, regions='default', receptor=True,
+                 infer_junction=False):
         """
         Initializer.
 
@@ -895,6 +896,7 @@ class IgBLASTReader:
           asis_calls (bool): if True do not parse gene calls for allele names.
           regions (str): name of the IMGT FWR/CDR region definitions to use.
           receptor (bool): if True (default) iteration returns an Receptor object, otherwise it returns a dictionary.
+          infer_junction (bool): if True, infer the junction region if not reported by IgBLAST.
 
         Returns:
           changeo.IO.IgBLASTReader
@@ -906,6 +908,7 @@ class IgBLASTReader:
         self.regions = regions
         self.asis_calls = asis_calls
         self.receptor = receptor
+        self.infer_junction = infer_junction
 
         # Define parsing blocks
         self.groups = groupby(self.igblast, lambda x: not re.match('# IGBLAST', x))
@@ -1472,7 +1475,7 @@ class IgBLASTReader:
         if 'subregion' in sections and 'cdr3_igblast_start' in sections['subregion']:
             junc_dict = self._parseSubregionSection(sections['subregion'], db['sequence_input'])
             db.update(junc_dict)
-        elif ('j_call' in db and db['j_call']) and ('sequence_imgt' in db and db['sequence_imgt']):
+        elif self.infer_junction and ('j_call' in db and db['j_call']) and ('sequence_imgt' in db and db['sequence_imgt']):
             junc_dict = inferJunction(db['sequence_imgt'],
                                       j_germ_start=db['j_germ_start'],
                                       j_germ_length=db['j_germ_length'],
@@ -2474,4 +2477,4 @@ def yamlDict(file):
     except:
         printError('YAML file is invalid.')
 
-    return yaml_dict
\ No newline at end of file
+    return yaml_dict
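
Note on the ``infer_junction`` change above: the sketch below mirrors only the branching shown in the ``IgBLASTReader`` hunk, to show that junction inference is now opt-in. The function ``junction_source`` and its return labels are hypothetical; the real code calls ``_parseSubregionSection`` and ``inferJunction`` instead of returning strings.

```python
def junction_source(sections, db, infer_junction=False):
    """Return which path would supply the junction, following the diff's branching."""
    if 'subregion' in sections and 'cdr3_igblast_start' in sections['subregion']:
        # IgBLAST reported the CDR3, so its own annotation is used
        return 'igblast'
    elif infer_junction and db.get('j_call') and db.get('sequence_imgt'):
        # Only reached when inference was explicitly requested
        return 'inferred'
    # Otherwise no junction is filled in by this step
    return None


# Hypothetical record from an older IgBLAST that does not report the CDR3
db = {'j_call': 'IGHJ4*02', 'sequence_imgt': 'CAGGTG...'}
without_flag = junction_source({}, db)
with_flag = junction_source({}, db, infer_junction=True)
reported = junction_source({'subregion': {'cdr3_igblast_start': 289}}, db, infer_junction=True)
```

Before this change the second branch did not test a flag, so inference ran whenever a J call and an IMGT-gapped sequence were present.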


=====================================
changeo/Version.py
=====================================
@@ -3,7 +3,7 @@ Version and authorship information
 """
 
 __author__    = 'Namita Gupta, Jason Anthony Vander Heiden'
-__copyright__ = 'Copyright 2020 Kleinstein Lab, Yale University. All rights reserved.'
+__copyright__ = 'Copyright 2021 Kleinstein Lab, Yale University. All rights reserved.'
 __license__   = 'GNU Affero General Public License 3 (AGPL-3)'
-__version__   = '1.0.2'
-__date__      = '2021.01.18'
+__version__   = '1.1.0'
+__date__      = '2021.06.21'


=====================================
debian/changelog
=====================================
@@ -1,3 +1,10 @@
+changeo (1.1.0-1) UNRELEASED; urgency=medium
+
+  * New upstream version 1.1.0
+  * d/tests/run-unit-tests: Change checksums in output
+
+ -- Nilesh Patra <nilesh at debian.org>  Fri, 09 Jul 2021 23:07:44 +0530
+
 changeo (1.0.2-2~0exp0) experimental; urgency=medium
 
   * Team Upload.


=====================================
debian/tests/run-unit-test
=====================================
@@ -16,13 +16,13 @@ gunzip -r *
 
 
 MakeDb.py imgt -i S43_atleast-2.txz -s S43_atleast-2.fasta
-echo "025d331569cf3959735d1677ad1532d9  S43_atleast-2_db-pass.tsv" >> checksums
+echo "fca65a99ea3569b99196c50c42269946  S43_atleast-2_db-pass.tsv" >> checksums
 
 CreateGermlines.py -d S43_atleast-2_db-pass.tsv -g dmask -r IMGT_Human_IGHV.fasta IMGT_Human_IGHD.fasta IMGT_Human_IGHJ.fasta
-echo "8a5a1673f5a3e566a0c2a901bdf2a278  S43_atleast-2_db-pass_germ-pass.tsv" >> checksums
+echo "8b76920fd30d640d34af22009621cc2e  S43_atleast-2_db-pass_germ-pass.tsv" >> checksums
 
 ParseDb.py select -d S43_atleast-2_db-pass.tsv -f productive -u T
-echo "23ae15ac46bc1d83cfa9e615e8af3703  S43_atleast-2_db-pass_parse-select.tsv" >> checksums
+echo "7cf6dffe3d45414021d214da9ad6dc1e  S43_atleast-2_db-pass_parse-select.tsv" >> checksums
 
 md5sum --check checksums
 


=====================================
requirements.txt
=====================================
@@ -1,8 +1,8 @@
 numpy>=1.8
 scipy>=0.14
 pandas>=0.24
-biopython>=1.71
-PyYAML>=3.12
+biopython>=1.77
+PyYAML>=5.1
 setuptools>=2.0
 presto>=0.6.2
-airr>=1.2.1
+airr>=1.3.1



View it on GitLab: https://salsa.debian.org/med-team/changeo/-/compare/908fd41c08bc8d600e1c082a3b1f4ac4ea5f0a67...1215c9170cffeaebdd365f6fd3fa3962f288c89d


