[med-svn] [Git][med-team/changeo][master] 5 commits: New upstream version 1.1.0
Nilesh Patra (@nilesh)
gitlab at salsa.debian.org
Mon Nov 1 18:28:36 GMT 2021
Nilesh Patra pushed to branch master at Debian Med / changeo
Commits:
b16b82f9 by Nilesh Patra at 2021-07-09T22:53:30+05:30
New upstream version 1.1.0
- - - - -
da695a88 by Nilesh Patra at 2021-11-01T23:47:16+05:30
New upstream version 1.2.0
- - - - -
ac0ca4e1 by Nilesh Patra at 2021-11-01T23:47:19+05:30
Update upstream source from tag 'upstream/1.2.0'
Update to upstream version '1.2.0'
with Debian dir a8341ffd330fa5525d7858a3f13b926f2f75bd2f
- - - - -
ff3545b5 by Nilesh Patra at 2021-11-01T23:47:47+05:30
Run cme
- - - - -
207d4e3b by Nilesh Patra at 2021-11-01T23:48:16+05:30
Upload to unstable
- - - - -
14 changed files:
- INSTALL.rst
- NEWS.rst
- PKG-INFO
- bin/AssignGenes.py
- bin/BuildTrees.py
- bin/MakeDb.py
- bin/ParseDb.py
- changeo.egg-info/PKG-INFO
- changeo.egg-info/requires.txt
- changeo/IO.py
- changeo/Version.py
- debian/changelog
- debian/control
- requirements.txt
Changes:
=====================================
INSTALL.rst
=====================================
@@ -24,7 +24,7 @@ The minimum dependencies for installation are:
+ `SciPy 0.14 <http://scipy.org>`__
+ `pandas 0.24 <http://pandas.pydata.org>`__
+ `Biopython 1.77 <http://biopython.org>`__
-+ `presto 0.6.2 <http://presto.readthedocs.io>`__
++ `presto 0.7.0 <http://presto.readthedocs.io>`__
+ `airr 1.3.1 <https://docs.airr-community.org>`__
Some tools wrap external applications that are not required for installation.
=====================================
NEWS.rst
=====================================
@@ -1,6 +1,40 @@
Release Notes
===============================================================================
+Version 1.2.0: October 29, 2021
+-------------------------------------------------------------------------------
+
++ Updated dependencies to presto >= v0.7.0.
+
+AssignGenes:
+
++ Fixed reporting of IgBLAST output counts when specifying ``--format airr``.
+
+BuildTrees:
+
++ Added support for specifying fixed omega and hotness parameters at the
+ commandline.
+
+CreateGermlines:
+
++ Will now use the first allele in the reference database when duplicate
+ allele names are provided. Only appears to affect mouse BCR light chains
+ and TCR alleles in the IMGT database when the same allele name differs by
+ strain.
+
+MakeDb:
+
++ Added support for changes in how IMGT/HighV-QUEST v1.8.4 handles special
+ characters in sequence identifiers.
++ Fixed the ``imgt`` subcommand incorrectly allowing execution without
+ specifying the IMGT/HighV-QUEST output file at the commandline.
+
+ParseDb:
+
++ Added reporting of output file sizes to the console log of the ``split``
+ subcommand.
+
+
Version 1.1.0: June 21, 2021
-------------------------------------------------------------------------------
@@ -8,13 +42,12 @@ Version 1.1.0: June 21, 2021
+ Updated dependencies to biopython >= v1.77, airr >= v1.3.1, PyYAML>=5.1.
MakeDb:
-
+ Added the ``--imgt-id-len`` argument to accommodate changes introduced in how
- IMGT/HighV-QUEST truncates sequence identifiers as of version 1.8.3 (May 7, 2021).
+ IMGT/HighV-QUEST truncates sequence identifiers as of v1.8.3 (May 7, 2021).
The header lines in the fasta files are now truncated to 49 characters. In
- IMGT/HighV-QUEST versions older that 1.8.3, they were truncated to 50 characters.
+ IMGT/HighV-QUEST versions older than v1.8.3, they were truncated to 50 characters.
``--imgt-id-len`` default value is 49. Users should specify ``--imgt-id-len 50``
- to analyze IMGT results generated with IMGT/HighV-QUEST versions older that 1.8.3.
+ to analyze IMGT results generated with IMGT/HighV-QUEST versions older than v1.8.3.
+ Added the ``--infer-junction`` argument to ``MakeDb igblast``, to enable the inference
of the junction sequence when not reported by IgBLAST. Should be used with data from
IgBLAST v1.6.0 or older; before igblast added the IMGT-CDR3 inference.
=====================================
PKG-INFO
=====================================
@@ -1,6 +1,6 @@
Metadata-Version: 1.1
Name: changeo
-Version: 1.1.0
+Version: 1.2.0
Summary: A bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.
Home-page: http://changeo.readthedocs.io
Author: Namita Gupta, Jason Anthony Vander Heiden
=====================================
bin/AssignGenes.py
=====================================
@@ -14,6 +14,7 @@ from collections import OrderedDict
from pkg_resources import parse_version
from textwrap import dedent
from time import time
+import re
# Presto imports
from presto.IO import printLog, printMessage, printError, printWarning
@@ -99,9 +100,30 @@ def assignIgBLAST(seq_file, amino_acid=False, igdata=default_igdata, loci='ig',
vdb=vdb, output=out_file,
threads=nproc, exec=igblast_exec)
printMessage('Done', start_time=start_time, end=True, width=25)
-
+
+ # Get number of processed sequences
+ if (format == 'blast'):
+ with open(out_file, 'rb') as f:
+ f.seek(-2, os.SEEK_END)
+ while f.read(1) != b'\n':
+ f.seek(-2, os.SEEK_CUR)
+ pass_info = f.readline().decode()
+ num_seqs_match = re.search('(# BLAST processed )(\d+)( .*)', pass_info)
+ num_sequences = num_seqs_match.group(2)
+ else:
+ f = open(out_file, 'rb')
+ lines = 0
+ buf_size = 1024 * 1024
+ read_f = f.raw.read
+ buf = read_f(buf_size)
+ while buf:
+ lines += buf.count(b'\n')
+ buf = read_f(buf_size)
+ num_sequences = lines - 1
+
# Print log
log = OrderedDict()
+ log['PASS'] = num_sequences
log['OUTPUT'] = os.path.basename(out_file)
log['END'] = 'AssignGenes'
printLog(log)
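The PASS count added in this hunk can be illustrated outside of changeo. The following is only a minimal sketch, not the upstream code: the helper names `count_data_rows` and `blast_query_count` and the file contents are made up for illustration, and the BLAST branch simply scans the file for its last non-empty line instead of seeking backwards from the end. It assumes the tabular output has a single header row and that format 7 output ends with a `# BLAST processed N queries` line, as the regular expression in the hunk suggests.

```python
import os
import re
import tempfile


def count_data_rows(path, buf_size=1024 * 1024):
    """Count newlines in fixed-size chunks; subtract one for the header row."""
    newlines = 0
    with open(path, 'rb') as handle:  # the context manager closes the handle
        buf = handle.read(buf_size)
        while buf:
            newlines += buf.count(b'\n')
            buf = handle.read(buf_size)
    return max(newlines - 1, 0)


def blast_query_count(path):
    """Return N from a final '# BLAST processed N queries' line, else None."""
    last = ''
    with open(path, 'r') as handle:
        for line in handle:
            if line.strip():
                last = line
    match = re.search(r'# BLAST processed (\d+)', last)
    return int(match.group(1)) if match else None


# Demo on throwaway files (hypothetical contents, for illustration only).
with tempfile.TemporaryDirectory() as tmp:
    tsv_path = os.path.join(tmp, 'example.tsv')
    with open(tsv_path, 'w') as out:
        out.write('sequence_id\tv_call\nseq1\tIGHV1\nseq2\tIGHV3\n')
    fmt7_path = os.path.join(tmp, 'example.fmt7')
    with open(fmt7_path, 'w') as out:
        out.write('# IGBLASTN\nseq1\thit\n# BLAST processed 2 queries\n')
    airr_rows = count_data_rows(tsv_path)
    blast_rows = blast_query_count(fmt7_path)

print(airr_rows, blast_rows)
```

Reading in large binary chunks keeps memory use flat for big output files; returning None when the summary line is missing avoids an attribute error on a failed match.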
=====================================
bin/BuildTrees.py
=====================================
@@ -999,7 +999,7 @@ def runIgPhyML(outfile, igphyml_out, clone_dir, nproc=1, optimization="lr", omeg
if oformat == "tab":
os.rmdir(clone_dir)
else:
- printWarning("Using --clean all with --oformat txt will delete all tree file results.\n"
+ printWarning("Using --clean all with --oformat txt will not delete all tree file results.\n"
"You'll have to do that yourself.")
log = OrderedDict()
log["END"] = "IgPhyML analysis"
@@ -1323,19 +1323,17 @@ def getArgParser():
help="""Optimize combination of topology (t) branch lengths (l) and parameters (r), or
nothing (n), for IgPhyML.""")
igphyml_group.add_argument("--omega", action="store", dest="omega", type=str, default="e,e",
- choices = ("e", "ce", "e,e", "ce,e", "e,ce", "ce,ce"),
help="""Omega parameters to estimate for FWR,CDR respectively:
- e = estimate, ce = estimate + confidence interval""")
+ e = estimate, ce = estimate + confidence interval, or numeric value""")
igphyml_group.add_argument("-t", action="store", dest="kappa", type=str, default="e",
- choices=("e", "ce"),
help="""Kappa parameters to estimate:
- e = estimate, ce = estimate + confidence interval""")
+ e = estimate, ce = estimate + confidence interval, or numeric value""")
igphyml_group.add_argument("--motifs", action="store", dest="motifs", type=str,
default="WRC_2:0,GYW_0:1,WA_1:2,TW_0:3,SYC_2:4,GRS_0:5",
help="""Which motifs to estimate mutability.""")
igphyml_group.add_argument("--hotness", action="store", dest="hotness", type=str, default="e,e,e,e,e,e",
help="""Mutability parameters to estimate:
- e = estimate, ce = estimate + confidence interval""")
+ e = estimate, ce = estimate + confidence interval, or numeric value""")
igphyml_group.add_argument("--oformat", action="store", dest="oformat", type=str, default="tab",
choices=("tab", "txt"),
help="""IgPhyML output format.""")
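With the `choices` restriction removed, argparse accepts any string for `--omega` and `-t`. As a hedged illustration only (this is not part of the commit), a custom `type` callable could keep some validation while still allowing numeric values. The names `estimate_or_number` and `estimate_list` are hypothetical, and the sketch assumes that fixed values are plain decimal numbers.

```python
import argparse


def estimate_or_number(value):
    """Accept 'e', 'ce', or a numeric string; return the string unchanged."""
    if value in ('e', 'ce'):
        return value
    try:
        float(value)
    except ValueError:
        raise argparse.ArgumentTypeError(
            "expected 'e', 'ce' or a number, got %r" % value)
    return value


def estimate_list(value):
    """Validate each item of a comma-separated list such as 'e,0.7'."""
    return ','.join(estimate_or_number(v) for v in value.split(','))


# Hypothetical parser mirroring two of the options touched by the hunk.
parser = argparse.ArgumentParser()
parser.add_argument('--omega', dest='omega', type=estimate_list, default='e,e')
parser.add_argument('-t', dest='kappa', type=estimate_or_number, default='e')

args = parser.parse_args(['--omega', 'e,0.7', '-t', '2.5'])
print(args.omega, args.kappa)
```

The values stay strings, matching `type=str` in the original arguments, so downstream code that passes them through to IgPhyML would not need to change.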
=====================================
bin/MakeDb.py
=====================================
@@ -118,8 +118,11 @@ def getIDforIMGT(seq_file, imgt_id_len=default_imgt_id_len):
for rec in readSeqFile(seq_file):
if len(rec.description) <= imgt_id_len:
id_key = rec.description
- else:
- id_key = re.sub('\||\s|!|&|\*|<|>|\?', '_', rec.description[:imgt_id_len])
+ else: # truncate and replace characters
+ if imgt_id_len == 49: # 28 September 2021 (version 1.8.4)
+ id_key = re.sub('\s|\t', '_', rec.description[:imgt_id_len])
+ else: # older versions
+ id_key = re.sub('\||\s|!|&|\*|<|>|\?', '_', rec.description[:imgt_id_len])
ids.update({id_key: rec.description})
return ids
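The effect of the new branch on long identifiers can be shown with a small standalone sketch. It assumes the same rules as the hunk above: identifiers at or under the limit are kept as-is, a limit of 49 replaces only whitespace, and any other limit replaces the older, wider character set. The function name `imgt_id_key` and the example header are made up for illustration.

```python
import re


def imgt_id_key(description, imgt_id_len=49):
    """Build the lookup key for a FASTA header as IMGT/HighV-QUEST would report it."""
    if len(description) <= imgt_id_len:
        return description
    truncated = description[:imgt_id_len]
    if imgt_id_len == 49:
        # Newer behaviour: only whitespace (tabs included) becomes '_'
        return re.sub(r'\s', '_', truncated)
    # Older behaviour: a wider set of special characters becomes '_'
    return re.sub(r'[|\s!&*<>?]', '_', truncated)


# Hypothetical 70-character header containing a pipe and a space.
header = 'read|0001 ' + 'A' * 60
key_new = imgt_id_key(header, 49)
key_old = imgt_id_key(header, 50)
print(key_new)
print(key_old)
```

The character class in the older branch matches the same characters as the alternation in the original pattern; it is written as a raw string here to avoid escape-sequence warnings.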
@@ -145,8 +148,8 @@ def writeDb(records, fields, aligner_file, total_count, id_dict=None, annotation
writer=AIRRWriter, out_file=None, out_args=default_out_args):
"""
Writes parsed records to an output file
-
- Arguments:
+
+ Arguments:
records : a iterator of Receptor objects containing alignment data.
fields : a list of ordered field names to write.
aligner_file : input file name.
@@ -355,7 +358,8 @@ def writeDb(records, fields, aligner_file, total_count, id_dict=None, annotation
def parseIMGT(aligner_file, seq_file=None, repo=None, cellranger_file=None, partial=False, asis_id=True,
- extended=False, format=default_format, out_file=None, out_args=default_out_args, imgt_id_len=default_imgt_id_len):
+ extended=False, format=default_format, out_file=None, out_args=default_out_args,
+ imgt_id_len=default_imgt_id_len):
"""
Main for IMGT aligned sample sequences.
@@ -396,7 +400,7 @@ def parseIMGT(aligner_file, seq_file=None, repo=None, cellranger_file=None, part
# Get (parsed) IDs from fasta file submitted to IMGT
id_dict = getIDforIMGT(seq_file, imgt_id_len) if seq_file else {}
-
+
# Load supplementary annotation table
if cellranger_file is not None:
f = cellranger_extended if extended else cellranger_base
@@ -438,7 +442,7 @@ def parseIMGT(aligner_file, seq_file=None, repo=None, cellranger_file=None, part
printWarning('Germline reference sequences do not appear to contain IMGT-numbering spacers. Results may be incorrect.')
germ_iter = (addGermline(x, references) for x in parse_iter)
# Write db
- output = writeDb(germ_iter, fields=fields, aligner_file=aligner_file, total_count=total_count,
+ output = writeDb(germ_iter, fields=fields, aligner_file=aligner_file, total_count=total_count,
annotations=annotations, id_dict=id_dict, asis_id=asis_id, partial=partial,
writer=writer, out_file=out_file, out_args=out_args)
@@ -535,7 +539,7 @@ def parseIgBLAST(aligner_file, seq_file, repo, amino_acid=False, cellranger_file
with open(aligner_file, 'r') as f:
parse_iter = parser(f, seq_dict, references, regions=regions, asis_calls=asis_calls, infer_junction=infer_junction)
germ_iter = (addGermline(x, references, amino_acid=amino_acid) for x in parse_iter)
- output = writeDb(germ_iter, fields=fields, aligner_file=aligner_file, total_count=total_count,
+ output = writeDb(germ_iter, fields=fields, aligner_file=aligner_file, total_count=total_count,
annotations=annotations, amino_acid=amino_acid, partial=partial, asis_id=asis_id,
regions=regions, writer=writer, out_file=out_file, out_args=out_args)
@@ -614,7 +618,7 @@ def parseIHMM(aligner_file, seq_file, repo, cellranger_file=None, partial=False,
with open(aligner_file, 'r') as f:
parse_iter = IHMMuneReader(f, seq_dict, references)
germ_iter = (addGermline(x, references) for x in parse_iter)
- output = writeDb(germ_iter, fields=fields, aligner_file=aligner_file, total_count=total_count,
+ output = writeDb(germ_iter, fields=fields, aligner_file=aligner_file, total_count=total_count,
annotations=annotations, asis_id=asis_id, partial=partial,
writer=writer, out_file=out_file, out_args=out_args)
@@ -625,7 +629,7 @@ def getArgParser():
"""
Defines the ArgumentParser.
- Returns:
+ Returns:
argparse.ArgumentParser
"""
fields = dedent(
@@ -637,34 +641,34 @@ def getArgParser():
db-fail
database with records that fail due to no productivity information,
no gene V assignment, no J assignment, or no junction region.
-
+
universal output fields:
- sequence_id, sequence, sequence_alignment, germline_alignment,
- rev_comp, productive, stop_codon, vj_in_frame, locus,
- v_call, d_call, j_call, junction, junction_length, junction_aa,
+ sequence_id, sequence, sequence_alignment, germline_alignment,
+ rev_comp, productive, stop_codon, vj_in_frame, locus,
+ v_call, d_call, j_call, junction, junction_length, junction_aa,
v_sequence_start, v_sequence_end, v_germline_start, v_germline_end,
d_sequence_start, d_sequence_end, d_germline_start, d_germline_end,
j_sequence_start, j_sequence_end, j_germline_start, j_germline_end,
np1_length, np2_length, fwr1, fwr2, fwr3, fwr4, cdr1, cdr2, cdr3
imgt specific output fields:
- n1_length, n2_length, p3v_length, p5d_length, p3d_length, p5j_length,
- d_frame, v_score, v_identity, d_score, d_identity, j_score, j_identity
-
+ n1_length, n2_length, p3v_length, p5d_length, p3d_length, p5j_length,
+ d_frame, v_score, v_identity, d_score, d_identity, j_score, j_identity
+
igblast specific output fields:
- v_score, v_identity, v_support, v_cigar,
- d_score, d_identity, d_support, d_cigar,
+ v_score, v_identity, v_support, v_cigar,
+ d_score, d_identity, d_support, d_cigar,
j_score, j_identity, j_support, j_cigar
ihmm specific output fields:
vdj_score
-
+
10X specific output fields:
- cell_id, c_call, consensus_count, umi_count,
+ cell_id, c_call, consensus_count, umi_count,
v_call_10x, d_call_10x, j_call_10x,
junction_10x, junction_10x_aa
''')
-
+
# Define ArgumentParser
parser = ArgumentParser(description=__doc__, epilog=fields,
formatter_class=CommonHelpFormatter, add_help=False)
@@ -686,8 +690,7 @@ def getArgParser():
help='Process igblastn output.',
description='Process igblastn output.')
group_igblast = parser_igblast.add_argument_group('aligner parsing arguments')
- group_igblast.add_argument('-i', nargs='+', action='store', dest='aligner_files',
- required=True,
+ group_igblast.add_argument('-i', nargs='+', action='store', dest='aligner_files', required=True,
help='''IgBLAST output files in format 7 with query sequence
(igblastn argument \'-outfmt "7 std qseq sseq btop"\').''')
group_igblast.add_argument('-r', nargs='+', action='store', dest='repo', required=True,
@@ -716,18 +719,18 @@ def getArgParser():
group_igblast.add_argument('--partial', action='store_true', dest='partial',
help='''If specified, include incomplete V(D)J alignments in
the pass file instead of the fail file. An incomplete alignment
- is defined as a record for which a valid IMGT-gapped sequence
- cannot be built or that is missing a V gene assignment,
+ is defined as a record for which a valid IMGT-gapped sequence
+ cannot be built or that is missing a V gene assignment,
J gene assignment, junction region, or productivity call.''')
group_igblast.add_argument('--extended', action='store_true', dest='extended',
- help='''Specify to include additional aligner specific fields in the output.
+ help='''Specify to include additional aligner specific fields in the output.
Adds <vdj>_score, <vdj>_identity, <vdj>_support, <vdj>_cigar,
fwr1, fwr2, fwr3, fwr4, cdr1, cdr2 and cdr3.''')
group_igblast.add_argument('--regions', action='store', dest='regions',
choices=('default', 'rhesus-igl'), default='default',
help='''IMGT CDR and FWR boundary definition to use.''')
group_igblast.add_argument('--infer-junction', action='store_true', dest='infer_junction',
- help='''Infer the junction sequence. For use with IgBLAST v1.6.0 or older,
+ help='''Infer the junction sequence. For use with IgBLAST v1.6.0 or older,
prior to the addition of IMGT-CDR3 inference.''')
parser_igblast.set_defaults(func=parseIgBLAST, amino_acid=False)
@@ -737,8 +740,7 @@ def getArgParser():
help='Process igblastp output.',
description='Process igblastp output.')
group_igblast_aa = parser_igblast_aa.add_argument_group('aligner parsing arguments')
- group_igblast_aa.add_argument('-i', nargs='+', action='store', dest='aligner_files',
- required=True,
+ group_igblast_aa.add_argument('-i', nargs='+', action='store', dest='aligner_files', required=True,
help='''IgBLAST output files in format 7 with query sequence
(igblastp argument \'-outfmt "7 std qseq sseq btop"\').''')
group_igblast_aa.add_argument('-r', nargs='+', action='store', dest='repo', required=True,
@@ -763,7 +765,7 @@ def getArgParser():
the sequence identifiers in the reference sequence set and the IgBLAST
database to be exact string matches.''')
group_igblast_aa.add_argument('--extended', action='store_true', dest='extended',
- help='''Specify to include additional aligner specific fields in the output.
+ help='''Specify to include additional aligner specific fields in the output.
Adds v_score, v_identity, v_support, v_cigar, fwr1, fwr2, fwr3, cdr1 and cdr2.''')
group_igblast_aa.add_argument('--regions', action='store', dest='regions',
choices=('default', 'rhesus-igl'), default='default',
@@ -779,21 +781,21 @@ def getArgParser():
description='''Process IMGT/HighV-Quest output
(does not work with V-QUEST).''')
group_imgt = parser_imgt.add_argument_group('aligner parsing arguments')
- group_imgt.add_argument('-i', nargs='+', action='store', dest='aligner_files',
+ group_imgt.add_argument('-i', nargs='+', action='store', dest='aligner_files', required=True,
help='''Either zipped IMGT output files (.zip or .txz) or a
folder containing unzipped IMGT output files (which must
include 1_Summary, 2_IMGT-gapped, 3_Nt-sequences,
and 6_Junction).''')
group_imgt.add_argument('-s', nargs='*', action='store', dest='seq_files', required=False,
help='''List of FASTA files (with .fasta, .fna or .fa
- extension) that were submitted to IMGT/HighV-QUEST.
+ extension) that were submitted to IMGT/HighV-QUEST.
If unspecified, sequence identifiers truncated by IMGT/HighV-QUEST
will not be corrected.''')
group_imgt.add_argument('-r', nargs='+', action='store', dest='repo', required=False,
help='''List of folders and/or fasta files containing
- the germline sequence set used by IMGT/HighV-QUEST.
+ the germline sequence set used by IMGT/HighV-QUEST.
These reference sequences must contain IMGT-numbering spacers (gaps)
- in the V segment. If unspecified, the germline sequence reconstruction
+ in the V segment. If unspecified, the germline sequence reconstruction
will not be included in the output.''')
group_imgt.add_argument('--10x', action='store', nargs='+', dest='cellranger_file',
help='''Table file containing 10X annotations (with .csv or .tsv
@@ -807,17 +809,17 @@ def getArgParser():
group_imgt.add_argument('--partial', action='store_true', dest='partial',
help='''If specified, include incomplete V(D)J alignments in
the pass file instead of the fail file. An incomplete alignment
- is defined as a record that is missing a V gene assignment,
+ is defined as a record that is missing a V gene assignment,
J gene assignment, junction region, or productivity call.''')
group_imgt.add_argument('--extended', action='store_true', dest='extended',
- help='''Specify to include additional aligner specific fields in the output.
+ help='''Specify to include additional aligner specific fields in the output.
Adds <vdj>_score, <vdj>_identity>, fwr1, fwr2, fwr3, fwr4,
- cdr1, cdr2, cdr3, n1_length, n2_length, p3v_length, p5d_length,
+ cdr1, cdr2, cdr3, n1_length, n2_length, p3v_length, p5d_length,
p3d_length, p5j_length and d_frame.''')
group_imgt.add_argument('--imgt-id-len', action='store', dest='imgt_id_len', type=int,
default=default_imgt_id_len,
- help='''The maximum character length of sequence identifiers reported by IMGT/HighV-QUEST.
- Specify 50 if the IMGT files (-i) were generated with an IMGT/HighV-QUEST version older
+ help='''The maximum character length of sequence identifiers reported by IMGT/HighV-QUEST.
+ Specify 50 if the IMGT files (-i) were generated with an IMGT/HighV-QUEST version older
than 1.8.3 (May 7, 2021).''')
parser_imgt.set_defaults(func=parseIMGT)
@@ -851,18 +853,18 @@ def getArgParser():
group_ihmm.add_argument('--partial', action='store_true', dest='partial',
help='''If specified, include incomplete V(D)J alignments in
the pass file instead of the fail file. An incomplete alignment
- is defined as a record for which a valid IMGT-gapped sequence
- cannot be built or that is missing a V gene assignment,
+ is defined as a record for which a valid IMGT-gapped sequence
+ cannot be built or that is missing a V gene assignment,
J gene assignment, junction region, or productivity call.''')
group_ihmm.add_argument('--extended', action='store_true', dest='extended',
- help='''Specify to include additional aligner specific fields in the output.
+ help='''Specify to include additional aligner specific fields in the output.
Adds the path score of the iHMMune-Align hidden Markov model as vdj_score;
adds fwr1, fwr2, fwr3, fwr4, cdr1, cdr2 and cdr3.''')
parser_ihmm.set_defaults(func=parseIHMM)
return parser
-
-
+
+
if __name__ == "__main__":
"""
Parses command line arguments and calls main
@@ -881,7 +883,7 @@ if __name__ == "__main__":
if 'seq_files' in args_dict: del args_dict['seq_files']
if 'out_files' in args_dict: del args_dict['out_files']
if 'command' in args_dict: del args_dict['command']
- if 'func' in args_dict: del args_dict['func']
+ if 'func' in args_dict: del args_dict['func']
# Call main
for i, f in enumerate(args.__dict__['aligner_files']):
=====================================
bin/ParseDb.py
=====================================
@@ -142,12 +142,15 @@ def splitDbFile(db_file, field, num_split=None, out_args=default_out_args):
log['OUTPUT%i' % (i + 1)] = os.path.basename(handles_dict[k].name)
log['RECORDS'] = rec_count
log['PARTS'] = len(handles_dict)
- log['END'] = 'ParseDb'
- printLog(log)
- # Close output file handles
+ # Close output file handles and log file size
db_handle.close()
- for t in handles_dict: handles_dict[t].close()
+ for i, t in enumerate(handles_dict):
+ handles_dict[t].close()
+ log['SIZE%i' % (i + 1)] = countDbFile(handles_dict[t].name)
+
+ log['END'] = 'ParseDb'
+ printLog(log)
return [handles_dict[t].name for t in handles_dict]
@@ -364,7 +367,7 @@ def deleteDbFile(db_file, fields, values, logic='any', regex=False,
"""
Deletes records from a database file
- Arguments:
+ Arguments:
db_file : the database file name.
fields : a list of fields to check for deletion criteria.
values : a list of values defining deletion targets.
@@ -372,8 +375,8 @@ def deleteDbFile(db_file, fields, values, logic='any', regex=False,
regex : if False do exact full string matches; if True allow partial regex matches.
out_file : output file name. Automatically generated from the input file if None.
out_args : common output argument dictionary from parseCommonArgs.
-
- Returns:
+
+ Returns:
str : output file name.
"""
# Define string match function
@@ -428,14 +431,14 @@ def deleteDbFile(db_file, fields, values, logic='any', regex=False,
rec_count += 1
# Check for deletion values in all fields
delete = _logic_func([_match_func(rec.get(f, False), values) for f in fields])
-
+
# Write sequences
if not delete:
pass_count += 1
pass_writer.writeDict(rec)
else:
fail_count += 1
-
+
# Print counts
printProgress(rec_count, result_count, 0.05, start_time=start_time)
log = OrderedDict()
@@ -449,7 +452,7 @@ def deleteDbFile(db_file, fields, values, logic='any', regex=False,
# Close file handles
pass_handle.close()
db_handle.close()
-
+
return pass_handle.name
@@ -867,10 +870,10 @@ def getArgParser():
"""
Defines the ArgumentParser
- Arguments:
+ Arguments:
None
-
- Returns:
+
+ Returns:
an ArgumentParser object
"""
# Define input and output field help message
@@ -888,7 +891,7 @@ def getArgParser():
required fields:
sequence_id
''')
-
+
# Define ArgumentParser
parser = ArgumentParser(description=__doc__, epilog=fields,
formatter_class=CommonHelpFormatter, add_help=False)
@@ -1027,11 +1030,11 @@ def getArgParser():
description='Merges files.')
group_merge = parser_merge.add_argument_group('parsing arguments')
group_merge.add_argument('-o', action='store', dest='out_file', default=None,
- help='''Explicit output file name. Note, this argument cannot be used with
+ help='''Explicit output file name. Note, this argument cannot be used with
the --failed, --outdir or --outname arguments.''')
group_merge.add_argument('--drop', action='store_true', dest='drop',
help='''If specified, drop fields that do not exist in all input files.
- Otherwise, include all columns in all files and fill missing data
+ Otherwise, include all columns in all files and fill missing data
with empty strings.''')
parser_merge.set_defaults(func=mergeDbFiles)
@@ -1092,4 +1095,4 @@ if __name__ == '__main__':
args_dict['out_file'] = args.__dict__['out_files'][i] \
if args.__dict__['out_files'] else None
args.func(**args_dict)
-
+
=====================================
changeo.egg-info/PKG-INFO
=====================================
@@ -1,6 +1,6 @@
Metadata-Version: 1.1
Name: changeo
-Version: 1.1.0
+Version: 1.2.0
Summary: A bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.
Home-page: http://changeo.readthedocs.io
Author: Namita Gupta, Jason Anthony Vander Heiden
=====================================
changeo.egg-info/requires.txt
=====================================
@@ -4,5 +4,5 @@ pandas>=0.24
biopython>=1.77
PyYAML>=5.1
setuptools>=2.0
-presto>=0.6.2
+presto>=0.7.0
airr>=1.3.1
=====================================
changeo/IO.py
=====================================
@@ -13,11 +13,12 @@ import yaml
import zipfile
from itertools import chain, groupby, zip_longest
from tempfile import TemporaryDirectory
+from textwrap import indent
from Bio import SeqIO
from Bio.Seq import Seq
# Presto and changeo imports
-from presto.IO import getFileType, printError, printWarning
+from presto.IO import getFileType, printError, printWarning, printDebug
from changeo.Defaults import default_csv_size
from changeo.Gene import getAllele, getLocus, getVAllele, getDAllele, getJAllele
from changeo.Receptor import AIRRSchema, AIRRSchemaAA, ChangeoSchema, ChangeoSchemaAA, Receptor, ReceptorData
@@ -2167,13 +2168,14 @@ class IHMMuneReader:
return db
-def readGermlines(references, asis=False):
+def readGermlines(references, asis=False, warn=False):
"""
Parses germline repositories
Arguments:
references (list): list of strings specifying directories and/or files from which to read germline records.
asis (bool): if True use sequence ID as record name and do not parse headers for allele names.
+ warn (bool): print warning messages to standard error if True.
Returns:
dict: Dictionary of germlines in the form {allele: sequence}.
@@ -2194,12 +2196,20 @@ def readGermlines(references, asis=False):
printError('No valid germline fasta files (.fasta, .fna, .fa) were found at %s.' % ','.join(references))
repo_dict = {}
+ duplicates = []
for file_name in repo_files:
with open(file_name, 'rU') as file_handle:
germlines = SeqIO.parse(file_handle, 'fasta')
for g in germlines:
germ_key = getAllele(g.description, 'first') if not asis else g.id
- repo_dict[germ_key] = str(g.seq).upper()
+ if germ_key not in repo_dict:
+ repo_dict[germ_key] = str(g.seq).upper()
+ else:
+ duplicates.append(g.description)
+
+ if warn and len(duplicates) > 0:
+ w = indent('\n'.join(duplicates), ' '*9)
+ printWarning('Duplicated germline allele names excluded from references:\n%s' % w)
return repo_dict
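The first-allele-wins behaviour introduced here can be sketched without Biopython. This is an illustration under stated assumptions, not the upstream function: `keep_first_alleles` is a hypothetical helper that takes (name, sequence) pairs in file order, and the allele names and sequences are placeholders.

```python
from textwrap import indent


def keep_first_alleles(records):
    """Keep the first sequence seen for each allele name; collect later duplicates."""
    repo = {}
    duplicates = []
    for name, seq in records:
        if name not in repo:
            repo[name] = seq.upper()
        else:
            duplicates.append(name)
    return repo, duplicates


# Placeholder records; the third entry reuses the first allele name.
records = [('IGKV1*01', 'acgt'), ('IGKV2*01', 'ggcc'), ('IGKV1*01', 'acga')]
repo, duplicates = keep_first_alleles(records)

if duplicates:
    listing = indent('\n'.join(duplicates), ' ' * 9)
    print('Duplicated germline allele names excluded from references:\n%s' % listing)
```

Before this change a plain assignment meant the last duplicate silently overwrote earlier ones; checking membership first reverses that and makes the skipped entries reportable.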
=====================================
changeo/Version.py
=====================================
@@ -5,5 +5,5 @@ Version and authorship information
__author__ = 'Namita Gupta, Jason Anthony Vander Heiden'
__copyright__ = 'Copyright 2021 Kleinstein Lab, Yale University. All rights reserved.'
__license__ = 'GNU Affero General Public License 3 (AGPL-3)'
-__version__ = '1.1.0'
-__date__ = '2021.06.21'
+__version__ = '1.2.0'
+__date__ = '2021.10.29'
=====================================
debian/changelog
=====================================
@@ -1,3 +1,10 @@
+changeo (1.2.0-1) unstable; urgency=medium
+
+ * New upstream version 1.2.0
+ * Run cme
+
+ -- Nilesh Patra <nilesh at debian.org> Mon, 01 Nov 2021 23:47:51 +0530
+
changeo (1.1.0-1) unstable; urgency=medium
* New upstream version 1.1.0
=====================================
debian/control
=====================================
@@ -1,6 +1,7 @@
Source: changeo
Maintainer: Debian Med Packaging Team <debian-med-packaging at lists.alioth.debian.org>
-Uploaders: Steffen Moeller <moeller at debian.org>, Nilesh Patra <nilesh at debian.org>
+Uploaders: Steffen Moeller <moeller at debian.org>,
+ Nilesh Patra <nilesh at debian.org>
Section: science
Testsuite: autopkgtest-pkg-python
Priority: optional
@@ -8,7 +9,7 @@ Build-Depends: debhelper-compat (= 13),
dh-python,
python3-all,
python3-setuptools
-Standards-Version: 4.5.1
+Standards-Version: 4.6.0
Vcs-Browser: https://salsa.debian.org/med-team/changeo
Vcs-Git: https://salsa.debian.org/med-team/changeo.git
Homepage: https://changeo.readthedocs.io
@@ -18,8 +19,8 @@ Package: changeo
Architecture: all
Depends: ${python3:Depends},
${misc:Depends}
-Recommends: python3-biopython (>=1.65),
- python3-pandas (>= 0.15),
+Recommends: python3-biopython,
+ python3-pandas,
python3-scipy,
python3-presto (>= 0.6.2),
python3-airr
=====================================
requirements.txt
=====================================
@@ -4,5 +4,5 @@ pandas>=0.24
biopython>=1.77
PyYAML>=5.1
setuptools>=2.0
-presto>=0.6.2
+presto>=0.7.0
airr>=1.3.1
View it on GitLab: https://salsa.debian.org/med-team/changeo/-/compare/329a5b1b0a0d182ad961d343d87501b1fe7675c4...207d4e3b1ee7848590abdddeba79b6812e71e34d