[med-svn] [Git][med-team/changeo][upstream] New upstream version 1.0.1
Nilesh Patra
gitlab at salsa.debian.org
Thu Oct 15 15:09:54 BST 2020
Nilesh Patra pushed to branch upstream at Debian Med / changeo
Commits:
fc08a33f by Nilesh Patra at 2020-10-15T13:05:08+00:00
New upstream version 1.0.1
- - - - -
13 changed files:
- INSTALL.rst
- NEWS.rst
- PKG-INFO
- bin/ConvertDb.py
- bin/DefineClones.py
- bin/ParseDb.py
- changeo.egg-info/PKG-INFO
- changeo.egg-info/SOURCES.txt
- changeo.egg-info/requires.txt
- changeo/IO.py
- changeo/Version.py
- changeo/data/receptor.tsv (deleted)
- requirements.txt
Changes:
=====================================
INSTALL.rst
=====================================
@@ -5,9 +5,9 @@ The simplest way to install the latest stable release of Change-O is via pip::
> pip3 install changeo --user
-The current development build can be installed using pip and mercurial in similar fashion::
+The current development build can be installed using pip and git in similar fashion::
- > pip3 install hg+https://bitbucket.org/kleinstein/changeo@default --user
+ > pip3 install git+https://bitbucket.org/kleinstein/changeo@master --user
If you currently have a development version installed, then you will likely
need to add the arguments ``--upgrade --no-deps --force-reinstall`` to the
@@ -16,14 +16,20 @@ pip3 command.
Requirements
--------------------------------------------------------------------------------
+The minimum dependencies for installation are:
+
+ `Python 3.4.0 <http://python.org>`__
+ `setuptools 2.0 <http://bitbucket.org/pypa/setuptools>`__
+ `NumPy 1.8 <http://numpy.org>`__
+ `SciPy 0.14 <http://scipy.org>`__
-+ `pandas 0.15 <http://pandas.pydata.org>`__
-+ `Biopython 1.65 <http://biopython.org>`__
-+ `presto 0.5.10 <http://presto.readthedocs.io>`__
-+ `airr 1.2.1 <https://docs.airr-community.org>`__.
++ `pandas 0.24 <http://pandas.pydata.org>`__
++ `Biopython 1.71 <http://biopython.org>`__
++ `presto 0.6.2 <http://presto.readthedocs.io>`__
++ `airr 1.2.1 <https://docs.airr-community.org>`__
+
+Some tools wrap external applications that are not required for installation.
+Those tools require minimum versions of:
+
+ AlignRecords requires `MUSCLE 3.8 <http://www.drive5.com/muscle>`__
+ ConvertDb-genbank requires `tbl2asn <https://www.ncbi.nlm.nih.gov/genbank/tbl2asn2>`__
+ AssignGenes requires `IgBLAST 1.6 <https://ncbi.github.io/igblast>`__, but
@@ -39,7 +45,7 @@ Linux
Biopython according to its
`instructions <http://biopython.org/DIST/docs/install/Installation.html>`__.
-2. Install `presto 0.5.0 <http://presto.readthedocs.io>`__ or greater.
+2. Install `presto 0.5.10 <http://presto.readthedocs.io>`__ or greater.
3. Download the Change-O bundle and run::
@@ -75,12 +81,12 @@ Mac OS X
> brew install --env=std gfortran
-7. Install NumPy, SciPy, pandas and Biopyton using the Python package
+7. Install NumPy, SciPy, pandas and Biopython using the Python package
manager::
> pip3 install numpy scipy pandas biopython
-8. Install `presto 0.5.0 <http://presto.readthedocs.io>`__ or greater.
+8. Install `presto 0.5.10 <http://presto.readthedocs.io>`__ or greater.
9. Download the Change-O bundle, open a terminal window, change directories
to the download folder, and run::
@@ -98,7 +104,7 @@ Windows
`Unofficial Windows binary <http://www.lfd.uci.edu/~gohlke/pythonlibs>`__
collection.
-3. Install `presto 0.5.0 <http://presto.readthedocs.io>`__ or greater.
+3. Install `presto 0.5.10 <http://presto.readthedocs.io>`__ or greater.
4. Download the Change-O bundle, open a Command Prompt, change directories to
the download folder, and run::
@@ -108,16 +114,16 @@ Windows
5. For a default installation of Python 3.4, the Change-0 scripts will be
installed into ``C:\Python34\Scripts`` and should be directly
executable from the Command Prompt. If this is not the case, then
- follow step 5 below.
+ follow step 6 below.
6. Add both the ``C:\Python34`` and ``C:\Python34\Scripts`` directories
to your ``%Path%``. On Windows 7 the ``%Path%`` setting is located
under Control Panel -> System and Security -> System -> Advanced
System Settings -> Environment variables -> System variables -> Path.
-6. If you have trouble with the ``.py`` file associations, try adding ``.PY``
- to your ``PATHEXT`` environment variable. Also, opening a
- command prompt as Administrator and run::
+7. If you have trouble with the ``.py`` file associations, try adding ``.PY``
+ to your ``PATHEXT`` environment variable. Also, try opening a
+ Command Prompt as Administrator and run::
> assoc .py=Python.File
> ftype Python.File="C:\Python34\python.exe" "%1" %*
=====================================
NEWS.rst
=====================================
@@ -1,6 +1,14 @@
Release Notes
===============================================================================
+Version 1.0.1: October 13, 2020
+-------------------------------------------------------------------------------
+
++ Updated to support Biopython v1.78.
++ Increased the biopython dependency to v1.71.
++ Increased the presto dependency to 0.6.2.
+
+
Version 1.0.0: May 6, 2020
-------------------------------------------------------------------------------
=====================================
PKG-INFO
=====================================
@@ -1,6 +1,6 @@
Metadata-Version: 1.1
Name: changeo
-Version: 1.0.0
+Version: 1.0.1
Summary: A bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.
Home-page: http://changeo.readthedocs.io
Author: Namita Gupta, Jason Anthony Vander Heiden
=====================================
bin/ConvertDb.py
=====================================
@@ -20,7 +20,6 @@ from time import time
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
-from Bio.Alphabet import IUPAC
# Presto and changeo imports
from presto.Annotation import flattenAnnotation
@@ -70,7 +69,7 @@ def buildSeqRecord(db_record, id_field, seq_field, meta_fields=None):
desc_str = flattenAnnotation(desc_dict)
# Create SeqRecord
- seq_record = SeqRecord(Seq(db_record[seq_field], IUPAC.ambiguous_dna),
+ seq_record = SeqRecord(Seq(db_record[seq_field]),
id=desc_str, name=desc_str, description='')
return seq_record
@@ -787,8 +786,7 @@ def makeGenbankSequence(record, name=None, label=None, count_field=None, index_f
name = '%s [note=%s]' % (name, note)
# Return SeqRecord and positions
- record = SeqRecord(Seq(seq[seq_start:seq_end], IUPAC.ambiguous_dna), id=name,
- name=name, description='')
+ record = SeqRecord(Seq(seq[seq_start:seq_end]), id=name, name=name, description='')
result = {'record': record, 'start': seq_start, 'end': seq_end}
return result
=====================================
bin/DefineClones.py
=====================================
@@ -49,7 +49,7 @@ choices_distance_model = ('ham', 'aa', 'hh_s1f', 'hh_s5f',
def filterMissing(data, seq_field=junction_attr, v_field=v_attr,
j_field=j_attr, max_missing=default_max_missing):
"""
- Splits a set of sequence into passed and failed groups based on the number
+ Splits a set of sequences into passed and failed groups based on the number
of missing characters in the sequence
Arguments:
@@ -60,7 +60,7 @@ def filterMissing(data, seq_field=junction_attr, v_field=v_attr,
max_missing (int): maximum number of missing characters (non-ACGT) to permit before failing the record.
Returns:
- changeo.Multiprocessing.DbResult : objected containing filtered records.
+ changeo.Multiprocessing.DbResult : object containing filtered records.
"""
# Function to validate the sequence string
def _pass(seq):
@@ -266,17 +266,17 @@ def distanceClones(result, seq_field=default_junction_field, model=default_dista
"""
Separates a set of Receptor objects into clones
- Arguments:
+ Arguments:
result : a changeo.Multiprocessing.DbResult object with filtered records to clone
seq_field : sequence field used to calculate distance between records
model : substitution model used to calculate distance
- distance : the distance threshold to assign clonal groups
+ distance : t distance threshold to assign clonal groups
dist_mat : pandas DataFrame of pairwise nucleotide or amino acid distances
norm : normalization method
sym : symmetry method
linkage : type of linkage
- Returns:
+ Returns:
changeo.Multiprocessing.DbResult : an updated DbResult object
"""
# Get distance matrix if not provided
@@ -349,7 +349,7 @@ def collectQueue(alive, result_queue, collect_queue, db_file, fields,
"""
Assembles results from a queue of individual sequence results and manages log/file I/O
- Arguments:
+ Arguments:
alive = a multiprocessing.Value boolean controlling whether processing continues
if False exit process
result_queue : a multiprocessing.Queue holding processQueue results
@@ -556,7 +556,7 @@ def defineClones(db_file, seq_field=default_junction_field, v_field=default_v_fi
group_args['j_field'] = j_field
feed_args = {'db_file': db_file,
'reader': reader,
- 'group_func': group_func,
+ 'group_func': group_func,
'group_args': group_args}
# Define worker function and arguments
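The filterMissing docstring above describes splitting records into passed and failed groups by counting missing (non-ACGT) characters against max_missing. Below is a minimal standard-library sketch of that rule. It is not Change-O's implementation: the name `split_by_missing`, the dict-based records and the choice to fail empty sequences are assumptions for illustration. The real function returns a changeo.Multiprocessing.DbResult.

```python
# Hypothetical sketch of the pass/fail split described in the filterMissing
# docstring; not Change-O's code. Records are plain dicts here.
import re

# Anything other than A, C, G or T counts as "missing" (e.g. N, gaps, dots).
_MISSING = re.compile(r"[^ACGT]")


def split_by_missing(records, seq_field="junction", max_missing=0):
    """Split records into (passed, failed) by their count of non-ACGT characters."""
    passed, failed = [], []
    for rec in records:
        seq = (rec.get(seq_field) or "").upper()
        if not seq:
            # Assumption: an empty or absent sequence cannot pass.
            failed.append(rec)
            continue
        n_missing = len(_MISSING.findall(seq))
        # A record fails once its missing count exceeds max_missing.
        if n_missing <= max_missing:
            passed.append(rec)
        else:
            failed.append(rec)
    return passed, failed
```

With max_missing=1, a junction such as TGTNNNAGA (three Ns) would land in the failed group, while a fully resolved junction would pass.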
=====================================
bin/ParseDb.py
=====================================
@@ -168,7 +168,7 @@ def addDbFile(db_file, fields, values, out_file=None, out_args=default_out_args)
log['VALUES'] = ','.join(values)
printLog(log)
- # Open inut
+ # Open input
db_handle = open(db_file, 'rt')
db_iter = TSVReader(db_handle)
__, __, out_args['out_type'] = splitName(db_file)
=====================================
changeo.egg-info/PKG-INFO
=====================================
@@ -1,6 +1,6 @@
Metadata-Version: 1.1
Name: changeo
-Version: 1.0.0
+Version: 1.0.1
Summary: A bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.
Home-page: http://changeo.readthedocs.io
Author: Namita Gupta, Jason Anthony Vander Heiden
=====================================
changeo.egg-info/SOURCES.txt
=====================================
@@ -35,5 +35,4 @@ changeo/data/hh_s5f_dist.tsv
changeo/data/hs1f_compat_dist.tsv
changeo/data/m1n_compat_dist.tsv
changeo/data/mk_rs1nf_dist.tsv
-changeo/data/mk_rs5nf_dist.tsv
-changeo/data/receptor.tsv
\ No newline at end of file
+changeo/data/mk_rs5nf_dist.tsv
\ No newline at end of file
=====================================
changeo.egg-info/requires.txt
=====================================
@@ -1,8 +1,8 @@
numpy>=1.8
scipy>=0.14
pandas>=0.24
-biopython>=1.65
+biopython>=1.71
PyYAML>=3.12
setuptools>=2.0
-presto>=0.5.10
+presto>=0.6.2
airr>=1.2.1
=====================================
changeo/IO.py
=====================================
@@ -14,7 +14,6 @@ import zipfile
from itertools import chain, groupby, zip_longest
from tempfile import TemporaryDirectory
from Bio import SeqIO
-from Bio.Alphabet import IUPAC
from Bio.Seq import Seq
# Presto and changeo imports
@@ -1058,7 +1057,7 @@ class IgBLASTReader:
# Reverse complement input sequence if required
if summary['strand'] == '-':
- seq_rc = Seq(db['sequence_input'], IUPAC.ambiguous_dna).reverse_complement()
+ seq_rc = Seq(db['sequence_input']).reverse_complement()
result['sequence_input'] = str(seq_rc)
result['rev_comp'] = 'T'
else:
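The change above drops the IUPAC.ambiguous_dna alphabet argument (Bio.Alphabet is no longer imported) and simply calls reverse_complement() on the Seq. To show what that operation does to input that may contain ambiguity codes, here is a small standard-library sketch. It is a stand-in for illustration, not Biopython's implementation, and the helper name `reverse_complement` plus the handling of gap characters are assumptions.

```python
# Standard-library stand-in for Seq(...).reverse_complement() on DNA that may
# contain IUPAC ambiguity codes; not Biopython's implementation.

# Complements of the IUPAC nucleotide codes; gap characters map to themselves.
_IUPAC_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W",
    "K": "M", "M": "K", "B": "V", "V": "B",
    "D": "H", "H": "D", "N": "N", "-": "-", ".": ".",
}
# Add lowercase codes so the case of the input is preserved.
_IUPAC_COMPLEMENT.update({k.lower(): v.lower() for k, v in list(_IUPAC_COMPLEMENT.items())})
_TABLE = str.maketrans(_IUPAC_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, raising on characters outside the table."""
    unknown = set(seq) - set(_IUPAC_COMPLEMENT)
    if unknown:
        raise ValueError(f"unexpected characters: {sorted(unknown)}")
    # Complement each base, then reverse the whole string.
    return seq.translate(_TABLE)[::-1]
```

For example, reverse_complement("ACGTN") gives "NACGT", mirroring how a minus-strand input is flipped before being stored in sequence_input.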
=====================================
changeo/Version.py
=====================================
@@ -5,5 +5,5 @@ Version and authorship information
__author__ = 'Namita Gupta, Jason Anthony Vander Heiden'
__copyright__ = 'Copyright 2020 Kleinstein Lab, Yale University. All rights reserved.'
__license__ = 'GNU Affero General Public License 3 (AGPL-3)'
-__version__ = '1.0.0'
-__date__ = '2020.05.06'
+__version__ = '1.0.1'
+__date__ = '2020.10.13'
=====================================
changeo/data/receptor.tsv deleted
=====================================
@@ -1,111 +0,0 @@
-Name Type Description Change-O AIRR IMGT
-sequence_id Unique sequence identifier identity SEQUENCE_ID sequence_id Sequence ID
-sequence_input Query nucleotide sequence. nucleotide SEQUENCE_INPUT sequence Sequence
-sequence_vdj Aligned V(D)J sequence. nucleotide SEQUENCE_VDJ ['V-D-J-REGION', 'V-J-REGION', 'V-REGION']
-sequence_imgt IMGT-gapped, aligned V(D)J sequence. nucleotide SEQUENCE_IMGT sequence_alignment ['V-D-J-REGION', 'V-J-REGION', 'V-REGION']
-junction Junction region nucleotide sequence. nucleotide JUNCTION junction JUNCTION
-junction_aa Junction region amino acid sequence. aminoacid JUNCTION_AA junction_aa AA JUNCTION
-junction_start Start position of the junction in the query sequence. integer JUNCTION start
-junction_end End position of the junction in the query sequence. integer JUNCTION end
-junction_length Number of nucleotides in the junction region. integer JUNCTION_LENGTH junction_length JUNCTION-nt nb
-germline_vdj Inferred germline sequence aligned with the 'sequence_align' field. nucleotide GERMLINE_VDJ
-germline_vdj_d_mask Inferred germline sequence aligned with the 'sequence_align' field and having the D segment masked. nucleotide GERMLINE_VDJ_D_MASK
-germline_imgt Inferred germline sequence aligned with the 'sequence_imgt' field. nucleotide GERMLINE_IMGT germline_alignment
-germline_imgt_d_mask Inferred germline sequence aligned with the 'sequence_imgt' field and having the D segment masked. nucleotide GERMLINE_IMGT_D_MASK germline_alignment_d_mask
-v_call V gene with allele. identity V_CALL v_call V-GENE and allele
-d_call D gene with allele. identity D_CALL d_call D-GENE and allele
-j_call J gene with allele. identity J_CALL j_call J-GENE and allele
-c_call C region with allele. identity C_CALL c_call
-locus Gene locus. identity LOCUS locus
-rev_comp True if the the alignment is on the opposite strand (reverse complemented). boolean REV_COMP rev_comp Orientation
-functional True if the V(D)J sequence is a functional gene and is predicted to be productive. boolean FUNCTIONAL productive V-DOMAIN Functionality
-in_frame True if the V and J segment alignments are in-frame. boolean IN_FRAME vj_in_frame JUNCTION frame
-stop True if the aligned sequence contains a stop codon. boolean STOP stop_codon V-DOMAIN Functionality comment
-mutated_invariant True the aligment contains a mutated conserved amino acid. boolean MUTATED_INVARIANT mutated_invariant ['V-DOMAIN Functionality comment', 'V-REGION potential ins/del']
-indels True if the V(D)J sequence contains insertions and/or deletions. boolean INDELS indels ['V-REGION potential ins/del', 'V-REGION insertions', 'V-REGION deletions']
-v_seq_start Start position of the V segment in the query sequence. integer V_SEQ_START v_sequence_start V-REGION start
-v_seq_end End of the V segment in the query sequence. integer v_sequence_end V-REGION end
-v_seq_length Length of the V segment in the query sequence. integer V_SEQ_LENGTH
-v_germ_start_vdj Alignment start position in the V reference sequence. integer V_GERM_START_VDJ
-v_germ_end_vdj Alignment end position in the V reference sequence. integer
-v_germ_length_vdj Alignment length in the V reference sequence. integer V_GERM_LENGTH_VDJ
-v_germ_start_imgt Alignment start position in the IMGT-gapped V reference sequence. integer V_GERM_START_IMGT v_germline_start
-v_germ_end_imgt Alignment end position in the IMGT-gapped V reference sequence. integer v_germline_end
-v_germ_length_imgt Alignment length in the IMGT-gapped V reference sequence. integer V_GERM_LENGTH_IMGT
-np1_start Start position of the nucleotides between the V and D segments or V and J segments. integer
-np1_end End position of the nucleotides between the V and D segments or V and J segments. integer
-np1_length Number of nucleotides between the V and D segments or V and J segments. integer NP1_LENGTH np1_length "[""P3'V-nt nb"", 'N-REGION-nt nb', 'N1-REGION-nt nb', ""P5'D-nt nb""]"
-d_seq_start Start position of the D segment in the query sequence. integer D_SEQ_START d_sequence_start D-REGION start
-d_seq_end End position of the D segment in the query sequence. integer d_sequence_end D-REGION end
-d_seq_length Length of the D segment in the query sequence. integer D_SEQ_LENGTH D-REGION-nt nb
-d_germ_start Alignment start position in the D reference sequence. integer D_GERM_START d_germline_start 5'D-REGION trimmed-nt nb
-d_germ_end Alignment end position in the D reference sequence. integer d_germline_end
-d_germ_length Length of the alignment to the D reference sequence. integer D_GERM_LENGTH D-REGION-nt nb
-np2_start Start position of the nucleotides between the D and J segments. integer
-np2_end End position of the nucleotides between the D and J segments. integer
-np2_length Number of nucleotides between the D and J segments. integer NP2_LENGTH np2_length "[""P3'D-nt nb"", 'N2-REGION-nt nb', ""P5'J-nt nb""]"
-j_seq_start Start position of the J segment in the query sequence. integer J_SEQ_START j_sequence_start J-REGION start
-j_seq_end End position of the J segment in the query sequence. integer j_sequence_end J-REGION end
-j_seq_length Length of the J segment in the query sequence. integer J_SEQ_LENGTH
-j_germ_start Alignment start position in the J reference sequence. integer J_GERM_START j_germline_start 5'J-REGION trimmed-nt nb
-j_germ_end Alignment start position in the J reference sequence. integer j_germline_end
-j_germ_length Alignment length of the J reference sequence. integer J_GERM_LENGTH
-np1 Nucleotide sequence of the combined N/P region between the V and D segments or V and J segments. nucleotide NP1 np1 "[""P3'V"", 'N-REGION', 'N1-REGION', ""P5'D""]"
-np2 Nucleotide sequence of the combined N/P region between the D and J segments. nucleotide NP2 np2 "[""P3'D"", 'N2-REGION', ""P5'J""]"
-fwr1 Nucleotide sequence of the aligned FWR1 region. nucleotide FWR1_IMGT fwr1 FR1-IMGT
-fwr2 Nucleotide sequence of the aligned FWR2 region. nucleotide FWR2_IMGT fwr2 FR2-IMGT
-fwr3 Nucleotide sequence of the aligned FWR3 region. nucleotide FWR3_IMGT fwr3 FR3-IMGT
-fwr4 Nucleotide sequence of the aligned FWR4 region. nucleotide FWR4_IMGT fwr4 FR4-IMGT
-cdr1 Nucleotide sequence of the aligned CDR1 region. nucleotide CDR1_IMGT cdr1 CDR1-IMGT
-cdr2 Nucleotide sequence of the aligned CDR2 region. nucleotide CDR2_IMGT cdr2 CDR2-IMGT
-cdr3 Nucleotide sequence of the aligned CDR3 region. nucleotide CDR3_IMGT cdr3 CDR3-IMGT
-fwr1_start FWR1 start position in the query sequence. integer fwr1_start FR1-IMGT start
-fwr1_end FWR1 end position in the query sequence. integer fwr1_end FR1-IMGT end
-fwr2_start FWR2 start position in the query sequence. integer fwr2_start FR2-IMGT start
-fwr2_end FWR2 end position in the query sequence. integer fwr2_end FR2-IMGT end
-fwr3_start FWR3 start position in the query sequence. integer fwr3_start FR3-IMGT start
-fwr3_end FWR3 end position in the query sequence. integer fwr3_end FR3-IMGT end
-fwr4_start FWR4 start position in the query sequence. integer fwr4_start FR4-IMGT start
-fwr4_end FWR4 end position in the query sequence. integer fwr4_end FR4-IMGT end
-cdr1_start CDR1 start position in the query sequence. integer cdr1_start CDR1-IMGT start
-cdr1_end CDR1 end position in the query sequence. integer cdr1_end CDR1-IMGT end
-cdr2_start CDR2 start position in the query sequence. integer cdr2_start CDR2-IMGT start
-cdr2_end CDR2 end position in the query sequence. integer cdr2_end CDR2-IMGT end
-cdr3_start CDR3 start position in the query sequence. integer cdr3_start CDR3-IMGT start
-cdr3_end CDR3 end position in the query sequence. integer cdr3_end CDR3-IMGT end
-v_score V alignment score. float V_SCORE v_score V-REGION score
-v_identity V alignment fractional identity. float V_IDENTITY v_identity V-REGION identity %
-v_evalue V alignment E-value. float V_EVALUE v_support
-v_cigar V alignment CIGAR string. identity V_CIGAR v_cigar
-v_btop V alignment BTOP string. identity V_BTOP
-d_score D alignment score. float D_SCORE d_score D-REGION score
-d_identity D alignment fractional identity. float D_IDENTITY d_identity D-REGION identity %
-d_evalue D alignment E-value. float D_EVALUE d_support
-d_cigar D alignment CIGAR string. identity D_CIGAR d_cigar
-d_btop D alignment BTOP string. identity D_BTOP
-j_score J alignment score. float J_SCORE j_score J-REGION score
-j_identity J alignment fractional identity. float J_IDENTITY j_identity J-REGION identity %
-j_evalue J alignment E-value. float J_EVALUE j_support
-j_cigar J alignment CIGAR string. identity J_CIGAR j_cigar
-j_btop V alignment BTOP string. identity J_BTOP
-c_score C region alignment score. float C_SCORE c_score
-c_identity C region alignment fractional identity. float C_IDENTITY c_identity
-c_evalue C region alignment E-value. float C_EVALUE c_support
-c_cigar C region alignment CIGAR string. identity C_CIGAR c_cigar
-c_btop V alignment BTOP string. identity C_BTOP
-vdj_score Alignment score for aligners that consider the full sequence. float VDJ_SCORE vdj_score
-vdj_identity Alignment fractional identity for aligners that consider the full sequence. float VDJ_IDENTITY vdj_identity
-vdj_evalue Alignment E-value for aligners that consider the full sequence. float VDJ_EVALUE vdj_support
-vdj_cigar CIGAR string for the full V(D)J alignment. identity VDJ_CIGAR vdj_cigar
-vdj_btop BTOP string for the full V(D)J alignment. identity VDJ_BTOP
-n1_length Number of untemplated nucleotides 5' of the D segment. integer N1_LENGTH n1_length ['N-REGION-nt nb', 'N1-REGION-nt nb']
-n2_length Number of untemplated nucleotides 3' of the D segment. integer N2_LENGTH n2_length N2-REGION-nt nb
-p3v_length Number of palindromic nucleotides 3' of the V segment. integer P3V_LENGTH p3v_length P3'V-nt nb
-p5d_length Number of palindromic nucleotides 5' of the D segment. integer P5D_LENGTH p5d_length P5'D-nt nb
-p3d_length Number of palindromic nucleotides 3' of the D segment. integer P3D_LENGTH p3d_length P3'D-nt nb
-p5j_length Number of palindromic nucleotides 5' of the J segment. integer P5J_LENGTH p5j_length P5'J-nt nb
-d_frame Reading frame of the D segment. integer D_FRAME D-REGION reading frame
-dupcount Copy number or number of duplicate observations of the sequence. integer DUPCOUNT duplicate_count
-conscount Number of reads contributing to the (UMI) consensus for this sequence. integer CONSCOUNT consensus_count
-clone Clonal cluster assignment for the query sequence. identity CLONE clone_id
-cell Cell identifier. identity CELL cell_id
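The deleted receptor.tsv table mapped each field to its Change-O and AIRR column names. As an illustration, the sketch below copies a subset of those Change-O to AIRR pairs from the rows above into a dictionary and uses it to rename the header of a tab-delimited table. The helper `rename_header` is hypothetical; it only renames headers and does not convert values or handle fields missing from the subset.

```python
# Change-O column name -> AIRR column name, copied from rows of the deleted
# receptor.tsv table (subset only). rename_header is a hypothetical helper.
import csv
import io

CHANGEO_TO_AIRR = {
    "SEQUENCE_ID": "sequence_id",
    "SEQUENCE_INPUT": "sequence",
    "SEQUENCE_IMGT": "sequence_alignment",
    "JUNCTION": "junction",
    "JUNCTION_AA": "junction_aa",
    "V_CALL": "v_call",
    "D_CALL": "d_call",
    "J_CALL": "j_call",
    "FUNCTIONAL": "productive",
    "IN_FRAME": "vj_in_frame",
    "STOP": "stop_codon",
    "DUPCOUNT": "duplicate_count",
    "CONSCOUNT": "consensus_count",
    "CLONE": "clone_id",
    "CELL": "cell_id",
}


def rename_header(tsv_text: str) -> str:
    """Rewrite the header row of a TSV; unknown columns and data rows are unchanged."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for i, row in enumerate(reader):
        if i == 0:
            row = [CHANGEO_TO_AIRR.get(col, col) for col in row]
        writer.writerow(row)
    return out.getvalue()
```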
=====================================
requirements.txt
=====================================
@@ -1,8 +1,8 @@
numpy>=1.8
scipy>=0.14
pandas>=0.24
-biopython>=1.65
+biopython>=1.71
PyYAML>=3.12
setuptools>=2.0
-presto>=0.5.10
+presto>=0.6.2
airr>=1.2.1
View it on GitLab: https://salsa.debian.org/med-team/changeo/-/commit/fc08a33fe0fc1638ea20e03e42a849994955b825
More information about the debian-med-commit mailing list