[med-svn] [Git][med-team/deblur][upstream] New upstream version 1.1.1
Andreas Tille (@tille)
gitlab at salsa.debian.org
Tue Jan 30 19:19:20 GMT 2024
Andreas Tille pushed to branch upstream at Debian Med / deblur
Commits:
26effdc3 by Andreas Tille at 2024-01-30T20:05:11+01:00
New upstream version 1.1.1
- - - - -
15 changed files:
- .coveragerc
- + .github/workflows/main.yml
- − .travis.yml
- ChangeLog.md
- README.md
- deblur/__init__.py
- deblur/deblurring.py
- deblur/sequence.py
- deblur/test/test_deblurring.py
- deblur/test/test_mixedcase.py
- deblur/test/test_script.py
- deblur/test/test_workflow.py
- deblur/workflow.py
- scripts/deblur
- setup.py
Changes:
=====================================
.coveragerc
=====================================
@@ -6,7 +6,6 @@ omit =
*/__init__.py
source = deblur
branch = True
-include = */deblur/*
[report]
exclude_lines =
=====================================
.github/workflows/main.yml
=====================================
@@ -0,0 +1,80 @@
+name: "Main CI"
+
+on:
+ pull_request:
+ branches: [ master ]
+ push:
+ branches: [ master ]
+
+jobs:
+ main:
+ runs-on: ubuntu-latest
+
+ strategy:
+ matrix:
+ python-version: ["3.8"]
+
+ steps:
+ - uses: actions/checkout@v2
+ with:
+ persist-credentials: false
+ fetch-depth: 0
+
+ - uses: conda-incubator/setup-miniconda@v2
+ with:
+ activate-environment: deblur
+ python-version: ${{ matrix.python-version }}
+
+ - name: Install packages
+ shell: bash -l {0}
+ run: |
+ conda create --yes -n deblur python=${{ matrix.python-version }} pip nose flake8 h5py
+ conda activate deblur
+ conda install --yes -c bioconda VSEARCH>=2.7.0 MAFFT>=7.394 SortMeRNA=2.0
+
+
+ echo "=============================="
+ echo "====== Software Versions ====="
+ echo "=============================="
+
+ echo "vsearch version:"
+ vsearch --version
+
+ echo "mafft version:"
+ mafft --version
+
+ echo "=============================="
+ echo "=============================="
+ echo "=============================="
+
+ pip install coveralls
+ pip install -e .
+
+ - name: Run tests
+ shell: bash -l {0}
+ run: |
+ conda activate deblur
+ nosetests --with-doctest --with-coverage --cover-package=deblur
+
+ coveralls_finish:
+ needs: main
+ runs-on: ubuntu-latest
+ steps:
+ - name: Coveralls Finished
+ uses: AndreMiras/coveralls-python-action@develop
+
+ lint:
+ runs-on: ubuntu-latest
+ steps:
+ - name: flake8
+ uses: actions/setup-python@v2
+ with:
+ python-version: ${{ matrix.python-version }}
+ - name: install dependencies
+ run: python -m pip install --upgrade pip
+ - name: Check out repository code
+ uses: actions/checkout@v2
+ - name: lint
+ run: |
+ pip install -q flake8
+ flake8 deblur scripts/deblur setup.py
=====================================
.travis.yml deleted
=====================================
@@ -1,22 +0,0 @@
-language: python
-env:
- - PYTHON_VERSION=3.5
- - PYTHON_VERSION=3.6
-before_install:
- - wget http://repo.continuum.io/miniconda/Miniconda3-3.7.3-Linux-x86_64.sh -O miniconda.sh
- - chmod +x miniconda.sh
- - ./miniconda.sh -b
- - export PATH=/home/travis/miniconda3/bin:$PATH
- # Update conda itself
- - conda update --yes conda
-install:
- - conda create --yes -n deblur python=$PYTHON_VERSION pip nose flake8 h5py
- - source activate deblur
- - conda install --yes -c bioconda "VSEARCH=2.7.0" MAFFT=7.310 SortMeRNA=2.0
- - pip install -U pip coveralls
- - pip install --process-dependency-links .
-script:
- - nosetests --with-doctest --with-coverage --cover-package=deblur
- - flake8 --max-line-length=200 deblur scripts/deblur setup.py
-after_success:
- - coveralls
=====================================
ChangeLog.md
=====================================
@@ -1,5 +1,17 @@
# deblur changelog
+## Version 1.1.1
+
+Official version 1.1.1 Released on 2 June 2022.
+
+### Performance enhancements
+
+* Moved CI to GitHub Actions.
+* Updated to Python 3.8.
+* Updated MAFFT to >=7.394.
+* Updated VSEARCH to >=2.7.0.
+* Updated support to click > 8.
+
## Version 1.1.0
Official version 1.1.0 Released on 12 September 2018.
=====================================
README.md
=====================================
@@ -1,16 +1,16 @@
Deblur
======
-[![Build Status](https://travis-ci.org/biocore/deblur.png?branch=master)](https://travis-ci.org/biocore/deblur)
+[![Build Status](https://github.com/biocore/deblur/actions/workflows/main.yml/badge.svg)](https://github.com/biocore/deblur/actions/workflows/main.yml)
[![Coverage Status](https://coveralls.io/repos/github/biocore/deblur/badge.svg?branch=master)](https://coveralls.io/github/biocore/deblur?branch=master)
Deblur is a greedy deconvolution algorithm for amplicon sequencing based on Illumina Miseq/Hiseq error profiles.
Install
=======
-- Deblur requires Python 3.5. If Python 3.5 is not installed, you can create a [conda](http://conda.pydata.org/docs/install/quick.html) environment for Deblur using:
+- Deblur requires Python 3.8. If Python 3.8 is not installed, you can create a [conda](http://conda.pydata.org/docs/install/quick.html) environment for Deblur using:
```
-conda create -n deblurenv python=3.5 numpy
+conda create -n deblurenv python=3.8 numpy
```
and activate it using:
@@ -22,7 +22,7 @@ source activate deblurenv
Install Deblur dependencies and Deblur itself:
```
-conda install -c bioconda -c biocore "VSEARCH=2.7.0" MAFFT=7.310 SortMeRNA=2.0 biom-format deblur
+conda install -c bioconda -c biocore VSEARCH>=2.7.0 MAFFT>=7.394 SortMeRNA=2.0 biom-format deblur
```
N.B. Some dependencies are version restricted at the moment but for different reasons. SortMeRNA 2.1 has a different output format which Deblur is not compatible with yet. A review of the changelog did not reveal any remarkable notes (e.g., bugs) about the reasons for the differences. In testing, the differences affected <0.1% of the sOTUs. As a precaution, we are advising the use of these specific versions for consistency with the manuscript.
=====================================
deblur/__init__.py
=====================================
@@ -6,4 +6,4 @@
# The full license is in the file LICENSE, distributed with this software.
# -----------------------------------------------------------------------------
-__version__ = "1.1.0"
+__version__ = "1.1.1"
=====================================
deblur/deblurring.py
=====================================
@@ -166,10 +166,12 @@ def deblur(input_seqs, mean_error=0.005,
sub_seq_j[mask] == 4)
num_indels = mut_is_indel.sum()
if num_indels > 0:
- # need to account for indel in one sequence not solved in the other
- # (so we have '-' at the end. Need to ignore it in the total count)
- h_dist = np.count_nonzero(np.not_equal(seq_i.np_sequence[:length],
- seq_j.np_sequence[:length]))
+ # need to account for indel in one sequence not solved in the
+ # other (so we have '-' at the end. Need to ignore it in the
+ # total count)
+ h_dist = np.count_nonzero(
+ np.not_equal(seq_i.np_sequence[:length],
+ seq_j.np_sequence[:length]))
num_substitutions = h_dist - num_indels
=====================================
deblur/sequence.py
=====================================
@@ -43,7 +43,8 @@ class Sequence(object):
self.sequence = sequence.upper()
self.length = len(self.sequence)
self.unaligned_length = self.length - self.sequence.count('-')
- self.frequency = float(re.search('(?<=size=)\w+', self.label).group(0))
+ self.frequency = float(
+ re.search(r'(?<=size=)\w+', self.label).group(0))
self.np_sequence = np.array(
[trans_dict[b] for b in self.sequence], dtype=np.int8)
@@ -63,7 +64,7 @@ class Sequence(object):
str
The FASTA representation of the sequence
"""
- prefix, suffix = re.split('(?<=size=)\w+', self.label, maxsplit=1)
+ prefix, suffix = re.split(r'(?<=size=)\w+', self.label, maxsplit=1)
new_count = int(round(self.frequency))
new_label = "%s%d%s" % (prefix, new_count, suffix)
return ">%s\n%s\n" % (new_label, self.sequence)
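Note on the hunk above: the only change to the pattern is the `r` prefix. In a plain string literal, `\w` is an invalid escape sequence, which recent Python versions report as a warning, so the raw string keeps the same regex while silencing that. As a minimal sketch of what the pattern does with a `size=` annotation (the helper names `read_count` and `relabel` are hypothetical and not part of deblur):

```python
import re

# Same pattern as in the diff: a lookbehind for "size=" followed by
# one or more word characters.
SIZE_RE = re.compile(r'(?<=size=)\w+')


def read_count(label):
    """Return the abundance encoded as size=N in a FASTA label."""
    match = SIZE_RE.search(label)
    if match is None:
        raise ValueError("label has no size= annotation: %r" % label)
    return float(match.group(0))


def relabel(label, new_count):
    """Replace the size=N value in a label with a rounded new count."""
    prefix, suffix = SIZE_RE.split(label, maxsplit=1)
    return "%s%d%s" % (prefix, int(round(new_count)), suffix)


if __name__ == "__main__":
    print(read_count(">indel1-read;size=30;"))
    print(relabel(">indel1-read;size=30;", 31.4))
```

Because the lookbehind keeps `size=` out of the match, splitting on the pattern leaves `size=` attached to the prefix, which is why the label can be rebuilt by simple concatenation.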
=====================================
deblur/test/test_deblurring.py
=====================================
@@ -118,13 +118,17 @@ class DeblurringTests(TestCase):
tseq = cseq[:10] + '-' + cseq[10:]
newseqs.append((chead, tseq))
- # now add a sequence with an A insertion at the expected freq. (30 < 0.02 * (720 / 0.47) where 0.47 is the mod_factor) so should be removed
+ # now add a sequence with an A insertion at the expected freq.
+ # (30 < 0.02 * (720 / 0.47) where 0.47 is the mod_factor) so should
+ # be removed
cseq = newseqs[0][1]
tseq = cseq[:10] + 'A' + cseq[11:-1] + '-'
chead = '>indel1-read;size=30;'
newseqs.append((chead, tseq))
- # and add a sequence with an A insertion but at higher freq. (not expected by indel upper bound - (31 > 0.02 * (720 / 0.47) so should not be removed)
+ # and add a sequence with an A insertion but at higher freq. (not
+ # expected by indel upper bound - (31 > 0.02 * (720 / 0.47) so should
+ # not be removed)
cseq = newseqs[0][1]
tseq = cseq[:10] + 'A' + cseq[11:-1] + '-'
chead = '>indel2-read;size=31;'
@@ -142,7 +146,8 @@ class DeblurringTests(TestCase):
"tacggagggtgcaagcgttaatcggaattactgggcgtaaagcgcacgcaggcggt"
"ttgttaagtcagatgtgaaatccccgggctcaacctgggaactgcatctgatactg"
"gcaagcttgagtctcgtagaggggggcagaattccag")]
- # make sure we get 2 sequences as output - the original and the indel2 (too many reads for the expected indel probabilty)
+ # make sure we get 2 sequences as output - the original and the
+ # indel2 (too many reads for the expected indel probabilty)
self.assertEqual(len(obs), 2)
# and that it is the correct sequence
self.assertEqual(obs[0].sequence, exp[0].sequence)
=====================================
deblur/test/test_mixedcase.py
=====================================
@@ -59,7 +59,7 @@ class TestScript(TestCase):
self.assertEqual(table.shape, (2, 2))
# assert that counts from different case entries are collapsed
- self.assertTrue(list(table.to_dataframe().to_dense().loc[
+ self.assertTrue(list(table.to_dataframe().sparse.to_dense().loc[
('TACGGGGGGGGTTAGCGTTATTCAATGATATTTGGCGTAAAGTGCATGTAGATGGTGTTAC'
'AAGTTAAAAAAATAAAAACTAAGGACAAATCTTTTCGTT'), :].values) == [60, 0])
=====================================
deblur/test/test_script.py
=====================================
@@ -1,12 +1,34 @@
from unittest import TestCase, main
from os.path import join, abspath, dirname
+from os import environ
+import subprocess
from biom import load_table
-from deblur.workflow import _system_call
from deblur.workflow import sequence_generator
from tempfile import mkdtemp
+def _system_call(cmd):
+ # this is a wrapper so tests pass in the github actions
+ cmd = ' '.join(cmd)
+
+ conda_env = environ.get('CONDA_DEFAULT_ENV')
+ if conda_env is not None:
+ cmd = f"bash -c '. ~/.profile; conda activate {conda_env}; {cmd};'"
+
+ proc = subprocess.Popen(cmd, universal_newlines=True, shell=True,
+ stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+
+ stdout, stderr = proc.communicate()
+ return_value = proc.returncode
+
+ if stderr != 0:
+ print(cmd)
+ print(stdout, stderr, return_value)
+
+ return stdout, stderr, return_value
+
+
class TestScript(TestCase):
def setUp(self):
@@ -58,7 +80,8 @@ class TestScript(TestCase):
sout, serr, res = _system_call(cmd)
self.validate_results(self.output_biom, self.orig_one_seq_fp)
- # test default parameters except min-reads set to 0, negative mode, single thread
+ # test default parameters except min-reads set to 0, negative mode,
+ # single thread
cmd = ["deblur", "workflow", "--seqs-fp", self.seqs_fp,
"--output-dir", self.output_dir,
"--trim-length", "150", '-w', '--min-reads', '0']
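Note on the `_system_call` wrapper added in this hunk: with `universal_newlines=True`, `stderr` is a string, so the check `stderr != 0` is always true and the diagnostic print runs on every call. The check was probably meant to test the return code. A minimal sketch of that variant follows; the name `run_command` is hypothetical, and unlike the wrapper in the diff it does not go through a shell or activate a conda environment:

```python
import subprocess
import sys


def run_command(cmd):
    """Run cmd (a list of arguments); return (stdout, stderr, returncode)."""
    proc = subprocess.run(cmd, universal_newlines=True,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if proc.returncode != 0:
        # Only report diagnostics when the command actually failed.
        print(cmd)
        print(proc.stdout, proc.stderr, proc.returncode)
    return proc.stdout, proc.stderr, proc.returncode


if __name__ == "__main__":
    # Use the current interpreter so the example does not depend on
    # any external tool being installed.
    out, err, code = run_command([sys.executable, "-c", "print('ok')"])
    print(out.strip(), code)
```

Passing the argument list directly, rather than joining it into one shell string, also avoids quoting problems when a path contains spaces.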
=====================================
deblur/test/test_workflow.py
=====================================
@@ -310,22 +310,31 @@ class workflowTests(TestCase):
obs_seqs = no_artifacts_table.ids(axis='observation')
self.assertEqual(set(obs_seqs), set(orig_seqs))
# test the fasta file
- no_artifacts_fasta_name = join(self.working_dir, 'reference-hit.seqs.fa')
- fasta_seqs = [item[1] for item in sequence_generator(no_artifacts_fasta_name)]
+ no_artifacts_fasta_name = join(
+ self.working_dir, 'reference-hit.seqs.fa')
+ fasta_seqs = [item[1]
+ for item in sequence_generator(no_artifacts_fasta_name)]
self.assertEqual(set(fasta_seqs), set(orig_seqs))
# test the non-hit output biom
artifacts_table_name = join(self.working_dir, 'reference-non-hit.biom')
artifacts_table = load_table(artifacts_table_name)
obs_seqs = artifacts_table.ids(axis='observation')
- artifact_seqs = ['aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaatttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt',
- 'AACAATGGGGGCAAGCGTTAATCATAATGGCTTAAAGAATTCGTAGAATtatatatattatatatatatTAGAGTTAATAAATATTAATTAAAGAATTATAACAATGGGGGCAAGCGTTAATCATAATGGCTTAAAGAATTCGTAGAATT']
+ artifact_seqs = [
+ 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
+ 'aaaaaaaaaaaaaaaaaaaatttttttttttttttttttttttttttttttttttttttttttt'
+ 'tttttttttttttttttttttt',
+ 'AACAATGGGGGCAAGCGTTAATCATAATGGCTTAAAGAATTCGTAGAATtatatatattatata'
+ 'tatatTAGAGTTAATAAATATTAATTAAAGAATTATAACAATGGGGGCAAGCGTTAATCATAAT'
+ 'GGCTTAAAGAATTCGTAGAATT']
artifact_seqs = [item[:trim_length].upper() for item in artifact_seqs]
obs_seqs = [item[:trim_length].upper() for item in obs_seqs]
self.assertEqual(set(obs_seqs), set(artifact_seqs))
# test the fasta file
- artifacts_fasta_name = join(self.working_dir, 'reference-non-hit.seqs.fa')
- fasta_seqs = [item[1].upper() for item in sequence_generator(artifacts_fasta_name)]
+ artifacts_fasta_name = join(
+ self.working_dir, 'reference-non-hit.seqs.fa')
+ fasta_seqs = [item[1].upper()
+ for item in sequence_generator(artifacts_fasta_name)]
self.assertEqual(set(fasta_seqs), set(artifact_seqs))
self.assertEqual(len(obs_seqs), 2)
@@ -371,12 +380,9 @@ class workflowTests(TestCase):
ref_db_fp = build_index_sortmerna(
ref_fp=(ref_fp,),
working_dir=self.working_dir)
- output_fp, num_seqs_left, tmp_files = remove_artifacts_seqs(seqs_fp=seqs_fp,
- ref_fp=(ref_fp,),
- working_dir=self.working_dir,
- ref_db_fp=ref_db_fp,
- negate=False,
- threads=1)
+ output_fp, num_seqs_left, tmp_files = remove_artifacts_seqs(
+ seqs_fp=seqs_fp, ref_fp=(ref_fp,), working_dir=self.working_dir,
+ ref_db_fp=ref_db_fp, negate=False, threads=1)
obs_seqs = []
for label, seq in sequence_generator(output_fp):
obs_seqs.append(label)
@@ -423,12 +429,9 @@ class workflowTests(TestCase):
# build index
sortmerna_db = build_index_sortmerna([ref_fp], self.working_dir)
output_fp = join(self.working_dir, "seqs_filtered.fasta")
- output_fp, num_seqs_left, _ = remove_artifacts_seqs(seqs_fp=seqs_fp,
- ref_fp=(ref_fp,),
- working_dir=self.working_dir,
- ref_db_fp=sortmerna_db,
- negate=False,
- threads=1)
+ output_fp, num_seqs_left, _ = remove_artifacts_seqs(
+ seqs_fp=seqs_fp, ref_fp=(ref_fp,), working_dir=self.working_dir,
+ ref_db_fp=sortmerna_db, negate=False, threads=1)
obs_seqs = []
for label, seq in sequence_generator(output_fp):
@@ -475,12 +478,9 @@ class workflowTests(TestCase):
self.files_to_remove.append(ref_fp)
ref_db_fp = build_index_sortmerna([ref_fp], self.working_dir)
output_fp = join(self.working_dir, "seqs_filtered.fasta")
- output_fp, num_seqs_left, _ = remove_artifacts_seqs(seqs_fp=seqs_fp,
- ref_fp=(ref_fp,),
- working_dir=self.working_dir,
- ref_db_fp=ref_db_fp,
- negate=True,
- threads=1)
+ output_fp, num_seqs_left, _ = remove_artifacts_seqs(
+ seqs_fp=seqs_fp, ref_fp=(ref_fp,), working_dir=self.working_dir,
+ ref_db_fp=ref_db_fp, negate=True, threads=1)
obs_seqs = []
for label, seq in sequence_generator(output_fp):
obs_seqs.append(label)
=====================================
deblur/workflow.py
=====================================
@@ -33,7 +33,7 @@ sniff_fastq = skbio.io.io_registry.get_sniffer('fastq')
def _get_fastq_variant(input_fp):
- # http://scikit-bio.org/docs/latest/generated/skbio.io.format.fastq.html#format-parameters
+ # https://bit.ly/3GEDIxF
variant = None
variants = ['illumina1.8', 'illumina1.3', 'solexa', 'sanger']
for v in variants:
@@ -276,12 +276,14 @@ def fasta_from_biom(table, fasta_file_name):
Name of the fasta output file
'''
logger = logging.getLogger(__name__)
- logger.debug('saving biom table sequences to fasta file %s' % fasta_file_name)
+ logger.debug(
+ 'saving biom table sequences to fasta file %s' % fasta_file_name)
with open(fasta_file_name, 'w') as f:
for cseq in table.ids(axis='observation'):
f.write('>%s\n%s\n' % (cseq, cseq))
- logger.info('saved biom table sequences to fasta file %s' % fasta_file_name)
+ logger.info(
+ 'saved biom table sequences to fasta file %s' % fasta_file_name)
def remove_artifacts_from_biom_table(table_filename,
@@ -311,13 +313,10 @@ def remove_artifacts_from_biom_table(table_filename,
logger.info('getting 16s sequences from the biom table')
# remove artifacts from the fasta file. output is in clean_fp fasta file
- clean_fp, num_seqs_left, tmp_files = remove_artifacts_seqs(fasta_filename, ref_fp,
- working_dir=biom_table_dir,
- ref_db_fp=ref_db_fp,
- negate=False, threads=threads,
- verbose=verbose,
- sim_thresh=sim_thresh,
- coverage_thresh=coverage_thresh)
+ clean_fp, num_seqs_left, tmp_files = remove_artifacts_seqs(
+ fasta_filename, ref_fp, working_dir=biom_table_dir,
+ ref_db_fp=ref_db_fp, negate=False, threads=threads, verbose=verbose,
+ sim_thresh=sim_thresh, coverage_thresh=coverage_thresh)
if clean_fp is None:
logger.warn("No clean sequences in %s" % fasta_filename)
return tmp_files
@@ -696,13 +695,14 @@ def create_otu_table(output_fp, deblurred_list,
'into output table %s' % (len(deblurred_list), output_fp))
# the regexp for finding the number of reads of a sequence
- sizeregexp = re.compile('(?<=size=)\w+')
+ sizeregexp = re.compile(r'(?<=size=)\w+')
seqdict = {}
seqlist = []
sampset = set()
samplist = []
# arbitrary size for the sparse results matrix so we won't run out of space
- obs = scipy.sparse.dok_matrix((int(1E9), len(deblurred_list)), dtype=np.double)
+ obs = scipy.sparse.dok_matrix(
+ (int(1E9), len(deblurred_list)), dtype=np.double)
# load the sequences from all samples into a sprase matrix
sneaking_extensions = {'fasta', 'fastq', 'fna', 'fq', 'fa'}
@@ -839,13 +839,10 @@ def launch_workflow(seqs_fp, working_dir, mean_error, error_dist,
output_fp=output_derep_fp,
min_size=min_size, threads=threads_per_sample)
# Step 3: Remove artifacts
- output_artif_fp, num_seqs_left, _ = remove_artifacts_seqs(seqs_fp=output_derep_fp,
- ref_fp=ref_fp,
- working_dir=working_dir,
- ref_db_fp=ref_db_fp,
- negate=True,
- threads=threads_per_sample,
- sim_thresh=sim_thresh)
+ output_artif_fp, num_seqs_left, _ = remove_artifacts_seqs(
+ seqs_fp=output_derep_fp, ref_fp=ref_fp, working_dir=working_dir,
+ ref_db_fp=ref_db_fp, negate=True, threads=threads_per_sample,
+ sim_thresh=sim_thresh)
if not output_artif_fp:
warnings.warn('Problem removing artifacts from file %s' %
seqs_fp, UserWarning)
=====================================
scripts/deblur
=====================================
@@ -42,6 +42,9 @@ def error_dist_from_str(ctx, param, value):
if not isinstance(value, str):
return value
try:
+ # if string with [], remove them
+ if value[0] == '[' and value[-1] == ']':
+ value = value[1:-1]
error_dist = list(map(float, value.split(',')))
return error_dist
except ValueError:
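Note on the bracket handling added in this hunk: indexing `value[0]` raises `IndexError` for an empty string, and that is not covered by the `except ValueError` shown here. A minimal sketch of the same idea using `startswith`/`endswith`, which are safe on empty input (the function name `parse_error_dist` is hypothetical; the real callback also receives `ctx` and `param` from click):

```python
def parse_error_dist(value):
    """Parse '1,0.06,0.02' or '[1, 0.06, 0.02]' into a list of floats."""
    if not isinstance(value, str):
        # Already parsed (for example a default list), so pass it through.
        return value
    value = value.strip()
    # Strip one pair of surrounding brackets, if present.
    if value.startswith('[') and value.endswith(']'):
        value = value[1:-1]
    # float() tolerates surrounding whitespace and raises ValueError
    # on anything that is not a number, including an empty string.
    return [float(item) for item in value.split(',')]


if __name__ == "__main__":
    print(parse_error_dist("[1, 0.06, 0.02]"))
    print(parse_error_dist("1,0.06,0.02"))
```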
=====================================
setup.py
=====================================
@@ -28,7 +28,7 @@ classes = """
Topic :: Software Development :: Libraries :: Application Frameworks
Topic :: Software Development :: Libraries :: Python Modules
Programming Language :: Python
- Programming Language :: Python :: 3.5
+ Programming Language :: Python :: 3.8
Programming Language :: Python :: Implementation :: CPython
Operating System :: POSIX :: Linux
Operating System :: MacOS :: MacOS X
@@ -53,7 +53,7 @@ setup(name='deblur',
scripts=glob('scripts/*'),
extras_require={'test': ["nose >= 0.10.1", "pep8"],
'doc': ["Sphinx >= 1.2.2", "sphinx-bootstrap-theme"]},
- install_requires=['click >= 6', 'numpy >= 1.7',
+ install_requires=['click', 'numpy >= 1.7',
'scikit-bio >= 0.5.0, < 0.6.0',
'biom-format >= 2.1.3, < 2.2.0',
'h5py >= 2.2.0', 'scipy >= 0.15.1'],
View it on GitLab: https://salsa.debian.org/med-team/deblur/-/commit/26effdc31aae9fbf3311ee434ea4583f1dabf641