[Debian-med-packaging] Bug#962484: ITP: deblur -- deconvolution for Illumina amplicon sequencing

Mon Jun 8 18:14:49 BST 2020

Package: wnpp
Severity: wishlist

Subject: ITP: deblur -- deconvolution for Illumina amplicon sequencing
Package: wnpp
Owner: Steffen Moeller <moeller at debian.org>
Severity: wishlist

* Package name    : deblur
  Version         : 1.1.0
  Upstream Author : Deblur development team
* URL             : https://github.com/biocore/deblur
* License         : BSD-3
  Programming Lang: Python
  Description     : deconvolution for Illumina amplicon sequencing
 Deblur is a greedy deconvolution algorithm for amplicon sequencing
 based on Illumina MiSeq/HiSeq error profiles.  The authors recommend
 using Deblur via the QIIME2 plugin q2-deblur. Examples of its use can be
 found within the plugin itself. However, Deblur itself does not depend
 on QIIME2.
 .
 The input to the Deblur workflow is a directory of FASTA or FASTQ files
 (one per sample) or a single demultiplexed FASTA or FASTQ file. These
 files can be gzip-compressed. The output directory will contain three BIOM
 tables in which the observation IDs are the Deblurred sequences. The
 outputs depend on the reference databases used; the upstream README
 discusses them in more detail in its section titled "Positive and
 Negative Filtering." The output files are as follows:
 .
  * reference-hit.biom : contains only Deblurred reads matching the
    positive filtering database. By default, a reference composed of 16S
    sequences is used, and the resulting table contains only those reads
    which recruit to it at a coarse level. Reads are also filtered
    against the negative reference, which by default removes any read
    which appears to be PhiX or adapter.
 .
  * reference-hit.seqs.fa : a fasta file containing all the sequences
    in reference-hit.biom
 .
  * reference-non-hit.biom : contains only Deblurred reads that did not
    align to the positive filtering database. Negative filtering is also
    applied to this table, so by default, PhiX and adapter are removed.
 .
  * reference-non-hit.seqs.fa : a fasta file containing all the
    sequences in reference-non-hit.biom
 .
  * all.biom : contains all Deblurred reads. This file represents the
    union of the "reference-hit.biom" and "reference-non-hit.biom" tables.
 .
  * all.seqs.fa : a fasta file containing all the sequences in all.biom
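 .
 The relationship between these files can be sketched in Python. The
 snippet below is not part of Deblur: the function names are
 hypothetical, and it assumes that the union described for the BIOM
 tables carries over to the accompanying .seqs.fa files, which list the
 sequences of each table.

```python
from pathlib import Path


def read_fasta_sequences(path):
    """Return the set of sequences in a FASTA file (headers are ignored)."""
    sequences, current = set(), []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: store the previous one, if any.
            if current:
                sequences.add("".join(current))
            current = []
        elif line:
            current.append(line)
    if current:
        sequences.add("".join(current))
    return sequences


def all_is_union_of_hit_and_non_hit(output_dir):
    """Check that all.seqs.fa equals the union of the hit and non-hit files."""
    out = Path(output_dir)
    hit = read_fasta_sequences(out / "reference-hit.seqs.fa")
    non_hit = read_fasta_sequences(out / "reference-non-hit.seqs.fa")
    everything = read_fasta_sequences(out / "all.seqs.fa")
    return everything == (hit | non_hit)
```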
 .
 Deblur uses two types of filtering on the sequences:
 .
  * Negative mode - removes known artifact sequences (i.e. sequences
    aligning to PhiX or Adapter with >=95% identity and coverage).
 .
  * Positive mode - keeps only sequences similar to a reference database
    (by default known 16S sequences). SortMeRNA is used, and any sequence
    with an e-value <= 10 is retained. Deblur also outputs a BIOM table
    without this positive filtering step (named all.biom).
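 .
 The two thresholds above can be illustrated with a small Python
 sketch. It is not Deblur's implementation: the function names are
 hypothetical, and the identity, coverage and e-value figures are
 assumed to be supplied by the caller (in Deblur they come from
 sequence alignment). It only shows how the stated cut-offs decide
 which output tables a sequence ends up in.

```python
NEGATIVE_MIN_PERCENT = 95.0  # identity and coverage cut-off stated above
POSITIVE_MAX_EVALUE = 10.0   # e-value cut-off stated above


def is_known_artifact(identity, coverage):
    """Negative mode: both identity and coverage must reach the cut-off."""
    return identity >= NEGATIVE_MIN_PERCENT and coverage >= NEGATIVE_MIN_PERCENT


def passes_positive_filter(evalue):
    """Positive mode: keep sequences with an e-value at or below the cut-off.

    None stands for "no alignment to the positive reference was found".
    """
    return evalue is not None and evalue <= POSITIVE_MAX_EVALUE


def assign_tables(artifact_identity, artifact_coverage, reference_evalue):
    """Return the output tables a sequence would appear in (toy model)."""
    if is_known_artifact(artifact_identity, artifact_coverage):
        return []  # removed by negative filtering, so in no table
    tables = ["all.biom"]
    if passes_positive_filter(reference_evalue):
        tables.append("reference-hit.biom")
    else:
        tables.append("reference-non-hit.biom")
    return tables
```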
 .
 The FASTA files for both of these filtering steps can be supplied via
 the --neg-ref-fp and --pos-ref-fp options. By default, the negative
 database is composed of PhiX and adapter sequence and the positive
 database of known 16S sequences.
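 .
 As a sketch of how custom references might be passed, the Python
 helper below assembles a command line. Only --neg-ref-fp and
 --pos-ref-fp are taken from this description; the "workflow"
 subcommand and the --seqs-fp, --output-dir and -t (trim length)
 options are assumptions based on the upstream README and should be
 checked against "deblur workflow --help". The helper name and all file
 names are hypothetical.

```python
import shlex


def build_deblur_command(seqs_fp, output_dir, trim_length,
                         neg_ref_fp=None, pos_ref_fp=None):
    """Assemble an argument list for a Deblur run (illustrative only)."""
    # Assumed options: "workflow", --seqs-fp, --output-dir and -t are not
    # named in the package description above.
    cmd = ["deblur", "workflow",
           "--seqs-fp", str(seqs_fp),
           "--output-dir", str(output_dir),
           "-t", str(trim_length)]
    # Options named in the description: omit them to use the defaults
    # (PhiX/adapter for negative filtering, 16S for positive filtering).
    if neg_ref_fp is not None:
        cmd += ["--neg-ref-fp", str(neg_ref_fp)]
    if pos_ref_fp is not None:
        cmd += ["--pos-ref-fp", str(pos_ref_fp)]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; print the command instead of running it.
    command = build_deblur_command("samples.fna", "deblur_out", 150,
                                   pos_ref_fp="my_reference.fa")
    print(shlex.join(command))
```

 The resulting list can be handed to subprocess.run() once the options
 have been verified against the installed version.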
 .
 Deblur uses negative mode filtering to remove known artifacts (i.e. PhiX
 and adapter sequences) prior to denoising. The output of Deblur contains
 three BIOM tables: all.biom, which includes all sub-operational
 taxonomic units (sOTUs); reference-hit.biom, which contains the sOTUs
 passing positive filtering (by default, only sOTUs similar to 16S
 sequences); and reference-non-hit.biom, which contains only sOTUs
 failing positive filtering (by default, only non-16S sOTUs).

Remark: This package is maintained by the Debian Med Packaging Team at
   https://salsa.debian.org/med-team/deblur