[Debian-med-packaging] Bug#962484: ITP: deblur -- deconvolution for Illumina amplicon sequencing
Steffen Moeller
moeller at debian.org
Mon Jun 8 18:14:49 BST 2020
Package: wnpp
Severity: wishlist
Subject: ITP: deblur -- deconvolution for Illumina amplicon sequencing
Owner: Steffen Moeller <moeller at debian.org>
* Package name : deblur
Version : 1.1.0
Upstream Author : Deblur development team
* URL : https://github.com/biocore/deblur
* License : BSD-3
Programming Lang: Python
Description : deconvolution for Illumina amplicon sequencing
Deblur is a greedy deconvolution algorithm for amplicon sequencing
based on Illumina MiSeq/HiSeq error profiles. The authors recommend
using Deblur via the QIIME2 plugin q2-deblur. Examples of its use can be
found within the plugin itself. However, Deblur itself does not depend
on QIIME2.
.
The input to the Deblur workflow is a directory of FASTA or FASTQ files
(1 per sample) or a single demultiplexed FASTA or FASTQ file. These
files can be gzip'd. The output directory will contain three BIOM
tables in which the observation IDs are the Deblurred sequences. The
outputs are contingent on the reference databases used and a more
focused discussion on them is in the subsequent README section titled
"Positive and Negative Filtering." The output files are as follows:
.
* reference-hit.biom : contains only Deblurred reads matching the
positive filtering database. By default, a reference composed of 16S
sequences is used, and the resulting table contains only those reads
that recruit to it at a coarse level. Reads are also filtered against
the negative reference, which by default removes any read that appears
to be PhiX or adapter.
.
* reference-hit.seqs.fa : a fasta file containing all the sequences
in reference-hit.biom
.
* reference-non-hit.biom : contains only Deblurred reads that did not
align to the positive filtering database. Negative filtering is also
applied to this table, so by default, PhiX and adapter are removed.
.
* reference-non-hit.seqs.fa : a fasta file containing all the
sequences in reference-non-hit.biom
.
* all.biom : contains all Deblurred reads. This file represents the
union of the "reference-hit.biom" and "reference-non-hit.biom" tables.
.
* all.seqs.fa : a fasta file containing all the sequences in all.biom
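.
The relationship between the three tables can be illustrated with a
minimal Python sketch. It does not use the BIOM library; plain
dictionaries stand in for the tables, and the sequences, sample names
and counts are made up for illustration. It assumes, following the
descriptions above, that no sequence appears in both the hit and the
non-hit table.

```python
# Toy stand-ins for BIOM tables: {deblurred_sequence: {sample_id: count}}.
# All sequences, sample IDs and counts below are invented for illustration.
reference_hit = {
    "TACGGAGG": {"sampleA": 12, "sampleB": 3},
    "TACGTAGG": {"sampleA": 0, "sampleB": 7},
}
reference_non_hit = {
    "GGGTTTAA": {"sampleA": 2, "sampleB": 0},
}


def union_tables(hit, non_hit):
    """Combine two tables whose observation IDs are expected not to overlap."""
    overlap = set(hit) & set(non_hit)
    if overlap:
        raise ValueError(f"sequences present in both tables: {sorted(overlap)}")
    combined = dict(hit)
    combined.update(non_hit)
    return combined


# Plays the role of all.biom in this toy model.
all_table = union_tables(reference_hit, reference_non_hit)
print(sorted(all_table))
```

In this toy model, every observation ID in the combined table comes
from exactly one of the two input tables.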
.
Deblur uses two types of filtering on the sequences:
.
* Negative mode - removes known artifact sequences (i.e. sequences
aligning to PhiX or Adapter with >=95% identity and coverage).
.
* Positive mode - keeps only sequences similar to a reference database
(by default known 16S sequences). SortMeRNA is used, and any sequence
with an e-value <= 10 is retained. Deblur also outputs a BIOM table
without this positive filtering step (named all.biom).
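.
The two thresholds above can be read as a simple decision rule. The
following Python sketch is not Deblur's implementation; the Candidate
class, its field names and the example values are hypothetical, and the
alignment statistics are assumed to have been computed elsewhere. Only
the thresholds (>=95% identity and coverage, e-value <= 10) come from
the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical per-sequence alignment summary (not a Deblur type)."""
    seq_id: str
    artifact_identity: float      # best % identity against PhiX/adapter
    artifact_coverage: float      # % coverage of that same alignment
    ref_evalue: Optional[float]   # best e-value against the positive
                                  # reference; None if nothing aligned


def is_artifact(c: Candidate, threshold: float = 95.0) -> bool:
    # Negative mode: both identity and coverage must reach the threshold.
    return c.artifact_identity >= threshold and c.artifact_coverage >= threshold


def passes_positive(c: Candidate, max_evalue: float = 10.0) -> bool:
    # Positive mode: keep sequences with an alignment at e-value <= 10.
    return c.ref_evalue is not None and c.ref_evalue <= max_evalue


def classify(c: Candidate) -> str:
    if is_artifact(c):
        return "removed"
    return "reference-hit" if passes_positive(c) else "reference-non-hit"


examples = [
    Candidate("phix_like", 99.0, 98.0, None),
    Candidate("16S_like", 40.0, 30.0, 1e-20),
    Candidate("unknown", 10.0, 5.0, None),
]
for c in examples:
    print(c.seq_id, classify(c))
```

Sequences classified as "reference-hit" or "reference-non-hit" in this
sketch correspond to the tables of the same name; both kinds end up in
all.biom.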
.
The FASTA files for both of these filtering steps can be supplied via
the --neg-ref-fp and --pos-ref-fp options. By default, the negative
database is composed of PhiX and adapter sequence and the positive
database of known 16S sequences.
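.
As a sketch of how custom reference files might be passed, the Python
helper below only builds an argument list and prints it; it does not
run Deblur. Only --neg-ref-fp and --pos-ref-fp are taken from this
description. The "workflow" subcommand and the --seqs-fp, --output-dir
and -t (trim length) options are assumptions recalled from the upstream
README and should be checked against "deblur workflow --help". All file
names are placeholders.

```python
import shlex


def build_deblur_command(seqs_fp, output_dir, trim_length,
                         pos_ref_fp=None, neg_ref_fp=None):
    """Build an argv list for a Deblur run (hypothetical helper).

    The subcommand and the --seqs-fp/--output-dir/-t options are
    assumptions; --pos-ref-fp and --neg-ref-fp are the options named in
    the package description.
    """
    cmd = ["deblur", "workflow",
           "--seqs-fp", seqs_fp,
           "--output-dir", output_dir,
           "-t", str(trim_length)]
    # Omitting a reference option leaves the built-in default database.
    if pos_ref_fp is not None:
        cmd += ["--pos-ref-fp", pos_ref_fp]
    if neg_ref_fp is not None:
        cmd += ["--neg-ref-fp", neg_ref_fp]
    return cmd


cmd = build_deblur_command("seqs.fna", "deblur_out", 150,
                           pos_ref_fp="my_16S.fa",
                           neg_ref_fp="my_artifacts.fa")
print(shlex.join(cmd))
```

Leaving both reference arguments unset in this helper produces a
command that relies on the default databases described above.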
.
Deblur uses negative mode filtering to remove known artifacts (i.e. PhiX
and Adapter sequences) prior to denoising. The output of Deblur contains
three files: all.biom, which includes all sOTUs, reference-hit.biom,
which contains the output of positive filtering of the sOTUs (default
only sOTUs similar to 16S sequences), and reference-non-hit.biom,
which contains only sOTUs failing the positive filtering (default only
non-16S sOTUs).
Remark: This package is maintained by Debian Med Packaging Team at
https://salsa.debian.org/med-team/deblur