[Debian-med-packaging] Bug#963028: ITP: pychopper -- identify, orient and trim full-length Nanopore cDNA reads
Steffen Moeller
moeller at debian.org
Thu Jun 18 00:47:14 BST 2020
Package: wnpp
Severity: wishlist
Subject: ITP: pychopper -- identify, orient and trim full-length Nanopore cDNA reads
Owner: Steffen Moeller <moeller at debian.org>
* Package name : pychopper
Version : 2.4.0
Upstream Author : Oxford Nanopore Technologies Ltd.
* URL : https://github.com/nanoporetech/pychopper
* License : MPL-2.0
Programming Lang: Python
Description : identify, orient and trim full-length Nanopore cDNA reads
Pychopper v2 is a tool to identify, orient and trim full-length Nanopore
cDNA reads. The tool is also able to rescue fused reads. The general
approach of Pychopper v2 is the following:
.
* Pychopper first identifies alignment hits of the primers across the
length of the sequence. The default method uses nhmmscan with the
pre-trained, strand-specific profile HMMs included with the
package. Alternatively, one can use the edlib backend,
which uses a combination of global and local alignment to identify
the primers within the read.
* After identifying the primer hits by either of the backends, the
reads are divided into segments defined by two consecutive primer
hits. The score of a segment is its length if the configuration of
the flanking primer hits is valid (such as SSP,-VNP for forward reads)
or zero otherwise.
* The segments are assigned to rescued reads using a dynamic programming
algorithm maximizing the sum of used segment scores (hence the amount
of rescued bases). A crucial observation about the algorithm is that
if a segment is included as a rescued read, then the next segment
must be excluded, as one of the primer hits defining it was "used
up" by the previous segment. This puts constraints on the dynamic
programming graph. The arrows in red define the optimal path for
rescuing two fused reads with a total score of l1 + l3.
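The segment scoring and selection steps above can be sketched as
follows. This is a minimal illustration, not Pychopper's actual
implementation: the hit tuples, the VALID set and the function names
are hypothetical, and the valid configurations are assumed from the
SSP,-VNP example.

```python
from typing import List, Tuple

# Hypothetical hit representation: (primer_name, start, end).
# A leading "-" on the name marks a reverse-complement hit.
Hit = Tuple[str, int, int]

# Assumed valid flanking configurations (forward and reverse reads).
VALID = {("SSP", "-VNP"), ("VNP", "-SSP")}


def segment_scores(hits: List[Hit]) -> List[int]:
    """Score each segment between two consecutive primer hits."""
    scores = []
    for left, right in zip(hits, hits[1:]):
        length = right[1] - left[2]  # bases between the two hits
        valid = (left[0], right[0]) in VALID
        scores.append(length if valid and length > 0 else 0)
    return scores


def pick_segments(scores: List[int]) -> Tuple[int, List[int]]:
    """Maximize the summed score without using two adjacent segments."""
    n = len(scores)
    # best[i] is the best total achievable with the first i segments.
    best = [0] * (n + 1)
    for i in range(1, n + 1):
        take = scores[i - 1] + (best[i - 2] if i >= 2 else 0)
        best[i] = max(best[i - 1], take)
    # Trace back which segments were taken.
    chosen = []
    i = n
    while i >= 1:
        if best[i] == best[i - 1]:
            i -= 1  # segment i-1 was skipped
        else:
            chosen.append(i - 1)
            i -= 2  # the neighbouring segment shares a primer hit
    return best[n], sorted(chosen)


# Two fused reads: segments 0 and 2 are valid, segment 1 is not.
hits = [("SSP", 0, 20), ("-VNP", 520, 540),
        ("SSP", 540, 560), ("-VNP", 1360, 1380)]
total, chosen = pick_segments(segment_scores(hits))
```

In this toy input the first and third segments are rescued, which
corresponds to the l1 + l3 case described above.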
.
A crucial parameter of Pychopper v2 is -q, which determines the
stringency of primer alignment (E-value in the case of the pHMM
backend). The user can specify it explicitly; by default, however,
it is optimized on a random sample of input reads to maximize the
number of classified reads.
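One way such a default could be chosen is a simple search over
candidate cutoffs on a random sample of reads. The sketch below only
illustrates the idea and is not Pychopper's code: tune_cutoff, its
parameters and the is_classified callback are all hypothetical.

```python
import random
from typing import Callable, Sequence, Tuple, TypeVar

Read = TypeVar("Read")


def tune_cutoff(
    reads: Sequence[Read],
    candidates: Sequence[float],
    is_classified: Callable[[Read, float], bool],
    sample_size: int = 10000,
    seed: int = 0,
) -> Tuple[float, int]:
    """Return the candidate cutoff that classifies the most sampled reads."""
    rng = random.Random(seed)
    if len(reads) <= sample_size:
        sample = list(reads)
    else:
        sample = rng.sample(list(reads), sample_size)
    best_q, best_count = None, -1
    for q in candidates:
        count = sum(1 for read in sample if is_classified(read, q))
        if count > best_count:  # ties keep the earlier candidate
            best_q, best_count = q, count
    return best_q, best_count


# Toy model: each "read" is classified only inside a cutoff window,
# since a cutoff that is too strict or too lenient loses the read.
toy_reads = [(1, 3), (2, 5), (2, 4), (6, 7)]
in_window = lambda read, q: read[0] <= q <= read[1]
best_q, n_classified = tune_cutoff(toy_reads, [1, 2, 3, 4, 5, 6], in_window)
```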
Remark: This package is maintained by the Debian Med Packaging Team at
https://salsa.debian.org/med-team/pychopper