[Debian-med-packaging] Bug#959237: ITP: odgi -- optimized dynamic genome/graph implementation

Michael R. Crusoe michael.crusoe at gmail.com
Fri May 1 13:31:44 BST 2020


Package: wnpp
Severity: wishlist

Subject: ITP: odgi -- optimized dynamic genome/graph implementation
Owner: Michael R. Crusoe <michael.crusoe at gmail.com>

* Package name    : odgi
  Version         : 0.4.1
  Upstream Author : Erik Garrison
* URL             : https://github.com/vgteam/odgi
* License         : Expat
  Programming Lang: C++
  Description     : optimized dynamic genome/graph implementation
 Representing large genomic variation graphs with minimal memory overhead
 requires a careful encoding of the graph entities. It is possible to build
 succinct, static data structures to store queryable graphs, as in xg
 (https://github.com/vgteam/xg), but dynamic data structures are trickier to
 implement.
 .
 odgi follows the dynamic GBWT (https://github.com/jltsiren/gbwt) in developing
 a byte-packed version of the graph and the paths through it. Each node is
 represented by a byte array into which variable-length integers are written to
 record 1) the node sequence, 2) its edges, and 3) the paths crossing the node.
 .
 The edges and path steps are recorded relatively, as deltas between the
 current node id and the target node id, where the node id corresponds to the
 rank in the global array of nodes. Graphs built from biological data sets tend
 to have a local partial order, so once the graph is sorted the stored deltas
 tend to be small. This allows them to be compressed with a variable-length
 integer representation, giving a small in-memory footprint at the cost of
 packing and unpacking.
 .
 The savings are substantial. In partially ordered regions of the graph, most
 deltas will require only a single byte. The resulting implementation is able
 to load the whole-genome 1000 Genomes Project graph in around 20 GB of RAM.
 .
 odgi was initially developed to allow in-memory manipulation of graphs produced
 by the seqwish variation graph inducer (https://github.com/ekg/seqwish).
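
For readers unfamiliar with the encoding the description refers to, here is a
rough sketch of delta plus variable-length integer packing. It is not taken
from odgi's source: the function names, the zigzag mapping for signed deltas,
and the LEB128-style byte layout are all assumptions chosen for illustration,
and odgi's actual on-disk and in-memory layout may differ.

```python
from typing import Iterable, List, Tuple


def zigzag(delta: int) -> int:
    """Map a signed delta to an unsigned value so small magnitudes stay small.

    Assumption: odgi may handle the sign differently; zigzag is one common choice.
    """
    return 2 * delta if delta >= 0 else -2 * delta - 1


def unzigzag(value: int) -> int:
    """Inverse of zigzag()."""
    return value >> 1 if value % 2 == 0 else -((value + 1) >> 1)


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, high bit = 'more follows'."""
    if value < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)  # final byte
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at pos; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def pack_edges(node_rank: int, target_ranks: Iterable[int]) -> bytes:
    """Store each edge as the varint-encoded delta from node_rank to its target."""
    out = bytearray()
    for target in target_ranks:
        out += encode_varint(zigzag(target - node_rank))
    return bytes(out)


def unpack_edges(node_rank: int, buf: bytes) -> List[int]:
    """Recover the absolute target ranks from a packed edge array."""
    targets = []
    pos = 0
    while pos < len(buf):
        value, pos = decode_varint(buf, pos)
        targets.append(node_rank + unzigzag(value))
    return targets


if __name__ == "__main__":
    packed = pack_edges(1000, [999, 1001, 1003])
    print(len(packed), unpack_edges(1000, packed))
```

In this sketch, edges from node 1000 to its near neighbours 999, 1001 and 1003
take one byte each, whereas an edge to a distant node such as 5000000 takes
four, which is why a well-sorted graph packs so much more tightly.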

Remark: This package is maintained by the Debian Med Packaging Team at
   https://salsa.debian.org/med-team/odgi


