[Debian-med-packaging] Bug#959237: ITP: odgi -- optimized dynamic genome/graph implementation
Michael R. Crusoe
michael.crusoe at gmail.com
Fri May 1 13:31:44 BST 2020
Package: wnpp
Severity: wishlist
Subject: ITP: odgi -- optimized dynamic genome/graph implementation
Owner: Michael R. Crusoe <michael.crusoe at gmail.com>
* Package name : odgi
Version : 0.4.1
Upstream Author : Erik Garrison
* URL : https://github.com/vgteam/odgi
* License : Expat
Programming Lang: C++
Description : optimized dynamic genome/graph implementation
Representing large genomic variation graphs with minimal memory overhead
requires a careful encoding of the graph entities. It is possible to build
succinct, static data structures to store queryable graphs, as in
https://github.com/vgteam/xg, but dynamic data structures are trickier to
implement.
.
odgi follows the dynamic https://github.com/jltsiren/gbwt in developing a
byte-packed version of the graph and the paths through it. Each node is
represented by a byte array in which variable-length integers encode
1) the node sequence, 2) its edges, and 3) the paths crossing the node.
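
As a rough illustration of the byte-array idea, here is a minimal Python
sketch. The record layout (length-prefixed sequence, then a counted list of
edge entries, then a counted list of path ids) and the names pack_node and
unpack_node are assumptions made for this example; they are not odgi's actual
on-disk or in-memory format.

```python
def encode_varint(value):
    """Encode a non-negative integer using 7 payload bits per byte."""
    if value < 0:
        raise ValueError("this varint encoding is unsigned")
    out = bytearray()
    while True:
        low = value & 0x7F
        value >>= 7
        if value:
            out.append(low | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low)
            return bytes(out)


def decode_varint(buf, pos):
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def pack_node(sequence, edges, path_ids):
    """Pack one node into a single byte string (hypothetical layout)."""
    seq = sequence.encode("ascii")
    out = bytearray(encode_varint(len(seq)))
    out += seq
    for values in (edges, path_ids):
        out += encode_varint(len(values))
        for value in values:
            out += encode_varint(value)
    return bytes(out)


def unpack_node(buf):
    """Inverse of pack_node: return (sequence, edges, path_ids)."""
    length, pos = decode_varint(buf, 0)
    sequence = buf[pos:pos + length].decode("ascii")
    pos += length
    lists = []
    for _ in range(2):
        count, pos = decode_varint(buf, pos)
        values = []
        for _ in range(count):
            value, pos = decode_varint(buf, pos)
            values.append(value)
        lists.append(values)
    return sequence, lists[0], lists[1]


record = pack_node("GATTACA", [3, 300], [0, 2])
```

In this sketch the whole node, including its sequence, fits in 15 bytes, and
only the value 300 needs a second byte.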
.
The edges and path steps are recorded in relative terms, as deltas between the
current node id and the target node id, where the node id corresponds to the
rank in the global array of nodes. Graphs built from biological data sets tend
to have a local partial order, so once the graph is sorted the stored deltas
tend to be small. This allows them to be compressed with a variable-length
integer representation, resulting in a small in-memory footprint at the cost
of packing and unpacking.
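
The delta scheme described here can be sketched in a few lines of Python.
The text does not say how negative deltas are stored, so this example assumes
a zigzag mapping of signed deltas onto unsigned integers before the varint
step; that choice, and the function names encode_edges and decode_edges, are
illustrative assumptions rather than odgi's actual code.

```python
def zigzag(delta):
    """Map signed to unsigned: 0, -1, 1, -2, 2 ... -> 0, 1, 2, 3, 4 ..."""
    return 2 * delta if delta >= 0 else -2 * delta - 1


def unzigzag(value):
    """Inverse of zigzag."""
    return value >> 1 if value % 2 == 0 else -(value + 1) // 2


def encode_varint(value):
    """Encode a non-negative integer using 7 payload bits per byte."""
    out = bytearray()
    while True:
        low = value & 0x7F
        value >>= 7
        if value:
            out.append(low | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low)
            return bytes(out)


def encode_edges(node_rank, target_ranks):
    """Store each edge as the delta from the current node's rank."""
    out = bytearray()
    for target in target_ranks:
        out += encode_varint(zigzag(target - node_rank))
    return bytes(out)


def decode_edges(node_rank, buf):
    """Recover absolute target ranks from the packed deltas."""
    targets = []
    pos = 0
    while pos < len(buf):
        value = 0
        shift = 0
        while True:
            byte = buf[pos]
            pos += 1
            value |= (byte & 0x7F) << shift
            if not byte & 0x80:
                break
            shift += 7
        targets.append(node_rank + unzigzag(value))
    return targets


# Node at rank 1000 with three nearby neighbours and one distant one.
packed = encode_edges(1000, [1001, 999, 1000, 5])
```

Here the three nearby targets cost one byte each and the distant one costs
two, whereas storing the absolute rank 1001 as a varint would already take
two bytes.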
.
The savings are substantial. In partially ordered regions of the graph, most
deltas will require only a single byte. The resulting implementation is able
to load the whole-genome 1000 Genomes Project graph in around 20 GB of RAM.
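
To make the effect of node order concrete, the following Python sketch counts
the bytes needed for the edges of a toy linear chain of 10,000 nodes under
two rank assignments. It assumes zigzag-mapped deltas and 7-bit varints, which
the text does not specify, and the "scattered" ordering is an artificial
permutation invented for this example, not real data.

```python
def zigzag(delta):
    """Map a signed delta onto a non-negative integer."""
    return 2 * delta if delta >= 0 else -2 * delta - 1


def varint_len(value):
    """Number of bytes a 7-bits-per-byte varint needs for value."""
    length = 1
    while value >= 0x80:
        value >>= 7
        length += 1
    return length


def chain_edge_bytes(rank_of, n_nodes):
    """Total bytes for the edges i -> i+1 of a chain, given a rank function."""
    total = 0
    for node in range(n_nodes - 1):
        delta = rank_of(node + 1) - rank_of(node)
        total += varint_len(zigzag(delta))
    return total


N = 10_000

# Sorted order: each node's rank is its position in the chain.
sorted_bytes = chain_edge_bytes(lambda i: i, N)

# Scattered order: neighbours in the chain land far apart in the node array.
scattered_bytes = chain_edge_bytes(lambda i: (i * 4999) % N, N)

print(sorted_bytes, scattered_bytes)  # prints "9999 19998"
```

Under these assumptions any delta between -64 and 63 fits in one byte, so the
sorted chain needs exactly one byte per edge, while the scattered ordering
doubles the cost.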
.
odgi was initially developed to allow in-memory manipulation of graphs
produced by the https://github.com/ekg/seqwish variation graph inducer.
Remark: This package is maintained by the Debian Med Packaging Team at
https://salsa.debian.org/med-team/odgi