[Debian-med-packaging] Bug#1042769: provean: incompatible with cd-hit >= 4.8.1-4
Andrius Merkys
merkys at debian.org
Wed Apr 2 08:41:39 BST 2025
Control: tags -1 + patch pending
On Wed, 2 Apr 2025 10:05:52 +0300 Andrius Merkys <merkys at debian.org> wrote:
> I finally managed to isolate the difference in cdhit output which causes
> segfaults in provean. It seems that cdhit >= 4.8.1-4 replaced full FASTA
> headers in its output with partial IDs:
>
> diff -r /home/andrius/provean/good/cdhit.cluster /home/andrius/provean/bad/cdhit.cluster
> 1c1
> < >gi|119610548|gb|EAW90142.1| tumor protein p53 (Li-Fraumeni syndrome), isoform CRA_c
> ---
> > >EAW90142.1 tumor protein p53 (Li-Fraumeni syndrome), isoform CRA_c [Homo sapiens]
>
> I need to look deeper if cdhit could be persuaded to use the old output
> format. If not, provean will have to be adjusted to the change.
I was wrong: the change is not in cdhit. It is blastdbcmd that has changed
its default output format, so it no longer replicates the full input FASTA
header. I have patched the provean code to set the requested output format
explicitly, and that works.
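
For illustration only, here is a minimal Python sketch of the general idea:
always passing -outfmt to blastdbcmd instead of relying on its default. It is
not the actual provean patch. The helper name, the database name, the entry
and the format string are assumptions made up for this example. Which format
string reproduces the old headers is not stated here, so "%i %t" (sequence
ID and title) is just a placeholder.

```python
import shlex


def blastdbcmd_argv(db, entry, outfmt):
    """Build a blastdbcmd command line with the output format spelled out.

    Hypothetical helper: requiring `outfmt` as an argument means the result
    never depends on whatever blastdbcmd's default format happens to be.
    """
    return ["blastdbcmd", "-db", db, "-entry", entry, "-outfmt", outfmt]


if __name__ == "__main__":
    # Database name, accession and format string are illustrative values.
    argv = blastdbcmd_argv("nr", "EAW90142.1", "%i %t")
    print(shlex.join(argv))
```

The list can be handed to subprocess.run() as is. Keeping the format a
required argument makes any future change of the default harmless.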
It would be nice to add an autopkgtest to prevent regressions, but the
input database is ~12GB (and it seems that only one from [1] works).
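
A regression check would not necessarily need the whole database. As a rough
sketch, one could test only the shape of the header lines, using the two
headers from the diff quoted above as fixtures. The function names are
hypothetical, and the assumption that the expected IDs always start with
"gi|" is based on that single example only.

```python
def first_token(defline):
    """Return the sequence ID of a FASTA header: the text before the first space."""
    parts = defline.lstrip(">").split(None, 1)
    return parts[0] if parts else ""


def has_legacy_id(defline):
    """True if the ID has the pipe-delimited 'gi|...' form (assumed expected form)."""
    return first_token(defline).startswith("gi|")


# Headers taken from the diff in the quoted message.
GOOD = ">gi|119610548|gb|EAW90142.1| tumor protein p53 (Li-Fraumeni syndrome), isoform CRA_c"
BAD = ">EAW90142.1 tumor protein p53 (Li-Fraumeni syndrome), isoform CRA_c [Homo sapiens]"

if __name__ == "__main__":
    for header in (GOOD, BAD):
        print(first_token(header), has_legacy_id(header))
```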
Andrius
[1] ftp://ftp.jcvi.org/data/provean/nr_Aug_2011/