[Debian-med-packaging] Bug#1042769: provean: incompatible with cd-hit >= 4.8.1-4

Andrius Merkys merkys at debian.org
Wed Apr 2 08:41:39 BST 2025


Control: tags -1 + patch pending

On Wed, 2 Apr 2025 10:05:52 +0300 Andrius Merkys <merkys at debian.org> wrote:
> I finally managed to isolate the difference in cdhit output which causes 
> segfaults in provean. It seems that cdhit >= 4.8.1-4 replaced full FASTA 
> headers in its output with partial IDs:
> 
> diff -r /home/andrius/provean/good/cdhit.cluster /home/andrius/provean/bad/cdhit.cluster
> 1c1
> < >gi|119610548|gb|EAW90142.1| tumor protein p53 (Li-Fraumeni syndrome), isoform CRA_c
> ---
> > >EAW90142.1 tumor protein p53 (Li-Fraumeni syndrome), isoform CRA_c [Homo sapiens]
> 
> I need to look deeper if cdhit could be persuaded to use the old output 
> format. If not, provean will have to be adjusted to the change.

I was wrong: it is blastdbcmd, not cdhit, that has changed. Its default 
output format no longer replicates the full input FASTA header. I have 
successfully patched the code to set the requested output format 
explicitly.
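
For illustration only, a minimal sketch of what pinning the format can 
look like when blastdbcmd is invoked from a wrapper. This is not the 
actual patch: the helper name, the file names and the example format 
string are assumptions. The sketch only shows the idea of always passing 
-outfmt instead of relying on the default. The specifiers %g (GI), %a 
(accession) and %t (title) are documented by blastdbcmd, but whether 
this particular combination reproduces the old header exactly has not 
been checked here.

```python
import shlex


def blastdbcmd_command(db, entry_batch, out, outfmt):
    """Build a blastdbcmd argument list with an explicit output format.

    Hypothetical helper, not taken from the provean source. The caller
    must always supply `outfmt`, so the result never depends on the
    blastdbcmd default, which is what changed between versions.
    """
    return [
        "blastdbcmd",
        "-db", db,                    # BLAST database name
        "-entry_batch", entry_batch,  # file with one sequence ID per line
        "-outfmt", outfmt,            # explicit format, never the default
        "-out", out,                  # output file
    ]


# Example format string (an assumption): GI, accession and title.
cmd = blastdbcmd_command("nr", "ids.txt", "hits.txt", "%g %a %t")

# Print the shell-quoted command line for inspection; nothing is executed.
print(shlex.join(cmd))
```

Building the command as a list keeps the format string intact as a 
single argument, which matters because it contains spaces and percent 
signs.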

It would be nice to add an autopkgtest to prevent regressions, but the 
input database is ~12GB (and it seems that only one of the databases 
from [1] works).

Andrius

[1] ftp://ftp.jcvi.org/data/provean/nr_Aug_2011/


