[Debian-med-packaging] Question about htslib usage in blat / libucsc source

Andreas Tille andreas at fam-tille.de
Mon Jan 27 08:45:23 GMT 2020


Hi John,

thanks a lot for your quick and helpful response.

On Fri, Jan 24, 2020 at 02:34:54PM +0000, John Marshall wrote:
> On 22 Jan 2020, at 16:26, Andreas Tille <andreas at an3as.eu> wrote:
> > I'm currently trying to package blat which is very tricky since it
> > contains non-free parts, a private copy of htslib (eliminating this is
> > my current intention) and upstream who insists that using a code copy of
> > an old version is the right way to go for them.
> 
> Obviously in general the HTSlib maintainers also prefer it when people do not bundle ancient versions of HTSlib.

I can perfectly imagine - from a Debian point of view we fully agree
with this.

> The kent tree's bundled htslib-1.3 contains a number of local patches to vanilla htslib-1.3:
> 
> * Add new cram_get_Md5/cram_get_ref_url/cram_get_cache_dir/cram_set_cache_url functions which change HTSlib to use the kent tree's reference path and cache mechanisms instead of HTS_REF and HTS_CACHE
> 
> * Switch out knetfile to use the kent tree's networking facilities
> 
> * Stop test_and_fetch() from writing a local copy of downloaded indexes
> 
> * Various other trivia, much of which has already been applied to more recent vanilla HTSlib (or equivalents have been)
> 
> Most of this you can ignore. The only item that causes you much difficulty is the first one, but as a first pass at it you can ignore it and require that users set up both kent's and htslib's variables pointing at the reference sources and cache. This is the cause of the following errors:
> 
> > bamFile.c: In function ‘bamFetchAlreadyOpen’:
> > bamFile.c:190:24: warning: implicit declaration of function ‘cram_get_Md5’; did you mean ‘cram_get_refs’? [-Wimplicit-function-declaration]
> > bamFile.c:197:25: warning: implicit declaration of function ‘cram_get_ref_url’; did you mean ‘cram_get_refs’? [-Wimplicit-function-declaration]
> > bamFile.c:198:26: warning: implicit declaration of function ‘cram_get_cache_dir’; did you mean ‘cram_set_header’? [-Wimplicit-function-declaration]
> > bamFile.c: In function ‘bamAndIndexFetchPlus’:
> > bamFile.c:240:5: warning: implicit declaration of function ‘cram_set_cache_url’ [-Wimplicit-function-declaration]
> 
> ...as they invoke these functions that exist only in the locally patched bundled htslib. These calls just set up the hacked up bundled htslib to use the kent reference path/cache mechanisms. As you're ignoring that, you should just nuke this code from bamFile.c, as in the attached patch.
> 
> > bamFile.c: In function ‘bamGetTargetCount’:
> > bamFile.c:264:21: error: ‘SAM_hdr’ {aka ‘struct sam_hdr_t’} has no member named ‘nref’
> >  264 |     tCount = cramHdr->nref;
> 
> This one is an actual case of using private facilities. It can easily be patched to use proper API functions instead (as attached), which also fixes two bugs in their bamGetTargetCount() function.

Thanks a lot for your patch.  I've now applied it to the Debian
packaging Git[1] and removed the code copy from the upstream tarball to
make sure we have a clean build environment.  Unfortunately I'm running
into another build issue:

...
cc -O -g -g -O2 -fdebug-prefix-map=/build/libucsc-0.0+git20200125.51bc9725+dfsg=. -fstack-protector-strong -Wformat -Werror=format-security -Wall -Wformat -Wimplicit -Wreturn-type -Wuninitialized -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_GNU_SOURCE -DMACHTYPE_x86_64   -Wall -Wformat -Wimplicit -Wreturn-type -Wuninitialized -I../inc -I../../inc -I../../../inc -I../../../../inc -I../../../../../inc -I../htslib -I/include -I/usr/include/libpng16  -o knetUdc.o -c knetUdc.c
knetUdc.c: In function ‘kuOpen’:
knetUdc.c:25:3: error: ‘knetFile’ {aka ‘struct knetFile_s’} has no member named ‘udcf’
   25 | kf->udcf = udcf;
      |   ^~
knetUdc.c:26:57: error: ‘knetFile’ {aka ‘struct knetFile_s’} has no member named ‘udcf’
   26 | verbose(2, "kuOpen: returning %lu\n", (unsigned long)(kf->udcf));
      |                                                         ^~
knetUdc.c: In function ‘kuRead’:
knetUdc.c:40:59: error: ‘knetFile’ {aka ‘struct knetFile_s’} has no member named ‘udcf’
   40 | verbose(2, "udcRead(%lu, buf, %lld)\n", (unsigned long)(fp->udcf), (long long)len);
      |                                                           ^~
...


The reason is that in the libucsc code copy in file
htslib/htslib/knetfile.h defines:

typedef struct knetFile_s {
        int type, fd;
        int64_t offset;
        char *host, *port;

        // the following are for FTP only
        int ctrl_fd, pasv_ip[4], pasv_port, max_response, no_reconnect, is_ready;
        char *response, *retr, *size_cmd;
        int64_t seek_offset; // for lazy seek
    int64_t file_size;

        // the following are for HTTP only
        char *path, *http_host;
        struct udcFile *udcf;
} knetFile;


while official htslib has:


typedef struct knetFile_s {
        int type, fd;
        int64_t offset;
        char *host, *port;

        // the following are for FTP only
        int ctrl_fd, pasv_ip[4], pasv_port, max_response, no_reconnect, is_ready;
        char *response, *retr, *size_cmd;
        int64_t seek_offset; // for lazy seek
    int64_t file_size;

        // the following are for HTTP only
        char *path, *http_host;
} knetFile;


So the code copy is relying on the additional member

     struct udcFile *udcf; 

:-(

Any hint how to work around this?
 
> > [2] https://github.com/ucscGenomeBrowser/kent/issues/13
> 
> On this issue you wrote:
> 
> >> tillea commented on 5 Sep 2018
> >> I admit htslib is famous for breaking its interface.
> 
> Once again: This statement is both rude and false.

I'm very sorry specifically about beeing rude.  I remember I was not in
a good mood when writing this - but it was rather caused by the lack of
blat / ucsc authors to discuss their license.  Its not a good idea to
write in a bad mood in public forums specifically when writing about
third part specifically when writing about third party. :-(

> On the contrary the HTSlib maintainers are careful to minimise breakage to HTSlib interfaces.

I confirm I remember your helpful comments how to enhance the Debian
packaging and I really appreciate your help.
 
> Presumably this refers to the various cases of removal of private HTSlib functions that Debian has complained about over the years. The Debian attitude has been that these symbols existed in libhts.so therefore were part of the interface; the HTSlib attitude has been that these functions were never declared in the headers (so they were not callable by user programs) and not documented (so you wouldn't know what to expect them to do anyway) therefore they were not part of the interface. Not acknowledging this distinction is disingenuous.

Also agreed and sorry again.  I for myself need to admit that I'm doing
the htslib maintenance since some time since the former maintainer
became inactive.  I'm not familiar with his motivation and former
discussion, neither am I a user of htslib myself - I simply try to do my
best based on user and upstream comments and bug reports.  So your
contribution is extremely valuable and I hope that we can keep on this
nice ccoperation.

> Debian will be pleased that as of soversion 3, any remaining such private functions no longer appear as visible symbols in libhts.so. So I anticipate that Debian will no longer be accusing htslib of breaking interfaces in this way in the future.

As I tried to express above: Its wrong anyway to blame you somewhere
else about this and I hope we can forget this and clarify potential
issues directly.

> Good luck with packaging blat.

Thanks a lot.  My plan is to strip as much from the source code as
possible to successfully package blat and than ask for a license change
of the remaining code.

Kind regards and thanks again for your help and your patience

      Andreas.
 

[1] https://salsa.debian.org/med-team/libucsc

-- 
http://fam-tille.de



More information about the Debian-med-packaging mailing list