[Debian-med-packaging] Bug#1028713: Bug#1028713: pigx-rnaseq: FTBFS: E: Build killed with signal TERM after 150 minutes of inactivity

Étienne Mollier emollier at emlwks999.eu
Sat Jan 14 17:15:14 GMT 2023


Control: tags -1 confirmed
Control: reassign -1 salmon
Control: affects -1 pigx-rnaseq
Control: retitle -1 salmon call causing pigx-rnaseq to infinitely loop

Hi,

Lucas Nussbaum, on 2023-01-14:
> > /usr/bin/Rscript --vanilla /<<PKGBUILDDIR>>/scripts/runDeseqReport.R --logo=/<<PKGBUILDDIR>>/images/Logo_PiGx.png --prefix='analysis1' --reportFile=/<<PKGBUILDDIR>>/scripts/deseqReport.Rmd --countDataFile=/<<PKGBUILDDIR>>/tests/output/feature_counts/raw_counts/hisat2/counts.tsv --colDataFile=/<<PKGBUILDDIR>>/tests/output/colData.tsv --gtfFile=/<<PKGBUILDDIR>>/tests/sample_data/sample.gtf --caseSampleGroups='HBR' --controlSampleGroups='UHR' --covariates=''  --workdir=/<<PKGBUILDDIR>>/tests/output/report/hisat2 --organism='' --description='This analysis is part of the pigx-rnaseq build-time tests.' --selfContained='True' >> /<<PKGBUILDDIR>>/tests/output/logs/hisat2/analysis1.report.log 2>&1
> > [Fri Jan 13 20:26:39 2023]
> > Finished job 38.
> > 35 of 48 steps (73%) done
> > [Fri Jan 13 20:26:53 2023]
> > Finished job 45.
> > 36 of 48 steps (75%) done
> > E: Build killed with signal TERM after 150 minutes of inactivity

From preliminary investigation, this looks to be caused by a
salmon process caught in infinite loop at the very step shown by
the hereover hardcopy.  Running an strace on salmon process does
not show any system call, which does not help locating where
we're caught.

If I attach to the process with gdb, I notice the loop seems to
be in libjemalloc2, oscillating between functions from
jemalloc/internal/ph.h and jemalloc/internal/edata.h when
running step by step.  Here after is a backtrace caught at a
random point in time during salmon's infinite loop:

	#0  phn_prev_set (offset=<optimized out>, prev=<optimized out>, 
	    phn=<optimized out>) at include/jemalloc/internal/ph.h:128
	#1  phn_merge_ordered (cmp=<optimized out>, offset=40, phn1=0x7f495fa17280, 
	    phn0=0x7f495fa17280) at include/jemalloc/internal/ph.h:126
	#2  phn_merge (cmp=<optimized out>, offset=40, phn1=0x7f495fa17280, 
	    phn0=0x7f495fa17280) at include/jemalloc/internal/ph.h:146
	#3  phn_merge_siblings (cmp=<optimized out>, offset=40, phn=<optimized out>)
	    at include/jemalloc/internal/ph.h:209
	#4  ph_merge_aux (cmp=<optimized out>, offset=40, ph=0x7f495fa03b38)
	    at include/jemalloc/internal/ph.h:231
	#5  ph_first (cmp=<optimized out>, offset=40, ph=0x7f495fa03b38, 
	    ph at entry=0xd4eb5f85f001) at include/jemalloc/internal/ph.h:265
	#6  je_edata_heap_first (ph=ph at entry=0x7f495fa03b38) at src/edata.c:6
	#7  0x00007f4961338cb9 in eset_first_fit (lg_max_fit=<optimized out>, 
	    exact_only=<optimized out>, size=4096, eset=0x7f495fa03b18)
	    at src/eset.c:243
	#8  je_eset_fit (eset=eset at entry=0x7f495fa03b18, esize=esize at entry=4096, 
	    alignment=alignment at entry=4096, exact_only=exact_only at entry=false, 
	    lg_max_fit=<optimized out>) at src/eset.c:270
	#9  0x00007f496133b958 in extent_recycle_extract (ehooks=0x7f495fa000c0, 
	    guarded=false, alignment=4096, size=4096, expand_edata=0x0, 
	    ecache=0x7f495fa03aa8, pac=0x7f495fa03a70, tsdn=0x7f495fdfe968)
	    at src/extent.c:437
	#10 extent_recycle (tsdn=tsdn at entry=0x7f495fdfe968, 
	    pac=pac at entry=0x7f495fa03a70, ehooks=ehooks at entry=0x7f495fa000c0, 
	    ecache=ecache at entry=0x7f495fa03aa8, expand_edata=expand_edata at entry=0x0, 
	    size=size at entry=4096, alignment=<optimized out>, zero=<optimized out>, 
	    commit=<optimized out>, growing_retained=<optimized out>, 
	    guarded=<optimized out>) at src/extent.c:606
	#11 0x00007f496133bad9 in je_ecache_alloc (tsdn=tsdn at entry=0x7f495fdfe968, 
	    pac=pac at entry=0x7f495fa03a70, ehooks=ehooks at entry=0x7f495fa000c0, 
	    ecache=ecache at entry=0x7f495fa03aa8, expand_edata=expand_edata at entry=0x0, 
	    size=size at entry=4096, alignment=4096, zero=false, guarded=false)
	    at src/extent.c:87
	#12 0x00007f4961346d5d in pac_alloc_real (tsdn=tsdn at entry=0x7f495fdfe968, 
	    pac=pac at entry=0x7f495fa03a70, ehooks=ehooks at entry=0x7f495fa000c0, 
	    size=size at entry=4096, alignment=alignment at entry=4096, 
	    zero=zero at entry=false, guarded=false) at src/pac.c:116
	#13 0x00007f4961346ee6 in pac_alloc_impl (tsdn=0x7f495fdfe968, 
	    self=0x7f495fa03a70, size=4096, alignment=4096, zero=<optimized out>, 
	    guarded=<optimized out>, frequent_reuse=true, 
	    deferred_work_generated=0x7ffea3d0cb37) at src/pac.c:178
	#14 0x00007f4961345c5c in pai_alloc (deferred_work_generated=0x7ffea3d0cb37, 
	    frequent_reuse=true, guarded=<optimized out>, zero=false, 
	    alignment=<optimized out>, size=4096, self=0x7f495fa03a70, 
	    tsdn=0x7f495fdfe968) at include/jemalloc/internal/pai.h:43
	#15 je_pa_alloc (tsdn=tsdn at entry=0x7f495fdfe968, 
	    shard=shard at entry=0x7f495fa03a58, size=4096, 
	    alignment=alignment at entry=4096, slab=slab at entry=true, 
	    szind=szind at entry=20, zero=false, guarded=false, 
	    deferred_work_generated=0x7ffea3d0cb37) at src/pa.c:139
	#16 0x00007f49612e7a8d in arena_slab_alloc (tsdn=tsdn at entry=0x7f495fdfe968, 
	    arena=arena at entry=0x7f495fa010c0, binind=binind at entry=20, 
	    binshard=binshard at entry=0, 
	    bin_info=bin_info at entry=0x7f4961588f20 <je_bin_infos+800>)
	    at src/arena.c:839
	#17 0x00007f49612e8f00 in je_arena_cache_bin_fill_small (
	    tsdn=tsdn at entry=0x7f495fdfe968, arena=arena at entry=0x7f495fa010c0, 
	    cache_bin=cache_bin at entry=0x7f495fdfeea8, cache_bin_info=<optimized out>, 
	    binind=binind at entry=20, nfill=1) at src/arena.c:1034
	#18 0x00007f496135b3d9 in je_tcache_alloc_small_hard (
	    tsdn=tsdn at entry=0x7f495fdfe968, arena=arena at entry=0x7f495fa010c0, 
	    tcache=tcache at entry=0x7f495fdfecc0, 
	    cache_bin=cache_bin at entry=0x7f495fdfeea8, binind=binind at entry=20, 
	    tcache_success=tcache_success at entry=0x7ffea3d0cc70) at src/tcache.c:238
	#19 0x00007f49612d5612 in tcache_alloc_small (slow_path=<optimized out>, 
	    zero=false, binind=20, size=<optimized out>, tcache=0x7f495fdfecc0, 
	    arena=0x7f495fa010c0, tsd=0x7f495fdfe968)
	    at include/jemalloc/internal/tcache_inlines.h:68
	#20 arena_malloc (slow_path=<optimized out>, tcache=0x7f495fdfecc0, 
	    zero=false, ind=20, size=<optimized out>, arena=0x0, tsdn=0x7f495fdfe968)
	    at include/jemalloc/internal/arena_inlines_b.h:151
	#21 iallocztm (slow_path=<optimized out>, arena=0x0, is_internal=false, 
	    tcache=0x7f495fdfecc0, zero=false, ind=20, size=<optimized out>, 
	    tsdn=0x7f495fdfe968)
	    at include/jemalloc/internal/jemalloc_internal_inlines_c.h:55
	#22 imalloc_no_sample (ind=20, usize=1024, size=<optimized out>, 
	    tsd=0x7f495fdfe968, dopts=<synthetic pointer>, sopts=<synthetic pointer>)
	    at src/jemalloc.c:2398
	#23 imalloc_body (tsd=0x7f495fdfe968, dopts=<synthetic pointer>, 
	    sopts=<synthetic pointer>) at src/jemalloc.c:2573
	#24 imalloc (dopts=<optimized out>, sopts=<optimized out>)
	    at src/jemalloc.c:2687
	#25 je_malloc_default (size=<optimized out>) at src/jemalloc.c:2722
	#26 0x00007f4961360df9 in fallback_impl<false> (size=928)
	    at src/jemalloc_cpp.cpp:98
	#27 0x00007f4961360ed5 in imalloc_fastpath (
	    fallback_alloc=0x7f4961360df0 <fallback_impl<false>(std::size_t)>, 
	    size=<optimized out>)
	    at include/jemalloc/internal/jemalloc_internal_inlines_c.h:266
	#28 0x000055a26c31cc0d in std::__new_allocator<std::pair<unsigned long, bool> >::allocate (this=0x7f495f8945a8, __n=<optimized out>)
	    at /usr/include/c++/12/bits/new_allocator.h:112
	#29 std::allocator_traits<std::allocator<std::pair<unsigned long, bool> > >::allocate (__n=<optimized out>, __a=...)
	    at /usr/include/c++/12/bits/alloc_traits.h:464
	#30 std::_Vector_base<std::pair<unsigned long, bool>, std::allocator<std::pair<unsigned long, bool> > >::_M_allocate (__n=<optimized out>, 
	    this=<optimized out>) at /usr/include/c++/12/bits/stl_vector.h:378
	#31 std::vector<std::pair<unsigned long, bool>, std::allocator<std::pair<unsigned long, bool> > >::_M_default_append (this=0x7f495f8945a8, __n=58)
	    at /usr/include/c++/12/bits/vector.tcc:650
	#32 0x000055a26c30340c in std::vector<std::pair<unsigned long, bool>, std::allocator<std::pair<unsigned long, bool> > >::resize (__new_size=<optimized out>, 
	    this=<optimized out>) at /usr/include/c++/12/bits/stl_vector.h:1011
	#33 pufferfish::BinaryGFAReader::parseFile (this=0x7ffea3d0e340)
	    at ./external/pufferfish/src/PufferfishBinaryGFAReader.cpp:161
	#34 0x000055a26c306d10 in pufferfishIndex (indexOpts=...)
	    at ./external/pufferfish/src/PufferfishIndexer.cpp:541
	#35 0x000055a26bee06ae in SalmonIndex::buildPuffIndex_ (idxOpt=..., 
	    indexDir=..., this=0x7f495f83e280) at ./include/SalmonIndex.hpp:111
	#36 SalmonIndex::build (idxOpt=..., indexDir=..., this=0x7f495f83e280)
	    at ./include/SalmonIndex.hpp:76
	#37 salmonIndex (argc=<optimized out>, argv=<optimized out>)
	    at ./src/BuildSalmonIndex.cpp:247
	#38 0x000055a26be7b510 in std::function<int (int, char const**, std::unique_ptr<SalmonIndex, std::default_delete<SalmonIndex> >&)>::operator()(int, char const**, std::unique_ptr<SalmonIndex, std::default_delete<SalmonIndex> >&) const (
	    __args#2=std::unique_ptr<SalmonIndex> = {...}, __args#1=<optimized out>, 
	    __args#0=<optimized out>, this=0x7f495f84e338)
	    at /usr/include/c++/12/bits/std_function.h:591
	#39 main (argc=<optimized out>, argv=0x7ffea3d0fea8) at ./src/Salmon.cpp:267

Here is an almost minimal repreducer, using the pigx-rnaseq test
data raising the issue (that's 6.1M of data, so not exactly
minimal neither):

	wget 'https://salsa.debian.org/med-team/pigx-rnaseq/-/raw/debian/0.1.0-1.1/tests/sample_data/sample.cdna.fasta'
	salmon index -t sample.cdna.fasta -i salmon_index -p 8

Interestingly my "reproducer" is flaky and tends to raise
segmentation faults on occasions, instead of getting caught into
infinite loop.  Actually I just witnessed salmon failing to
build from source with such an error.  Other test data may show
consistent crash, or the indexing go through completely.

The issue could very well be a jemalloc bug, although other
reverse dependencies do not seem to be affected by similar
issues, but maybe this is something to keep in mind.  The issue
would need to be reduced further to identify the root cause
exactly.

Have a nice day,  :)
-- 
Étienne Mollier <emollier at emlwks999.eu>
Fingerprint:  8f91 b227 c7d6 f2b1 948c  8236 793c f67e 8f0d 11da
Sent from /dev/pts/1, please excuse my verbosity.
On air: Big Big Train - London Plane
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://alioth-lists.debian.net/pipermail/debian-med-packaging/attachments/20230114/85ce4548/attachment.sig>


More information about the Debian-med-packaging mailing list