[Debian-med-packaging] Bug#1004037: Segmentation fault in plink2 (Was: src:plink2: fails to migrate to testing for too long: autopkgtest regression)

Chris Chang chrchang523 at gmail.com
Sat Feb 19 03:27:20 GMT 2022


I have confirmed that the original code segfaults when compiled with gcc
11.2 on my Debian instance, and that it runs to completion after the latest
patch.  I have also confirmed that, after the latest patch, all other tests
pass.
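The quoted messages diagnose the crash as follows. An inlined pgl_malloc is suspected of no longer compiling as expected under GCC 11, possibly because of its ipa-modref pass, so the store `pheno_cols[pheno_idx].nonmiss = nullptr` in LoadPsam faults. The C++ sketch below illustrates that diagnosis. It is not plink2's code or its actual fix. `PhenoCol` is reduced to the one field the backtrace mentions, and `TypedMalloc` and `AllocPhenoCols` are hypothetical names. The sketch rests on an assumption about the fragile pattern: an allocation helper takes `void* pp` and stores its result through an `unsigned char**` cast, while the caller's variable has a different pointer type. The C++ aliasing rules do not guarantee that such a store is visible through the caller's variable. An optimizer that tracks which memory a call modifies may therefore keep using a stale value. The sketch instead stores through a `T**` whose type matches the caller's variable exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical, reduced stand-in for plink2's PhenoCol (the real struct is
// larger); only the field named in the backtrace is kept.
struct PhenoCol {
  const void* nonmiss;
};

// Hypothetical type-safe allocation helper (not plink2's pgl_malloc).
// Returns true on failure ("nonzero means error").
//
// The assumed fragile pattern it replaces looks roughly like:
//   *static_cast<unsigned char**>(pp) = static_cast<unsigned char*>(malloc(size));
// with pp pointing at, e.g., a PhenoCol* variable. That store uses an lvalue
// of a different pointer type than the object being written.
// Here the store goes through T**, which matches the caller's variable.
template <typename T>
inline bool TypedMalloc(std::size_t size, T** pp) {
  // malloc(0) may return nullptr, which would be misreported as failure.
  assert(size != 0);
  *pp = static_cast<T*>(std::malloc(size));
  return *pp == nullptr;
}

// Mirrors the failing step in LoadPsam: allocate pheno_ct records, then
// clear each nonmiss field. Returns the array (caller frees it with
// std::free), or nullptr if allocation fails.
inline PhenoCol* AllocPhenoCols(std::size_t pheno_ct) {
  PhenoCol* pheno_cols = nullptr;
  if (TypedMalloc(pheno_ct * sizeof(PhenoCol), &pheno_cols)) {
    return nullptr;
  }
  for (std::size_t pheno_idx = 0; pheno_idx != pheno_ct; ++pheno_idx) {
    pheno_cols[pheno_idx].nonmiss = nullptr;
  }
  return pheno_cols;
}
```

For example, `AllocPhenoCols(3)` returns three records whose `nonmiss` fields are null; release the array with `std::free`. You can also test the ipa-modref hypothesis without changing the source. Rebuild with GCC's `-fno-ipa-modref`, or more broadly with `-fno-strict-aliasing`, and check whether the crash disappears.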

On Fri, Feb 18, 2022 at 1:27 PM Chris Chang <chrchang523 at gmail.com> wrote:

> I am installing gcc-11 on my Debian instance now, and will be running more
> extensive tests today searching for other things that may have stopped
> working for the same reason.
>
> On Fri, Feb 18, 2022 at 1:22 PM Andreas Tille <andreas at an3as.eu> wrote:
>
>> I confirm it's gcc-11.  I'll check tomorrow.  Thanks a lot for your quick
>> and helpful responses, Andreas.
>>
>> On Fri, Feb 18, 2022 at 12:53:58PM -0800, Chris Chang wrote:
>> > I have posted an update under the provisional assumption that it's gcc
>> > 11's new ipa-modref pass that is causing this code to fail, since it
>> > does seem to break some similar code.
>> >
>> > On Fri, Feb 18, 2022 at 11:49 AM Chris Chang <chrchang523 at gmail.com> wrote:
>> >
>> > > What compiler version are you using?  This implies that the pgl_malloc
>> > > inline function is not being compiled to the expected code; there is an
>> > > existing non-inlined version that is used for very old gcc versions, but
>> > > it looks like it may also be needed here.
>> > >
>> > > On Fri, Feb 18, 2022 at 11:40 AM Andreas Tille <andreas at an3as.eu> wrote:
>> > >
>> > >> Hi again,
>> > >>
>> > >> I applied this patch and now I get:
>> > >>
>> > >> (gdb) run
>> > >> Starting program: /usr/lib/plink2/plink2-sse2 --debug --pfile tmp_data --export vcf vcf-dosage=DS --out tmp_data2
>> > >> [Thread debugging using libthread_db enabled]
>> > >> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>> > >> [New Thread 0x7ffff4cc7640 (LWP 4060797)]
>> > >> [New Thread 0x7fffec4c6640 (LWP 4060798)]
>> > >> [New Thread 0x7fffebcc5640 (LWP 4060799)]
>> > >> PLINK v2.00a3 64-bit (29 Jan 2022)
>> > >> www.cog-genomics.org/plink/2.0/
>> > >> (C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
>> > >> Logging to tmp_data2.log.
>> > >> Options in effect:
>> > >>   --debug
>> > >>   --export vcf vcf-dosage=DS
>> > >>   --out tmp_data2
>> > >>   --pfile tmp_data
>> > >>
>> > >> Start time: Fri Feb 18 19:06:45 2022
>> > >> 31998 MiB RAM detected; reserving 15999 MiB for main workspace.
>> > >> Using up to 4 compute threads.
>> > >> [New Thread 0x7ffff7fc5640 (LWP 4060800)]
>> > >> sizeof(PhenoCol): 40  pheno_cols: 0
>> > >> --debug: setting pheno_cols[0].nonmiss. = nullptr
>> > >>
>> > >> Thread 1 "plink2-sse2" received signal SIGSEGV, Segmentation fault.
>> > >> 0x00005555556fb82e in plink2::LoadPsam (psamname=psamname@entry=0x7fffffffbe70
>> > >> "tmp_data.psam", pheno_range_list_ptr=<optimized out>, fam_cols=...,
>> > >> pheno_ct_max=<optimized out>,
>> > >>     missing_pheno=<optimized out>, affection_01=0, max_thread_ct=4,
>> > >> piip=0x7fffffff8880, sample_include_ptr=0x7fffffff8790,
>> > >> founder_info_ptr=0x7fffffff87a8, sex_nm_ptr=0x7fffffff8798,
>> > >>     sex_male_ptr=0x7fffffff87a0, pheno_cols_ptr=0x7fffffff8770,
>> > >> pheno_names_ptr=0x7fffffff8780, raw_sample_ct_ptr=0x7fffffff8728,
>> > >> pheno_ct_ptr=0x7fffffff8720,
>> > >>     max_pheno_name_blen_ptr=0x7fffffff87b0) at ../plink2_psam.cc:615
>> > >> warning: Source file is more recent than executable.
>> > >> 615             pheno_cols[pheno_idx].nonmiss = nullptr;
>> > >>
>> > >> Kind regards
>> > >>
>> > >>       Andreas.
>> > >>
>> > >> On Fri, Feb 18, 2022 at 08:45:12AM -0800, Chris Chang wrote:
>> > >> > Ok, I don't know why that particular line would fail, but I've added
>> > >> > another debug-print before it on GitHub.
>> > >> >
>> > >> > On Fri, Feb 18, 2022 at 4:24 AM Andreas Tille <andreas at fam-tille.de> wrote:
>> > >> >
>> > >> > > Hi Chris,
>> > >> > >
>> > >> > > On Thu, Feb 17, 2022 at 07:13:49PM -0800, Chris Chang wrote:
>> > >> > > > I was unable to replicate this issue on a Debian EC2 instance.  However,
>> > >> > > > there are very few things that happen between printing "End time:" and
>> > >> > > > program exit, so I have added a bunch of debug-prints (active when the
>> > >> > > > --debug flag is passed in) to the latest GitHub commit that should reveal
>> > >> > > > which of those few things is triggering the segfault; let me know if you
>> > >> > > > are able to run this build.
>> > >> > >
>> > >> > > I think the issue is a bit more complex.  Debian provides a wrapper
>> > >> > > which calls the best / most performant plink2 build.  The issue seems
>> > >> > > to occur for SFX=avx.  First I do:
>> > >> > >
>> > >> > >
>> > >> > >    /usr/lib/plink2/plink2-avx --debug --dummy 33 65537 0.1 dosage-freq=0.1 --out tmp_data
>> > >> > >
>> > >> > > This works.  In the next step I fire up gdb, which results in
>> > >> > >
>> > >> > >
>> > >> > > (gdb) run
>> > >> > > Starting program: /usr/lib/plink2/plink2-avx --debug --pfile tmp_data --export vcf vcf-dosage=DS --out tmp_data2
>> > >> > > [Thread debugging using libthread_db enabled]
>> > >> > > Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>> > >> > > [New Thread 0x7ffff4cc7640 (LWP 2931408)]
>> > >> > > [New Thread 0x7ffff44c6640 (LWP 2931409)]
>> > >> > > [New Thread 0x7fffebcc5640 (LWP 2931411)]
>> > >> > > PLINK v2.00a3 SSE4.2 (29 Jan 2022)
>> > >> > > www.cog-genomics.org/plink/2.0/
>> > >> > > (C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
>> > >> > > Logging to tmp_data2.log.
>> > >> > > Options in effect:
>> > >> > >   --debug
>> > >> > >   --export vcf vcf-dosage=DS
>> > >> > >   --out tmp_data2
>> > >> > >   --pfile tmp_data
>> > >> > >
>> > >> > > Start time: Fri Feb 18 11:58:49 2022
>> > >> > > 31998 MiB RAM detected; reserving 15999 MiB for main workspace.
>> > >> > > Using up to 4 compute threads.
>> > >> > > [New Thread 0x7ffff7fc5640 (LWP 2931412)]
>> > >> > >
>> > >> > > Thread 1 "plink2-avx" received signal SIGSEGV, Segmentation fault.
>> > >> > > plink2::LoadPsam (psamname=psamname@entry=0x7fffffffbe70 "tmp_data.psam",
>> > >> > > pheno_range_list_ptr=<optimized out>, fam_cols=..., pheno_ct_max=<optimized out>,
>> > >> > >     missing_pheno=<optimized out>, affection_01=0, max_thread_ct=4,
>> > >> > > piip=0x7fffffff8880, sample_include_ptr=0x7fffffff87a0,
>> > >> > > founder_info_ptr=0x7fffffff87b8, sex_nm_ptr=0x7fffffff87a8,
>> > >> > >     sex_male_ptr=0x7fffffff87b0, pheno_cols_ptr=0x7fffffff8780,
>> > >> > > pheno_names_ptr=0x7fffffff8790, raw_sample_ct_ptr=0x7fffffff8738,
>> > >> > > pheno_ct_ptr=0x7fffffff8730,
>> > >> > >     max_pheno_name_blen_ptr=0x7fffffff87c0) at ../plink2_psam.cc:611
>> > >> > > warning: Source file is more recent than executable.
>> > >> > > 611             pheno_cols[pheno_idx].nonmiss = nullptr;
>> > >> > >
>> > >> > >
>> > >> > > I also added some more debug lines in a patch[1].
>> > >> > >
>> > >> > > It seems that this is indeed the weak part of the code, since the
>> > >> > > output becomes
>> > >> > >
>> > >> > > ...
>> > >> > > Start time: Fri Feb 18 13:19:13 2022
>> > >> > > 31998 MiB RAM detected; reserving 15999 MiB for main workspace.
>> > >> > > Using up to 4 compute threads.
>> > >> > > [New Thread 0x7ffff7fc5640 (LWP 3957711)]
>> > >> > > --debug: setting pheno_cols[0].nonmiss. = nullptr
>> > >> > >
>> > >> > > Thread 1 "plink2-sse2" received signal SIGSEGV, Segmentation fault.
>> > >> > > 0x00005555556fb6ff in plink2::LoadPsam (psamname=psamname@entry=0x7fffffffbe70
>> > >> > > "tmp_data.psam", pheno_range_list_ptr=<optimized out>, fam_cols=...,
>> > >> > > pheno_ct_max=<optimized out>,
>> > >> > >     missing_pheno=<optimized out>, affection_01=0, max_thread_ct=4,
>> > >> > > piip=0x7fffffff8880, sample_include_ptr=0x7fffffff87a0,
>> > >> > > founder_info_ptr=0x7fffffff87b8, sex_nm_ptr=0x7fffffff87a8,
>> > >> > >     sex_male_ptr=0x7fffffff87b0, pheno_cols_ptr=0x7fffffff8780,
>> > >> > > pheno_names_ptr=0x7fffffff8790, raw_sample_ct_ptr=0x7fffffff8738,
>> > >> > > pheno_ct_ptr=0x7fffffff8730,
>> > >> > >     max_pheno_name_blen_ptr=0x7fffffff87c0) at ../plink2_psam.cc:614
>> > >> > > warning: Source file is more recent than executable.
>> > >> > > 614             pheno_cols[pheno_idx].nonmiss = nullptr;
>> > >> > >
>> > >> > >
>> > >> > > I hope this helps a bit in tracking down the issue.
>> > >> > >
>> > >> > >     Andreas.
>> > >> > >
>> > >> > >
>> > >> > >
>> > >> > > [1] https://salsa.debian.org/med-team/plink2/-/blob/master/debian/patches/debug2.patch
>> > >> > >
>> > >> > > --
>> > >> > > http://fam-tille.de
>> > >> > >
>> > >>
>> > >> --
>> > >> http://fam-tille.de
>> > >>
>> > >
>>
>> --
>> http://fam-tille.de
>>
>