[xml/sgml-pkgs] Bug#858405: xmlto: intermittent Segmentation fault when building manpages for libreswan on mips64el

YunQiang Su wzssyqa at gmail.com
Fri Mar 24 09:36:31 UTC 2017


On Fri, Mar 24, 2017 at 1:06 AM, James Cowgill <jcowgill at debian.org> wrote:
> reassign 858405 xsltproc
> forcemerge 750593 858405
> retitle 750593 xsltproc: bus error on some arches with linux < 4.1
> thanks
>
> Hi,
>
> On 22/03/17 21:01, Daniel Kahn Gillmor wrote:
>> On Wed 2017-03-22 06:22:41 -0400, James Cowgill wrote:
>>> On 22/03/17 01:29, Daniel Kahn Gillmor wrote:
>>>> For debian revisions of 3.20, failures happened on:
>>>>
>>>>   mipsel-manda-02
>>>>   eberlin
>>>>
>>>> Also for revisions of 3.20, successes happened on:
>>>>
>>>>   mipsel-sil-01
>>>>   mipsel-manda-03
>>>>   mipsel-manda-01
>>>
>>> This is a known issue and it only affects Loongson buildds.
>>> Interestingly mipsel-manda-01 is Loongson and didn't fail there so there
>>> may be a random element involved here. I don't think anyone's tracked
>>> down the underlying issue though.
>>
>> thanks, is there a public reference for the known issue that we can
>> point to?
>
> I think #750593 looks a lot like the bug here.
>
> After some investigation, it seems I was being a bit unfair to Loongson.
> This is arguably a non-MIPS-specific bug in Linux < 4.1. It just so
> happens that all the Loongson buildds run jessie's 3.16 kernel and all
> the other buildds run >= 4.7 from backports.
>
> In #750593 there was lots of talk about stack overflows causing this but
> there is actually another element to this. Indeed, when I reduced the
> stack size with ulimit, the segfaults became more frequent, but
> increasing the stack size didn't help at all. After looking at the
> mappings for a failing process, I saw this (taken just after starting
> xsltproc):
>
> [...]
>> fff7f50000-fff7f5c000 ---p 00004000 fd:00 1060250                        /usr/lib/mips64el-linux-gnuabi64/libeatmydata.so.1.1.2
>> fff7f5c000-fff7f60000 rw-p 00000000 fd:00 1060250                        /usr/lib/mips64el-linux-gnuabi64/libeatmydata.so.1.1.2
>> fff7f60000-fff7f88000 r-xp 00000000 fd:00 1060375                        /lib/mips64el-linux-gnuabi64/ld-2.24.so
>> fff7f94000-fff7f98000 rw-p 00024000 fd:00 1060375                        /lib/mips64el-linux-gnuabi64/ld-2.24.so
>> fff7f98000-fff7fa0000 r-xp 00000000 fd:00 947544                         /usr/bin/xsltproc
>> fff7fa4000-fff7fac000 rw-p 00000000 00:00 0
>> fff7fac000-fff7fb0000 rw-p 00004000 fd:00 947544                         /usr/bin/xsltproc
>> ffff1d4000-ffff384000 rwxp 00000000 00:00 0                              [heap]
>> ffff9e0000-ffffa04000 rwxp 00000000 00:00 0                              [stack]
>> ffffffc000-10000000000 r-xp 00000000 00:00 0                             [vdso]
>
> Notice that there is a very small gap between the heap and the stack
> here (at least compared to working xsltproc runs). I think that the heap
> is growing to a point where it limits the maximum size of the stack and
> so increasing the stack size with ulimit doesn't help.
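
To put a number on that gap, here is a minimal Python sketch. It is not
part of James's investigation, and the helper names are made up for
illustration. It reads text in /proc/<pid>/maps format and reports the
distance from the end of [heap] to the start of [stack]. For the two
lines quoted above it gives 0x65c000 bytes, about 6.4 MiB.

```python
# Sketch only: parse_maps and heap_stack_gap are hypothetical helper names.
SAMPLE = """\
ffff1d4000-ffff384000 rwxp 00000000 00:00 0                              [heap]
ffff9e0000-ffffa04000 rwxp 00000000 00:00 0                              [stack]
"""


def parse_maps(text):
    """Return a list of (start, end, name) tuples from maps-format text."""
    regions = []
    for line in text.splitlines():
        # Fields: address range, perms, offset, device, inode, optional path.
        parts = line.split(None, 5)
        if len(parts) < 5 or "-" not in parts[0]:
            continue  # not a mapping line
        start_hex, end_hex = parts[0].split("-", 1)
        name = parts[5].strip() if len(parts) > 5 else ""
        regions.append((int(start_hex, 16), int(end_hex, 16), name))
    return regions


def heap_stack_gap(text):
    """Bytes between the end of [heap] and the start of [stack].

    Returns None if either mapping is missing. A negative result means
    the heap sits above the stack, so it cannot block stack growth.
    """
    heap_end = None
    stack_start = None
    for start, end, name in parse_maps(text):
        if name == "[heap]":
            heap_end = end
        elif name == "[stack]":
            stack_start = start
    if heap_end is None or stack_start is None:
        return None
    return stack_start - heap_end
```

On a live system the same function can be fed the contents of
/proc/<pid>/maps for a running xsltproc; a result smaller than the
stack limit reported by "ulimit -s" would match the failure described
here.
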
>
> The reason the program and the heap are at these very high addresses is
> that xsltproc is built with PIE and the kernel is treating the
> executable like a mmap and grouping it with all the other libraries. In
> d1fd836dcf00 ("mm: split ET_DYN ASLR from mmap ASLR") the behavior
> changed and now the program and its heap will be mapped at a lower
> address so the bug does not affect newer kernels. Using "setarch -L" or
> "setarch -R" is another workaround for this bug because that moves the
> program so that there is a much larger gap between the heap and the stack.
>
> This might affect other applications as well. Effectively it means that
> PIE executables which use lots of stack space might not work properly
> with jessie's kernel. The chance of hitting the bug seems to vary
> between arches however (depending on what each arch does in
> arch_pick_mmap_layout and arch_randomize_brk) - mips64el seems to be hit
> pretty frequently. In xsltproc's case, PIE was enabled some time ago
> which is why this bug is quite old.
>
> I believe any of the following will fix this (though not all have been tested):
> - Reduce the stack usage in xsltproc (the upstream bug)
> - Upgrade the relevant buildds to Linux >= 4.1
> - Apply d1fd836dcf00 to jessie's kernel
> - Disable PIE in xsltproc.
> - Run xsltproc inside setarch -L / setarch -R
>

We have trouble running newer kernels on some Loongson machines,
because their PMON can only load an initrd of limited size.
So backporting the patch may be the ideal option for us for now.
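
Until a patched kernel is in place, the setarch workaround from the
list above could be applied only where it is needed. The following is
a rough Python sketch under stated assumptions: the function names are
hypothetical, the kernel check only looks at the version string (so a
3.16 kernel with d1fd836dcf00 backported would still be treated as
affected), and it assumes setarch accepts the value returned by
platform.machine() as its architecture argument.

```python
import platform
import re


def kernel_version(release):
    """Extract (major, minor) from a release string such as '3.16.0-4-amd64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def predates_fix(release):
    """True if the version string is older than Linux 4.1.

    Assumption: judged by version number only, so backported fixes
    are not detected.
    """
    return kernel_version(release) < (4, 1)


def xsltproc_argv(args, release=None):
    """Build the argv for xsltproc, wrapped in 'setarch <arch> -R' on old kernels."""
    if release is None:
        release = platform.release()
    if predates_fix(release):
        # -R is setarch's --addr-no-randomize option.
        return ["setarch", platform.machine(), "-R", "xsltproc"] + list(args)
    return ["xsltproc"] + list(args)
```

The resulting list can be passed to subprocess.run() in place of a
plain xsltproc call.
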

>>> For the moment, I'll rebuild libreswan again and hope a good buildd is
>>> picked.
>>
>> i see 5 mips64el rebuilds now at
>> https://buildd.debian.org/status/logs.php?pkg=libreswan&ver=3.20-6&suite=sid,
>> but none of them have succeeded yet :/
>>
>> 3 of the builds are from mipsel-manda-02, 1 is from eberlin, and one
>> additional new "bad" builder is:
>>
>>       mipsel-aql-01
>
> There are 3 non-Loongson buildds: mipsel-aql-03, mipsel-manda-03 and
> mipsel-sil-01. I expect libreswan will only build on one of those
> buildds at the moment.
>
> Thanks,
> James
>



-- 
YunQiang Su


