Fwd: Incoming Debian freeze - it looks like this mail bounced, not sure if you got it or not

Mon Jan 13 09:30:47 GMT 2025

Hi

On Sunday 12 January 2025 17:04:34 CET you wrote:
> I received a message that mailly.debian.org couldn't send this mail to
> your free.fr account.

This is weird. Moreover, your mail was not sent to the pkg-rakudo-devel list,
so there may be an issue between your mail server and Debian's mail server.

> I've been poking at the freeze on arm64 occasionally over the last few
> weeks. Unfortunately, it apparently occurs only about once in every
> hundred runs or so. I caught it in the debugger but the state it's in
> when I hit the check doesn't explain how it could have reached the state
> in question. It remains mysterious and the way I usually deal with
> mysterious bugs like that is to run them under "rr record".
> Unfortunately, RR is currently x86_64 only. There is a way to run qemu
> (presumably qemu-system) under rr, but I haven't tried it yet. It might
> be quite slow, and there's no guarantee I will actually hit the error at
> all on an emulated system ...

> Is it an option at all to skip the arches where we get freezes?

Yes. It's the Architecture field in the debian/control file.

Hopefully, this is only a workaround until a proper fix is found.
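
For reference, restricting the build usually means replacing "Architecture:
any" in the binary package stanza of debian/control with an explicit list. A
minimal sketch, where the architecture list is only a placeholder (I have not
checked which architectures actually build reliably) and the rest of the
stanza is left out:

```
# Placeholder list: replace with the architectures known to build reliably.
Package: rakudo
Architecture: amd64 i386
```

The build daemons of any architecture missing from that list will then not
attempt the package at all.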

> If I turn the freeze into a crash reliably (I can turn at least a
> fraction of the freezes into crashes, but I can't guarantee it's all)
> and we run the compilation steps of rakudo multiple times, would that be
> acceptable?

Can you do that on a porterbox? Or does this problem happen only on Debian
build daemons?
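
Wherever it ends up running, the "run the compilation step multiple times"
idea could look roughly like the sketch below. It is only an illustration:
run_with_retries, the attempt count and the timeout value are made up, and it
uses a timeout to turn a hang into a failure, which is a different mechanism
from making the freeze crash by itself.

```python
import subprocess
import sys


def run_with_retries(cmd, attempts=3, timeout=600):
    """Run cmd, retrying when it fails or runs longer than timeout seconds.

    Returns the 1-based number of the attempt that succeeded and raises
    RuntimeError when every attempt failed. (Hypothetical helper.)
    """
    for attempt in range(1, attempts + 1):
        try:
            # subprocess.run kills the child process and raises
            # TimeoutExpired once the timeout elapses, so a hang
            # becomes an ordinary failed attempt.
            result = subprocess.run(cmd, timeout=timeout)
        except subprocess.TimeoutExpired:
            print(f"attempt {attempt}: timed out after {timeout}s",
                  file=sys.stderr)
            continue
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt}: exit status {result.returncode}",
              file=sys.stderr)
    raise RuntimeError(f"{cmd!r} failed in all {attempts} attempts")


# Example call (the command is a placeholder for the real build step):
# run_with_retries(["make", "all"], attempts=5, timeout=1800)
```

Note that the timeout only kills the direct child process, so a build step
that spawns its own children may need extra care.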

> Unfortunately, until I can verify the actual cause, it's
> probably also possible for these freezes (alternatively, crashes) to
> happen in regular user code.

> It might be possible to react to the "it went wrong", correct the state
> we're in, and try again. My current theory is that we're not doing
> memory barriering properly for some inter-thread communication, so that
> one core reads bogus data for a little bit in a piece of code that
> essentially acts like a tiny interpreter. I have not yet investigated
> how exactly cache coherency and such differs between x86_64 and arm64 (and
> arm32, the other arch where build machines encounter freezes). Maybe
> that's the ticket.

> I have also not yet had the time to check what exactly the current
> problem with FTBFS from the "outdated or wrong version of dependency"
> issue is.
> 
> My last status was that after the transition, everything should have
> been fixed.
> Did we perhaps forget to kick off rebuild attempts?

AFAIR, everything was rebuilt during the transition. Then the build hang
problem showed up and was reported as a serious bug. This triggered the
removal of rakudo and its dependencies from testing.

> Or did I need to mark all the relevant FTBFS bugs as "resolved"?

It is difficult to say whether that's a problem, because of the build hang issue.

Let's first restrict rakudo to build only on architectures known to work.
Which ones are they?

Then we will be able to check what's going on with module builds.

> Maybe it would be as simple as that. It kind of feels like that last one
> is probably what's gone wrong.
> 
> Thanks again for holding my hand through all the processes.

All the best